sched/uclamp: Ignore max aggregation if rq is idle
author Xuewen Yan <xuewen.yan@unisoc.com>
Wed, 30 Jun 2021 14:12:04 +0000 (22:12 +0800)
committer Peter Zijlstra <peterz@infradead.org>
Fri, 2 Jul 2021 13:58:24 +0000 (15:58 +0200)
When a task wakes up on an idle rq, uclamp_rq_util_with() max aggregates
the task's clamps with the rq values. But since no task is enqueued yet,
those rq values are stale: they reflect the last task that was running.
Once the new task actually wakes up and is enqueued, the rq uclamp values
should reflect the effective uclamp values of the newly woken up task.

This is a problem particularly for uclamp_max because it defaults to
1024. If a task p with uclamp_max = 512 wakes up, then max aggregation
would ignore the capping that should apply when this task is enqueued,
which is wrong.

Fix that by ignoring max aggregation if the rq is idle, since in that
case the effective uclamp values of the rq will be those of the task
that will wake up.
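
The effect can be illustrated with a small userspace model. This is only
a sketch, not kernel code: struct model_rq, struct model_task,
util_with_old() and util_with_new() are hypothetical names that mirror
the logic before and after this patch, and the final clamp step is an
assumption about the part of the function that the hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct rq: only the fields this sketch needs. */
struct model_rq {
	unsigned long uclamp_min;	/* models rq->uclamp[UCLAMP_MIN].value */
	unsigned long uclamp_max;	/* models rq->uclamp[UCLAMP_MAX].value */
	bool idle;			/* models rq->uclamp_flags & UCLAMP_FLAG_IDLE */
};

/* Hypothetical stand-in for a task's effective clamps (uclamp_eff_value()). */
struct model_task {
	unsigned long uclamp_min;
	unsigned long uclamp_max;
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Assumed final step: return min on inversion, else clamp util to the range. */
static unsigned long clamp_util(unsigned long util, unsigned long min_util,
				unsigned long max_util)
{
	if (min_util >= max_util)
		return min_util;
	if (util < min_util)
		return min_util;
	return util > max_util ? max_util : util;
}

/* Before the patch: always max aggregate with the (possibly stale) rq values. */
unsigned long util_with_old(const struct model_rq *rq, unsigned long util,
			    const struct model_task *p)
{
	unsigned long min_util, max_util;

	assert(rq);
	min_util = rq->uclamp_min;
	max_util = rq->uclamp_max;
	if (p) {
		min_util = max_ul(min_util, p->uclamp_min);
		max_util = max_ul(max_util, p->uclamp_max);
	}
	return clamp_util(util, min_util, max_util);
}

/* After the patch: on an idle rq, only the waking task's clamps count. */
unsigned long util_with_new(const struct model_rq *rq, unsigned long util,
			    const struct model_task *p)
{
	unsigned long min_util = 0;
	unsigned long max_util = 0;

	assert(rq);
	if (p) {
		min_util = p->uclamp_min;
		max_util = p->uclamp_max;
		if (rq->idle)
			return clamp_util(util, min_util, max_util);
	}
	min_util = max_ul(min_util, rq->uclamp_min);
	max_util = max_ul(max_util, rq->uclamp_max);
	return clamp_util(util, min_util, max_util);
}
```

In this model, take an idle rq whose stale uclamp_max is 1024, a task
with uclamp_max = 512 and util = 800. util_with_old() returns 800,
because the stale 1024 wins the max aggregation. util_with_new() returns
512, which is the cap that applies once the task is enqueued.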

Fixes: 9d20ad7dfc9a ("sched/uclamp: Add uclamp_util_with()")
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
[qias: Changelog]
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Link: https://lore.kernel.org/r/20210630141204.8197-1-xuewen.yan94@gmail.com
kernel/sched/sched.h

index c80d42e9589bf287c43a2cac8a70c5dee9abaffa..14a41a243f7baf308cfbda57c3d3a8c6b3e0d9a3 100644
@@ -2818,20 +2818,27 @@ static __always_inline
 unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
                                  struct task_struct *p)
 {
-       unsigned long min_util;
-       unsigned long max_util;
+       unsigned long min_util = 0;
+       unsigned long max_util = 0;
 
        if (!static_branch_likely(&sched_uclamp_used))
                return util;
 
-       min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
-       max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
-
        if (p) {
-               min_util = max(min_util, uclamp_eff_value(p, UCLAMP_MIN));
-               max_util = max(max_util, uclamp_eff_value(p, UCLAMP_MAX));
+               min_util = uclamp_eff_value(p, UCLAMP_MIN);
+               max_util = uclamp_eff_value(p, UCLAMP_MAX);
+
+               /*
+                * Ignore last runnable task's max clamp, as this task will
+                * reset it. Similarly, no need to read the rq's min clamp.
+                */
+               if (rq->uclamp_flags & UCLAMP_FLAG_IDLE)
+                       goto out;
        }
 
+       min_util = max_t(unsigned long, min_util, READ_ONCE(rq->uclamp[UCLAMP_MIN].value));
+       max_util = max_t(unsigned long, max_util, READ_ONCE(rq->uclamp[UCLAMP_MAX].value));
+out:
        /*
         * Since CPU's {min,max}_util clamps are MAX aggregated considering
         * RUNNABLE tasks with _different_ clamps, we can end up with an