rcu-tasks: Don't remove tasks with pending IPIs from holdout list
author    Paul E. McKenney <paulmck@kernel.org>
          Sun, 19 Sep 2021 03:40:48 +0000 (20:40 -0700)
committer Paul E. McKenney <paulmck@kernel.org>
          Wed, 1 Dec 2021 01:29:06 +0000 (17:29 -0800)

Currently, the check_all_holdout_tasks_trace() function removes all tasks
marked with ->trc_reader_checked from the holdout list, including those
with IPIs pending.  This means that the IPI handler might arrive at
a task that has already been removed from the list, which is at best
an accident waiting to happen.

This commit therefore avoids removing tasks with IPIs pending from
the holdout list.  This in turn means that the "if" condition in the
for_each_online_cpu() loop in rcu_tasks_trace_postgp() should always
evaluate to false, so a WARN_ON_ONCE() is added to check that.
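
The removal rule described above can be sketched in user-space C.  This is
an illustration only, not the kernel code: struct fake_task, may_remove(),
and scan_one() are hypothetical names, and C11 atomics stand in for the
kernel's smp_load_acquire() and READ_ONCE().  The sketch assumes, as the
diff suggests, that ->trc_ipi_to_cpu holds -1 when no IPI is pending.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the task fields named in the diff. */
struct fake_task {
	atomic_int trc_ipi_to_cpu;       /* CPU with an IPI in flight, or -1 (assumed) */
	atomic_bool trc_reader_checked;  /* set once the reader check has succeeded */
	bool on_holdout_list;            /* models membership in the holdout list */
};

/* Mirrors the new condition: checked AND no IPI still pending. */
static bool may_remove(struct fake_task *t)
{
	assert(t != NULL);
	return atomic_load_explicit(&t->trc_ipi_to_cpu,
				    memory_order_acquire) == -1 &&
	       atomic_load_explicit(&t->trc_reader_checked,
				    memory_order_relaxed);
}

/* One step of a holdout-list scan for a single task. */
static void scan_one(struct fake_task *t)
{
	if (may_remove(t))
		t->on_holdout_list = false;  /* stands in for trc_del_holdout() */
}
```

With ->trc_reader_checked set and ->trc_ipi_to_cpu equal to some CPU number,
scan_one() leaves the task on the list.  Only after the field returns to -1
does a later scan remove it, so a late IPI handler never finds a task that
has already been removed.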

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 7da3c81c3f59c2ff3a64a215ef0713a87cb626af..bd44cd4794d3d71be23f6573a7886afe162907b5 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1121,7 +1121,8 @@ static void check_all_holdout_tasks_trace(struct list_head *hop,
                        trc_wait_for_one_reader(t, hop);
 
                // If check succeeded, remove this task from the list.
-               if (READ_ONCE(t->trc_reader_checked))
+               if (smp_load_acquire(&t->trc_ipi_to_cpu) == -1 &&
+                   READ_ONCE(t->trc_reader_checked))
                        trc_del_holdout(t);
                else if (needreport)
                        show_stalled_task_trace(t, firstreport);
@@ -1156,7 +1157,7 @@ static void rcu_tasks_trace_postgp(struct rcu_tasks *rtp)
        // Yes, this assumes that CPUs process IPIs in order.  If that ever
        // changes, there will need to be a recheck and/or timed wait.
        for_each_online_cpu(cpu)
-               if (smp_load_acquire(per_cpu_ptr(&trc_ipi_to_cpu, cpu)))
+               if (WARN_ON_ONCE(smp_load_acquire(per_cpu_ptr(&trc_ipi_to_cpu, cpu))))
                        smp_call_function_single(cpu, rcu_tasks_trace_empty_fn, NULL, 1);
 
        // Remove the safety count.