torture: Fix hang during kthread shutdown phase
authorJoel Fernandes (Google) <joel@joelfernandes.org>
Sun, 1 Jan 2023 06:15:55 +0000 (06:15 +0000)
committerPaul E. McKenney <paulmck@kernel.org>
Thu, 5 Jan 2023 20:10:35 +0000 (12:10 -0800)
During rcutorture shutdown, the rcu_torture_cleanup() function calls
torture_cleanup_begin(), which sets the fullstop global variable to
FULLSTOP_RMMOD. This causes the rcutorture threads for readers and
fakewriters to exit all of their "while" loops and start shutting down.

They then call torture_kthread_stopping(), which in turn waits for
kthread_stop() to be called.  However, rcu_torture_cleanup() has
not yet called kthread_stop() on those threads, and before it gets a
chance to do so, multiple instances of torture_kthread_stopping() invoke
schedule_timeout_interruptible(1) in a tight loop.  Tracing confirms that
TIMER_SOFTIRQ can then continuously execute timer callbacks.  If that
TIMER_SOFTIRQ preempts the task executing rcu_torture_cleanup(), that
task might never invoke kthread_stop().

This commit improves this situation by increasing the timeout passed to
schedule_timeout_interruptible() from one jiffy to 1/20th of a second.
This change prevents TIMER_SOFTIRQ from monopolizing its CPU, thus
allowing rcu_torture_cleanup() to carry out the needed kthread_stop()
invocations.  Testing has shown 100 runs of TREE07 passing reliably,
as oppose to the tens-of-percent failure rates seen beforehand.

Cc: Paul McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Zhouyi Zhou <zhouzhouyi@gmail.com>
Cc: <stable@vger.kernel.org> # 6.0.x
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
kernel/torture.c

index 29afc62f2bfec8088aab70e62e7538809fca9469..1a0519b836ac9cacfa1dbed28f41b863dc5d5555 100644 (file)
@@ -915,7 +915,7 @@ void torture_kthread_stopping(char *title)
        VERBOSE_TOROUT_STRING(buf);
        while (!kthread_should_stop()) {
                torture_shutdown_absorb(title);
-               schedule_timeout_uninterruptible(1);
+               schedule_timeout_uninterruptible(HZ / 20);
        }
 }
 EXPORT_SYMBOL_GPL(torture_kthread_stopping);