powerpc/watchdog: Fix missed watchdog reset due to memory ordering race
author Nicholas Piggin <npiggin@gmail.com>
Wed, 10 Nov 2021 02:50:53 +0000 (12:50 +1000)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thu, 27 Jan 2022 10:04:57 +0000 (11:04 +0100)
commit 52ce10c1878cf4583306fa2a2880d6f482c35cc7
tree a48d8f9fbaea4e8264dd818ef285b1b2b3960d44
parent aa8c270145290f72d1d2caf5aa5d4b9fa6be4a11

[ Upstream commit 5dad4ba68a2483fc80d70b9dc90bbe16e1f27263 ]

It is possible for all CPUs to miss the pending cpumask becoming clear,
so that no CPU resets it, which causes the lockup detector to stop
working. The watchdog timeout will eventually expire, but
watchdog_smp_panic does nothing if the pending mask is clear, so the
mask is never reset.

Order the cpumask clear vs the subsequent test to close this race.
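
The ordering requirement can be illustrated with a small userspace
sketch in C11 atomics. This is not the kernel patch: the names
pending, pending_set_cpu, pending_empty and clear_cpu_pending are
hypothetical, the mask is assumed to span two words, and a seq_cst
fence stands in for whatever full barrier the kernel code uses. The
point it shows is that when two CPUs clear the last two bits, a full
barrier between each CPU's clear and its emptiness test guarantees that
at least one of them observes the empty mask.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical layout: the pending mask spans two 64-bit words. */
#define NR_WORDS      2
#define BITS_PER_WORD 64
#define NR_CPUS       (NR_WORDS * BITS_PER_WORD)

static atomic_ullong pending[NR_WORDS];

/* Mark a CPU as not yet having checked in. */
static void pending_set_cpu(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    atomic_fetch_or_explicit(&pending[cpu / BITS_PER_WORD],
                             1ULL << (cpu % BITS_PER_WORD),
                             memory_order_relaxed);
}

/* Test every word of the mask; the loads themselves are unordered. */
static bool pending_empty(void)
{
    for (int i = 0; i < NR_WORDS; i++)
        if (atomic_load_explicit(&pending[i], memory_order_relaxed))
            return false;
    return true;
}

/*
 * Clear this CPU's bit, then report whether the mask is now empty.
 * Returns true if the caller saw an empty mask and must reset it.
 */
static bool clear_cpu_pending(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    atomic_fetch_and_explicit(&pending[cpu / BITS_PER_WORD],
                              ~(1ULL << (cpu % BITS_PER_WORD)),
                              memory_order_relaxed);

    /*
     * Full barrier: order the clear above against the loads below.
     * Without it, two CPUs clearing the last two bits could each read
     * a stale copy of the other's word and both conclude "not empty".
     */
    atomic_thread_fence(memory_order_seq_cst);

    return pending_empty();
}
```

Each thread would call clear_cpu_pending() with its own CPU number;
whichever caller gets true back is responsible for refilling the mask.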

Add an extra check for an empty pending mask when the watchdog fires and
finds its bit still clear, to try to catch any other possible races or
bugs here and keep the watchdog working. The extra test in
arch_touch_nmi_watchdog is required to prevent the new warning from
firing off.
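
The defensive check described above can be sketched as follows. This is
a simplified, single-threaded model rather than the code in
arch/powerpc/kernel/watchdog.c: pending_mask, online_mask, warned and
watchdog_check_in are hypothetical names, plain integers replace
cpumasks, and locking and barriers are left out so that only the
control flow is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4

/* Bit n set: CPU n has not checked in during this watchdog period. */
static unsigned int pending_mask;
static unsigned int online_mask = (1u << NR_CPUS) - 1;
static bool warned;

/* Returns true if this call re-armed the pending mask. */
static bool watchdog_check_in(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    unsigned int bit = 1u << cpu;

    if (!(pending_mask & bit)) {
        /*
         * Our bit is already clear. That is normally fine, but if the
         * whole mask is empty an earlier reset was missed: warn once
         * and fall through to the reset so the watchdog keeps working.
         */
        if (pending_mask != 0)
            return false;
        if (!warned) {
            fprintf(stderr, "watchdog: pending mask unexpectedly empty\n");
            warned = true;
        }
    } else {
        pending_mask &= ~bit;
        if (pending_mask != 0)
            return false;
    }

    /* Mask is empty: start a new period with every online CPU pending. */
    pending_mask = online_mask;
    return true;
}
```

In this model, a caller that finds the mask empty with its own bit
already clear both reports the anomaly and repairs it, instead of
returning and leaving the mask empty.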

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Debugged-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-2-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
arch/powerpc/kernel/watchdog.c