RAS/CEC: Reduce offline page threshold for Intel systems
author Tony Luck <tony.luck@intel.com>
Tue, 2 Aug 2022 16:18:47 +0000 (09:18 -0700)
committer Borislav Petkov <bp@suse.de>
Mon, 22 Aug 2022 17:30:02 +0000 (19:30 +0200)
commit d25c6948a6aad787d9fd64de6b5362c3f23cc8d0
tree c112d18e3a821c190445e977368056542c75c66a
parent 1c23f9e627a7b412978b4e852793c5e3c3efc555

A large-scale study of memory errors on Intel systems in data centers
showed that aggressively taking pages with corrected errors offline is
the best strategy for using corrected errors as a predictor of future
uncorrected errors.

Set the threshold to "2" on Intel systems. AMD guidance is that this is
not necessary for their systems.
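
As a rough illustration of what a vendor-dependent threshold means, here is a
minimal userspace sketch. It is not the code in drivers/ras/cec.c: every name
in it (enum cpu_vendor, struct collector, collector_init(), collector_add())
is hypothetical, the non-Intel default of 1023 is an assumed placeholder, and
the exact comparison the kernel uses may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; none of these names come from drivers/ras/cec.c. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

#define ASSUMED_DEFAULT_THRESHOLD 1023u /* placeholder for non-Intel systems */
#define INTEL_THRESHOLD           2u    /* the value named in this commit */
#define MAX_TRACKED               64

struct page_count {
	uint64_t pfn;       /* page frame number that reported an error */
	unsigned int count; /* corrected errors seen on that page so far */
};

struct collector {
	unsigned int threshold;
	size_t n;
	struct page_count pages[MAX_TRACKED];
};

/* Pick the offline threshold once, based on the CPU vendor. */
static void collector_init(struct collector *c, enum cpu_vendor vendor)
{
	c->threshold = (vendor == VENDOR_INTEL) ? INTEL_THRESHOLD
						: ASSUMED_DEFAULT_THRESHOLD;
	c->n = 0;
}

/*
 * Record one corrected error for @pfn. Returns true when the page has
 * reached the threshold and should be taken offline by the caller.
 */
static bool collector_add(struct collector *c, uint64_t pfn)
{
	size_t i;

	for (i = 0; i < c->n; i++)
		if (c->pages[i].pfn == pfn)
			break;

	if (i == c->n) {
		if (c->n == MAX_TRACKED)
			return false; /* table full: not tracked in this sketch */
		c->pages[i].pfn = pfn;
		c->pages[i].count = 0;
		c->n++;
	}

	if (++c->pages[i].count < c->threshold)
		return false;

	/* Threshold reached: forget the page, the caller offlines it. */
	c->pages[i] = c->pages[--c->n];
	return true;
}
```

With this model, the second corrected error on the same page triggers an
offline request on an Intel system, while other vendors keep counting toward
the much larger assumed default.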

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20220607212015.175591-1-tony.luck@intel.com
Link: https://lore.kernel.org/r/YulOZ/Eso0bwUcC4@agluck-desk3.sc.intel.com
drivers/ras/cec.c