NUMA: early use of cpu_to_node() returns 0 instead of the correct node id
author Huang Shijie <shijie@os.amperecomputing.com>
Fri, 26 Jan 2024 06:44:51 +0000 (14:44 +0800)
committer Andrew Morton <akpm@linux-foundation.org>
Fri, 26 Apr 2024 04:07:03 +0000 (21:07 -0700)

During kernel boot, the generic cpu_to_node() is called too early
on arm64, powerpc and riscv when CONFIG_NUMA is enabled.

There are at least four places in the common code where
the generic cpu_to_node() is called before it is initialized:
   1.) early_trace_init()         in kernel/trace/trace.c
   2.) sched_init()               in kernel/sched/core.c
   3.) init_sched_fair_class()    in kernel/sched/fair.c
   4.) workqueue_init_early()     in kernel/workqueue.c

At these points cpu_to_node() returns 0 instead of the correct node id,
which harms performance by increasing off-node accesses.

To fix the bug, introduce early_numa_node_init(), which is called after
smp_prepare_boot_cpu() in start_kernel().  early_numa_node_init()
initializes the per-CPU "numa_node" as soon as early_cpu_to_node() is
ready, before cpu_to_node() is called for the first time.

Link: https://lkml.kernel.org/r/20240126064451.5465-1-shijie@os.amperecomputing.com
Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com> [RISC-V]
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Kelley (LINUX) <mikelley@microsoft.com>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
init/main.c

index 6b4b656cef1d9f90cef99d86a9189c528ba64703..0aecd2839c1fd5884e5fa5ff80dc49cc93ce2d18 100644
@@ -882,6 +882,19 @@ static void __init print_unknown_bootoptions(void)
        memblock_free(unknown_options, len);
 }
 
+static void __init early_numa_node_init(void)
+{
+#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
+#ifndef cpu_to_node
+       int cpu;
+
+       /* The early_cpu_to_node() should be ready here. */
+       for_each_possible_cpu(cpu)
+               set_cpu_numa_node(cpu, early_cpu_to_node(cpu));
+#endif
+#endif
+}
+
 asmlinkage __visible __init __no_sanitize_address __noreturn __no_stack_protector
 void start_kernel(void)
 {
@@ -912,6 +925,7 @@ void start_kernel(void)
        setup_nr_cpu_ids();
        setup_per_cpu_areas();
        smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
+       early_numa_node_init();
        boot_cpu_hotplug_init();
 
        pr_notice("Kernel command line: %s\n", saved_command_line);