x86/percpu: Use C for arch_raw_cpu_ptr(), to improve code generation
author     Uros Bizjak <ubizjak@gmail.com>
           Sun, 15 Oct 2023 20:24:40 +0000 (22:24 +0200)
committer  Ingo Molnar <mingo@kernel.org>
           Mon, 16 Oct 2023 10:52:02 +0000 (12:52 +0200)
commit     1d10f3aec2bb734b4b594afe8c1bd0aa656a7e4d
tree       92617f3981085cc3dd3a93c8b9f66904cc2ae80f
parent     a048d3abae7c33f0a3f4575fab15ac5504d443f7

Implement arch_raw_cpu_ptr() in C to allow the compiler to perform
better optimizations, such as choosing an appropriate base register
to compute the address. The compiler is then free to use either MOV
or ADD with the this_cpu_off address to construct the optimal final
address.
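
The idea can be sketched in user-space C. This is a minimal model under
stated assumptions, not the kernel's macro: percpu_area, pcpu_stats_var,
set_current_cpu(), account_packet() and sketch_raw_cpu_ptr() are
hypothetical names, and this_cpu_off is modeled as an ordinary global
rather than a %gs-relative per-CPU variable. It uses the GCC/Clang
__typeof__ extension.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4  /* hypothetical CPU count */

struct pcpu_stats { long packets; long bytes; };

/* Hypothetical model of the per-CPU areas: one copy of the data per CPU.
 * The "per-CPU variable" is named by its CPU 0 copy. */
static struct pcpu_stats percpu_area[SKETCH_NR_CPUS];
#define pcpu_stats_var (&percpu_area[0])

/* Hypothetical stand-in for this_cpu_off: the distance from the CPU 0
 * copy to the current CPU's copy. The kernel reads the real value through
 * the %gs per-CPU segment; here it is an ordinary global. */
static uintptr_t this_cpu_off;

static void set_current_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    this_cpu_off = (uintptr_t)&percpu_area[cpu] - (uintptr_t)&percpu_area[0];
}

/* The offset is fetched with a plain C load and added to the pointer, so
 * the compiler picks the instruction sequence (for example a MOV of the
 * offset followed by an ADD, or an ADD with a memory operand) instead of
 * being locked into what an asm template spells out. */
#define sketch_raw_cpu_ptr(ptr) \
    ((__typeof__(ptr))((uintptr_t)(ptr) + this_cpu_off))

/* Updates the current CPU's copy of the statistics. */
static void account_packet(long len)
{
    struct pcpu_stats *s = sketch_raw_cpu_ptr(pcpu_stats_var);
    s->packets += 1;
    s->bytes += len;
}
```

After set_current_cpu(2), calls to account_packet() update only
percpu_area[2]; the other CPUs' copies are untouched.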

There are other issues when memory access to the percpu area is
implemented with an asm. Compilers cannot eliminate common
subexpressions involving asm statements across basic block
boundaries, but they are extremely good at optimizing memory
accesses. By implementing arch_raw_cpu_ptr() in C, the compiler can
eliminate additional redundant loads from this_cpu_off, further
reducing the number of percpu offset reads from 1646 to 1631 on a
test build (15 fewer reads, a 0.9% reduction).
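
The common-subexpression argument can be illustrated with a second
hypothetical user-space model (same assumptions as a plain-global
this_cpu_off; cpu_ptr_c(), cpu_ptr_opaque(), read_pair_c() and
read_pair_opaque() are illustrative names). A volatile read stands in
for a read the optimizer cannot merge; it is stricter than an asm
statement, since it must be re-issued at every use. Whether a single
load is actually shared depends on the compiler and optimization level,
so compare the generated assembly (for example with gcc -O2 -S).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4  /* hypothetical CPU count */

struct pcpu_counters { long rx; long tx; };

/* Hypothetical per-CPU storage; the variable is named by its CPU 0 copy. */
static struct pcpu_counters percpu_area[SKETCH_NR_CPUS];

/* Hypothetical stand-in for this_cpu_off (an ordinary global here). */
static uintptr_t this_cpu_off;

static void set_current_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    this_cpu_off = (uintptr_t)&percpu_area[cpu] - (uintptr_t)&percpu_area[0];
}

/* Plain C load of the offset: visible to the optimizer. */
#define cpu_ptr_c(ptr) \
    ((__typeof__(ptr))((uintptr_t)(ptr) + this_cpu_off))

/* Volatile load of the offset: every use must re-read memory. Used only
 * for contrast with the plain C load above. */
#define cpu_ptr_opaque(ptr) \
    ((__typeof__(ptr))((uintptr_t)(ptr) + *(volatile uintptr_t *)&this_cpu_off))

/* Reads per-CPU data in both arms of a branch and again after it. No
 * stores intervene, so the compiler may load this_cpu_off once and reuse
 * it in every block. */
static long read_pair_c(int want_rx)
{
    long first;
    if (want_rx)
        first = cpu_ptr_c(&percpu_area[0])->rx;
    else
        first = cpu_ptr_c(&percpu_area[0])->tx;
    return first + cpu_ptr_c(&percpu_area[0])->rx;
}

/* Same logic, but each use performs its own read of this_cpu_off. */
static long read_pair_opaque(int want_rx)
{
    long first;
    if (want_rx)
        first = cpu_ptr_opaque(&percpu_area[0])->rx;
    else
        first = cpu_ptr_opaque(&percpu_area[0])->tx;
    return first + cpu_ptr_opaque(&percpu_area[0])->rx;
}
```

Both functions compute the same result; they differ only in how many
reads of this_cpu_off the compiler is allowed to drop.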

Co-developed-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20231015202523.189168-2-ubizjak@gmail.com
arch/x86/include/asm/percpu.h