x86/percpu: Rewrite arch_raw_cpu_ptr() to be easier for compilers to optimize
author Uros Bizjak <ubizjak@gmail.com>
Sun, 15 Oct 2023 20:24:39 +0000 (22:24 +0200)
committer Ingo Molnar <mingo@kernel.org>
Mon, 16 Oct 2023 10:51:58 +0000 (12:51 +0200)
Implement arch_raw_cpu_ptr() as a load from this_cpu_off, followed by
a plain C addition of the ptr value to the loaded base. Because the
addition is now visible to the compiler, it can propagate the addend
into the following instruction and simplify the address calculation.

E.g., the address calculation in amd_pmu_enable_virt() improves from:

    48 c7 c0 00 00 00 00 mov    $0x0,%rax
87b7: R_X86_64_32S cpu_hw_events

    65 48 03 05 00 00 00 add    %gs:0x0(%rip),%rax
    00
87bf: R_X86_64_PC32 this_cpu_off-0x4

    48 c7 80 28 13 00 00 movq   $0x0,0x1328(%rax)
    00 00 00 00

to:

    65 48 8b 05 00 00 00 mov    %gs:0x0(%rip),%rax
    00
8798: R_X86_64_PC32 this_cpu_off-0x4
    48 c7 80 00 00 00 00 movq   $0x0,0x0(%rax)
    00 00 00 00
87a6: R_X86_64_32S cpu_hw_events+0x1328
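To illustrate why moving the addition out of the asm statement helps, here is a minimal userspace sketch. It assumes GCC or Clang, because it relies on statement expressions, `__typeof__`, and an empty `asm` used as an optimization barrier. Everything in it is hypothetical and not kernel code. `percpu_area[]` stands in for the per-CPU copies of a variable such as cpu_hw_events, and `fake_this_cpu_off` stands in for the %gs-relative this_cpu_off. The empty asm only models the opacity of the real `add`/`mov`, with no segment prefix involved.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins, not kernel code: percpu_area[] models the
 * per-CPU copies of a variable such as cpu_hw_events (slot 0 acts as
 * the link-time "template" address), and fake_this_cpu_off models the
 * %gs-relative this_cpu_off.
 */
struct hw_events {
	long pad[4];
	long enabled;
};

static struct hw_events percpu_area[2];
static unsigned long fake_this_cpu_off;

/*
 * Empty asm with a "+r" operand: the compiler must assume the value
 * may have changed, so it cannot reason about it afterwards. This
 * models the opacity of the asm in the real macro.
 */
#define HIDE(x) __asm__ ("" : "+r" (x))

/* Old shape: the opaque step yields the finished pointer, so the
 * compiler knows nothing about how the result relates to ptr. */
#define OLD_RAW_CPU_PTR(ptr) ({						\
	unsigned long tcp_ptr__ = (unsigned long)(ptr) + fake_this_cpu_off; \
	HIDE(tcp_ptr__);						\
	(__typeof__(*(ptr)) *)tcp_ptr__;				\
})

/* New shape: only the base is opaque; adding ptr is ordinary C that
 * the compiler can fold into later address calculations. */
#define NEW_RAW_CPU_PTR(ptr) ({						\
	unsigned long tcp_ptr__ = fake_this_cpu_off;			\
	HIDE(tcp_ptr__);						\
	tcp_ptr__ += (unsigned long)(ptr);				\
	(__typeof__(*(ptr)) *)tcp_ptr__;				\
})

/* Hypothetical helper: pretend to run on 'cpu' by setting the offset
 * from the template copy to that CPU's copy. */
static void set_fake_cpu(int cpu)
{
	fake_this_cpu_off = (unsigned long)&percpu_area[cpu] -
			    (unsigned long)&percpu_area[0];
}

/* Hypothetical helper: resolve the per-CPU pointer both ways, check
 * that they agree, and store through the new form. */
static struct hw_events *enable_on(int cpu)
{
	struct hw_events *old_p, *new_p;

	set_fake_cpu(cpu);
	old_p = OLD_RAW_CPU_PTR(&percpu_area[0]);
	new_p = NEW_RAW_CPU_PTR(&percpu_area[0]);
	assert(old_p == new_p);
	new_p->enabled = 1;
	return new_p;
}
```

Both shapes produce the same pointer; they differ only in what the optimizer may assume. With `NEW_RAW_CPU_PTR()`, the compiler knows the result is an unknown base plus `&percpu_area[0]`. A field access such as `new_p->enabled` can therefore use a single constant displacement on top of the base register, much like the `cpu_hw_events+0x1328` relocation above. Actual code generation depends on the compiler and flags.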

The compiler also eliminates additional redundant loads from
this_cpu_off, reducing the number of percpu offset reads
from 1668 to 1646 on a test build, a 1.3% reduction.
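The quoted percentage follows directly from the two counts:

```latex
\frac{1668 - 1646}{1668} = \frac{22}{1668} \approx 0.0132 \approx 1.3\%
```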

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20231015202523.189168-1-ubizjak@gmail.com
arch/x86/include/asm/percpu.h

index 60ea7755c0fea6bc1264c380696d274ec451125e..915675f3ad6055c75e63f02e205b5f2d45ce336b 100644
 #define arch_raw_cpu_ptr(ptr)                                  \
 ({                                                             \
        unsigned long tcp_ptr__;                                \
-       asm ("add " __percpu_arg(1) ", %0"                      \
+       asm ("mov " __percpu_arg(1) ", %0"                      \
             : "=r" (tcp_ptr__)                                 \
-            : "m" (__my_cpu_var(this_cpu_off)), "0" (ptr));    \
+            : "m" (__my_cpu_var(this_cpu_off)));               \
+                                                               \
+       tcp_ptr__ += (unsigned long)(ptr);                      \
        (typeof(*(ptr)) __kernel __force *)tcp_ptr__;           \
 })
 #else /* CONFIG_SMP */