locking/atomic/x86: Introduce arch_try_cmpxchg64_local()

Introduce arch_try_cmpxchg64_local() for 64-bit and 32-bit targets
to improve code using cmpxchg64_local(). On 64-bit targets, the
generated assembly improves from:
    3e28:   31 c0                   xor    %eax,%eax
    3e2a:   4d 0f b1 7d 00          cmpxchg %r15,0x0(%r13)
    3e2f:   48 85 c0                test   %rax,%rax
    3e32:   0f 85 9f 00 00 00       jne    3ed7 <...>
to:
    3e28:   31 c0                   xor    %eax,%eax
    3e2a:   4d 0f b1 7d 00          cmpxchg %r15,0x0(%r13)
    3e2f:   0f 85 9f 00 00 00       jne    3ed4 <...>
where the TEST instruction after CMPXCHG is eliminated, because the
branch can consume the ZF flag that CMPXCHG already sets. The
improvement is even more noticeable on 32-bit targets, where the
double-word compare after CMPXCHG8B is eliminated.
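
For illustration only, the difference between the two calling
conventions can be sketched in user space. This is not the kernel
implementation: the helper names (cmpxchg64_sketch, try_cmpxchg64_sketch,
claim_slot_old, claim_slot_new) are hypothetical, and the sketch assumes
the GCC/Clang __atomic_compare_exchange_n() builtin, which is fully
atomic, whereas the _local variants only need to be atomic with respect
to the current CPU.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * cmpxchg-style helper (hypothetical name): returns the value that was
 * found in memory, so the caller has to compare it against 'old'.
 */
static inline uint64_t cmpxchg64_sketch(uint64_t *ptr, uint64_t old,
					uint64_t new_val)
{
	/* On failure the builtin stores the current value into 'old'. */
	__atomic_compare_exchange_n(ptr, &old, new_val, false,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}

/*
 * try_cmpxchg-style helper (hypothetical name): returns success directly
 * and, on failure, updates *old with the value found in memory.
 */
static inline bool try_cmpxchg64_sketch(uint64_t *ptr, uint64_t *old,
					uint64_t new_val)
{
	return __atomic_compare_exchange_n(ptr, old, new_val, false,
					   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/* Old caller pattern: the explicit "== 0" is the extra compare. */
static bool claim_slot_old(uint64_t *slot, uint64_t val)
{
	return cmpxchg64_sketch(slot, 0, val) == 0;
}

/* New caller pattern: the success flag is used directly. */
static bool claim_slot_new(uint64_t *slot, uint64_t val)
{
	uint64_t expected = 0;

	return try_cmpxchg64_sketch(slot, &expected, val);
}
```

In the first caller the compiler has to compare the returned value
against zero; in the second it can branch on the result of the
compare-and-exchange itself.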
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/r/20240414161257.49145-1-ubizjak@gmail.com