locking/qspinlock/x86: Micro-optimize virt_spin_lock()
author     Uros Bizjak <ubizjak@gmail.com>
           Mon, 22 Apr 2024 12:00:38 +0000 (14:00 +0200)
committer  Ingo Molnar <mingo@kernel.org>
           Wed, 24 Apr 2024 09:46:28 +0000 (11:46 +0200)
commit     94af3a04e3f386d4f060d903826e85aa006ce252
tree       5d1b4e482490d0f6591bb6a90630f69faf65a95a
parent     33eb8ab4ec83cf0975d0113966c7e71cd6be60b2
locking/qspinlock/x86: Micro-optimize virt_spin_lock()

Optimize virt_spin_lock() to use the simpler and faster:

  atomic_try_cmpxchg(*ptr, &val, new)

instead of:

  atomic_cmpxchg(*ptr, val, new) == val

The x86 CMPXCHG instruction returns success in the ZF flag, so
this change saves a compare after the CMPXCHG.
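
For illustration, a minimal user-space analogue of the two interfaces
(a sketch, not kernel code: plain int and GCC's __sync/__atomic
builtins stand in for atomic_t, atomic_cmpxchg() and
atomic_try_cmpxchg(); note that the kernel primitives are inline asm
the compiler cannot see through, which is why only the try_cmpxchg
form lets it reuse ZF there):

  #include <stdbool.h>

  /*
   * Old pattern: compare the value returned by CMPXCHG against the
   * expected one, costing an extra compare after the CMPXCHG.
   */
  static bool trylock_cmpxchg(int *lock)
  {
          return __sync_val_compare_and_swap(lock, 0, 1) == 0;
  }

  /*
   * New pattern: the operation itself reports success as a bool,
   * which on x86 comes straight from the ZF set by LOCK CMPXCHG.
   */
  static bool trylock_try_cmpxchg(int *lock)
  {
          int old = 0;

          return __atomic_compare_exchange_n(lock, &old, 1, false,
                                             __ATOMIC_ACQUIRE,
                                             __ATOMIC_RELAXED);
  }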

Also optimize the retry loop a bit: atomic_try_cmpxchg() fails iff
lock->val != 0, so there is no need to load and compare the lock
value again - cpu_relax() can be called unconditionally in that case
before retrying (a C-level sketch of the resulting loop shape follows
the two listings below). This lets the compiler generate the
optimized sequence:

  1f: ba 01 00 00 00        mov    $0x1,%edx
  24: 8b 03                 mov    (%rbx),%eax
  26: 85 c0                 test   %eax,%eax
  28: 75 63                 jne    8d <...>
  2a: f0 0f b1 13           lock cmpxchg %edx,(%rbx)
  2e: 75 5d                 jne    8d <...>
...
  8d: f3 90                 pause
  8f: eb 93                 jmp    24 <...>

instead of:

  1f: ba 01 00 00 00        mov    $0x1,%edx
  24: 8b 03                 mov    (%rbx),%eax
  26: 85 c0                 test   %eax,%eax
  28: 75 13                 jne    3d <...>
  2a: f0 0f b1 13           lock cmpxchg %edx,(%rbx)
  2e: 85 c0                 test   %eax,%eax
  30: 75 f2                 jne    24 <...>
...
  3d: f3 90                 pause
  3f: eb e3                 jmp    24 <...>
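
At the C level, the retry loop shown above has roughly the following
shape - again a user-space sketch, with GCC builtins and _mm_pause()
standing in for atomic_read(), atomic_try_cmpxchg(), cpu_relax() and
_Q_LOCKED_VAL; the real code lives in
arch/x86/include/asm/qspinlock.h:

  #include <immintrin.h>      /* _mm_pause(), stand-in for cpu_relax() */

  static void tas_spin_lock(int *lock)
  {
          int val;

          for (;;) {
                  val = __atomic_load_n(lock, __ATOMIC_RELAXED);

                  /*
                   * Lock looked free and CMPXCHG succeeded: done.  On
                   * failure the value was already observed non-zero,
                   * so there is nothing to re-check - just pause and
                   * retry.
                   */
                  if (val == 0 &&
                      __atomic_compare_exchange_n(lock, &val, 1, false,
                                                  __ATOMIC_ACQUIRE,
                                                  __ATOMIC_RELAXED))
                          return;

                  _mm_pause();
          }
  }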

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240422120054.199092-1-ubizjak@gmail.com
arch/x86/include/asm/qspinlock.h