Hou Tao says:
====================
The motivation for inlining bpf_kptr_xchg() comes from the performance
profiling of the bpf memory allocator benchmark [1]. The benchmark uses
bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
objects for free. After inlining bpf_kptr_xchg(), the performance for
object free on an 8-CPU VM increases by about 2%~10%. However, the
performance gain comes with a cost: both the KASAN and KCSAN checks on
the pointer will be unavailable. Initially the inlining was implemented
in do_jit() for x86-64 directly, but I think it is more portable to
implement it in the verifier.
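
The cover letter does not show the code, but as a rough sketch of the
idea (an assumption based on the description above, not the patch
itself): a verifier-side rewrite can swap the helper call for a register
move plus a 64-bit BPF_XCHG atomic, using the instruction macros from
include/linux/filter.h. insn_buf and cnt below stand for the usual
locals of the verifier's helper-call fixup loop.

        /* Sketch only (assumed shape, not the actual patch): inside the
         * verifier's fixup pass, a bpf_kptr_xchg() call could be replaced
         * by two instructions.  Helper arguments: r1 = address of the kptr
         * field, r2 = new pointer; the old pointer ends up in r0, i.e.:
         *
         *   r0 = r2
         *   r0 = atomic64_xchg((u64 *)(r1 + 0), r0)
         */
        insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_2);
        insn_buf[1] = BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_1, BPF_REG_0, 0);
        cnt = 2;

Inlining in this way is what makes the KASAN/KCSAN checks unavailable:
the exchange no longer goes through the instrumented C helper.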
Patch #1 supports inlining the bpf_kptr_xchg() helper and enables it on
x86-64. Patch #2 factors out a helper for the newly-added test in patch
#3. Patch #3 tests whether the inlining of bpf_kptr_xchg() works as
expected.
Please see individual patches for more details. And comments are always
welcome.
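
As a rough illustration of how the inlining can be checked (a sketch
under assumptions, not the selftest added in patch #3): after loading a
program that calls bpf_kptr_xchg(), user space can dump the xlated
instructions with libbpf and look for the BPF_XCHG atomic instruction
that the rewrite is expected to emit. prog_uses_xchg() and prog_fd are
hypothetical names used only for illustration.

        #include <stdbool.h>
        #include <linux/bpf.h>
        #include <bpf/bpf.h>

        /* Sketch: return true if the xlated program behind prog_fd (a
         * hypothetical fd of a loaded program calling bpf_kptr_xchg())
         * contains a 64-bit atomic exchange instruction.
         */
        static bool prog_uses_xchg(int prog_fd)
        {
                struct bpf_insn insns[256];
                struct bpf_prog_info info = {};
                __u32 info_len = sizeof(info);
                unsigned int i, cnt;

                info.xlated_prog_insns = (__u64)(unsigned long)insns;
                info.xlated_prog_len = sizeof(insns);
                if (bpf_obj_get_info_by_fd(prog_fd, &info, &info_len))
                        return false;

                /* the kernel reports the full length; clamp to our buffer */
                cnt = info.xlated_prog_len / sizeof(insns[0]);
                if (cnt > sizeof(insns) / sizeof(insns[0]))
                        cnt = sizeof(insns) / sizeof(insns[0]);

                for (i = 0; i < cnt; i++) {
                        if (insns[i].code == (BPF_STX | BPF_ATOMIC | BPF_DW) &&
                            insns[i].imm == BPF_XCHG)
                                return true;
                }
                return false;
        }

Dumping xlated instructions requires sufficient privileges, and on
architectures where the inlining is not enabled (only x86-64 in patch
#1) the helper call is left in place, so such a check would return
false there.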
Change Log:
v3:
* rebased on bpf-next tree
* patch 1 & 2: Add Rvb-by and Ack-by tags from Eduard
* patch 3: use inline assembly and a naked function instead of C code
(suggested by Eduard)
v2: https://lore.kernel.org/bpf/20231223104042.1432300-1-houtao@huaweicloud.com/
* rebased on bpf-next tree
* drop patch #1 in v1 due to discussion in [2]
* patch #1: add the motivation in the commit message, merge patch #1
and #3 into the new patch in v2. (Daniel)
* patch #2/#3: newly-added patch to test the inlining of
bpf_kptr_xchg() (Eduard)
v1: https://lore.kernel.org/bpf/95b8c2cd-44d5-5fe1-60b5-7e8218779566@huaweicloud.com/
[1]: https://lore.kernel.org/bpf/20231221141501.3588586-1-houtao@huaweicloud.com/
[2]: https://lore.kernel.org/bpf/fd94efb9-4a56-c982-dc2e-c66be5202cb7@huaweicloud.com/
====================
Link: https://lore.kernel.org/r/20240105104819.3916743-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>