linux.git
17 months agoselftests/bpf: Expand sockaddr program return value tests
Jordan Rife [Fri, 10 May 2024 19:02:31 +0000 (14:02 -0500)]
selftests/bpf: Expand sockaddr program return value tests

This patch expands verifier coverage for program return values to cover
bind, connect, sendmsg, getsockname, and getpeername hooks. It also
rounds out the recvmsg coverage by adding test cases for recvmsg_unix
hooks.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-15-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Retire test_sock_addr.(c|sh)
Jordan Rife [Fri, 10 May 2024 19:02:30 +0000 (14:02 -0500)]
selftests/bpf: Retire test_sock_addr.(c|sh)

Fully remove test_sock_addr.c and test_sock_addr.sh, as test coverage
has been fully moved to prog_tests/sock_addr.c.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-14-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Remove redundant sendmsg test cases
Jordan Rife [Fri, 10 May 2024 19:02:29 +0000 (14:02 -0500)]
selftests/bpf: Remove redundant sendmsg test cases

Remove these test cases completely, as the same behavior is already
covered by other sendmsg* test cases in prog_tests/sock_addr.c. This
just rewrites the destination address similar to sendmsg_v4_prog and
sendmsg_v6_prog.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-13-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Migrate ATTACH_REJECT test cases
Jordan Rife [Fri, 10 May 2024 19:02:28 +0000 (14:02 -0500)]
selftests/bpf: Migrate ATTACH_REJECT test cases

Migrate test case from bpf/test_sock_addr.c ensuring that program
attachment fails when using an inappropriate attach type.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-12-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Migrate expected_attach_type tests
Jordan Rife [Fri, 10 May 2024 19:02:27 +0000 (14:02 -0500)]
selftests/bpf: Migrate expected_attach_type tests

Migrates tests from progs/test_sock_addr.c ensuring that programs fail
to load when the expected attach type does not match.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-11-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Migrate wildcard destination rewrite test
Jordan Rife [Fri, 10 May 2024 19:02:26 +0000 (14:02 -0500)]
selftests/bpf: Migrate wildcard destination rewrite test

Migrate test case from bpf/test_sock_addr.c ensuring that sendmsg
respects when sendmsg6 hooks rewrite the destination IP with the IPv6
wildcard IP, [::].

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-10-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Migrate sendmsg6 v4 mapped address tests
Jordan Rife [Fri, 10 May 2024 19:02:25 +0000 (14:02 -0500)]
selftests/bpf: Migrate sendmsg6 v4 mapped address tests

Migrate test case from bpf/test_sock_addr.c ensuring that sendmsg
returns -ENOTSUPP when sending to an IPv4-mapped IPv6 address to
prog_tests/sock_addr.c.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-9-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Migrate sendmsg deny test cases
Jordan Rife [Fri, 10 May 2024 19:02:24 +0000 (14:02 -0500)]
selftests/bpf: Migrate sendmsg deny test cases

This set of tests checks that sendmsg calls are rejected (return -EPERM)
when the sendmsg* hook returns 0. Replace those in bpf/test_sock_addr.c
with corresponding tests in prog_tests/sock_addr.c.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-8-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Migrate WILDCARD_IP test
Jordan Rife [Fri, 10 May 2024 19:02:23 +0000 (14:02 -0500)]
selftests/bpf: Migrate WILDCARD_IP test

Move wildcard IP sendmsg test case out of bpf/test_sock_addr.c into
prog_tests/sock_addr.c.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-7-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Handle SYSCALL_EPERM and SYSCALL_ENOTSUPP test cases
Jordan Rife [Fri, 10 May 2024 19:02:22 +0000 (14:02 -0500)]
selftests/bpf: Handle SYSCALL_EPERM and SYSCALL_ENOTSUPP test cases

In preparation to move test cases from bpf/test_sock_addr.c that expect
system calls to return ENOTSUPP or EPERM, this patch propagates errno
from relevant system calls up to test_sock_addr() where the result can
be checked.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-6-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Handle ATTACH_REJECT test cases
Jordan Rife [Fri, 10 May 2024 19:02:21 +0000 (14:02 -0500)]
selftests/bpf: Handle ATTACH_REJECT test cases

In preparation to move test cases from bpf/test_sock_addr.c that expect
ATTACH_REJECT, this patch adds BPF_SKEL_FUNCS_RAW to generate load and
destroy functions that use bpf_prog_attach() to control the attach_type.

The normal load functions use bpf_program__attach_cgroup which does not
have the same degree of control over the attach type, as
bpf_program_attach_fd() calls bpf_link_create() with the attach type
extracted from prog using bpf_program__expected_attach_type(). It is
currently not possible to modify the attach type before
bpf_program__attach_cgroup() is called, since
bpf_program__set_expected_attach_type() has no effect after the program
is loaded.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-5-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Handle LOAD_REJECT test cases
Jordan Rife [Fri, 10 May 2024 19:02:20 +0000 (14:02 -0500)]
selftests/bpf: Handle LOAD_REJECT test cases

In preparation to move test cases from bpf/test_sock_addr.c that expect
LOAD_REJECT, this patch adds expected_attach_type and extends load_fn to
accept an expected attach type and a flag indicating whether or not
rejection is expected.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-4-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Use program name for skel load/destroy functions
Jordan Rife [Fri, 10 May 2024 19:02:19 +0000 (14:02 -0500)]
selftests/bpf: Use program name for skel load/destroy functions

In preparation to migrate tests from bpf/test_sock_addr.c to
sock_addr.c, update BPF_SKEL_FUNCS so that it generates functions
based on prog_name instead of skel_name. This allows us to differentiate
between programs in the same skeleton.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-3-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Migrate recvmsg* return code tests to verifier_sock_addr.c
Jordan Rife [Fri, 10 May 2024 19:02:18 +0000 (14:02 -0500)]
selftests/bpf: Migrate recvmsg* return code tests to verifier_sock_addr.c

This set of tests check that the BPF verifier rejects programs with
invalid return codes (recvmsg4 and recvmsg6 hooks can only return 1).
This patch replaces the tests in test_sock_addr.c with
verifier_sock_addr.c, a new verifier prog_tests for sockaddr hooks, in a
step towards fully retiring test_sock_addr.c.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-2-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoriscv, bpf: make some atomic operations fully ordered
Puranjay Mohan [Sun, 5 May 2024 20:16:33 +0000 (20:16 +0000)]
riscv, bpf: make some atomic operations fully ordered

The BPF atomic operations with the BPF_FETCH modifier along with
BPF_XCHG and BPF_CMPXCHG are fully ordered but the RISC-V JIT implements
all atomic operations except BPF_CMPXCHG with relaxed ordering.

Section 8.1 of the "The RISC-V Instruction Set Manual Volume I:
Unprivileged ISA" [1], titled, "Specifying Ordering of Atomic
Instructions" says:

| To provide more efficient support for release consistency [5], each
| atomic instruction has two bits, aq and rl, used to specify additional
| memory ordering constraints as viewed by other RISC-V harts.

and

| If only the aq bit is set, the atomic memory operation is treated as
| an acquire access.
| If only the rl bit is set, the atomic memory operation is treated as a
| release access.
|
| If both the aq and rl bits are set, the atomic memory operation is
| sequentially consistent.

Fix this by setting both aq and rl bits as 1 for operations with
BPF_FETCH and BPF_XCHG.

[1] https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf

Fixes: dd642ccb45ec ("riscv, bpf: Implement more atomic operations for RV64")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240505201633.123115-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoriscv, bpf: Fix typo in comment
Xiao Wang [Tue, 7 May 2024 11:16:18 +0000 (19:16 +0800)]
riscv, bpf: Fix typo in comment

We can use either "instruction" or "insn" in the comment.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240507111618.437121-1-xiao.w.wang@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agos390/bpf: Emit a barrier for BPF_FETCH instructions
Ilya Leoshkevich [Tue, 7 May 2024 00:02:49 +0000 (02:02 +0200)]
s390/bpf: Emit a barrier for BPF_FETCH instructions

BPF_ATOMIC_OP() macro documentation states that "BPF_ADD | BPF_FETCH"
should be the same as atomic_fetch_add(), which is currently not the
case on s390x: the serialization instruction "bcr 14,0" is missing.
This applies to "and", "or" and "xor" variants too.

s390x is allowed to reorder stores with subsequent fetches from
different addresses, so code relying on BPF_FETCH acting as a barrier,
for example:

  stw [%r0], 1
  afadd [%r1], %r2
  ldxw %r3, [%r4]

may be broken. Fix it by emitting "bcr 14,0".

Note that a separate serialization instruction is not needed for
BPF_XCHG and BPF_CMPXCHG, because COMPARE AND SWAP performs
serialization itself.

Fixes: ba3b86b9cef0 ("s390/bpf: Implement new atomic ops")
Reported-by: Puranjay Mohan <puranjay12@gmail.com>
Closes: https://lore.kernel.org/bpf/mb61p34qvq3wf.fsf@kernel.org/
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20240507000557.12048-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoMerge branch 'bpf-inline-helpers-in-arm64-and-riscv-jits'
Alexei Starovoitov [Sun, 12 May 2024 23:54:34 +0000 (16:54 -0700)]
Merge branch 'bpf-inline-helpers-in-arm64-and-riscv-jits'

Puranjay Mohan says:

====================
bpf: Inline helpers in arm64 and riscv JITs

Changes in v5 -> v6:
arm64 v5: https://lore.kernel.org/all/20240430234739.79185-1-puranjay@kernel.org/
riscv v2: https://lore.kernel.org/all/20240430175834.33152-1-puranjay@kernel.org/
- Combine riscv and arm64 changes in single series
- Some coding style fixes

Changes in v4 -> v5:
v4: https://lore.kernel.org/all/20240429131647.50165-1-puranjay@kernel.org/
- Implement the inlining of the bpf_get_smp_processor_id() in the JIT.

NOTE: This needs to be based on:
https://lore.kernel.org/all/20240430175834.33152-1-puranjay@kernel.org/
to be built.

Manual run of bpf-ci with this series rebased on above:
https://github.com/kernel-patches/bpf/pull/6929

Changes in v3 -> v4:
v3: https://lore.kernel.org/all/20240426121349.97651-1-puranjay@kernel.org/
- Fix coding style issue related to C89 standards.

Changes in v2 -> v3:
v2: https://lore.kernel.org/all/20240424173550.16359-1-puranjay@kernel.org/
- Fixed the xlated dump of percpu mov to "r0 = &(void __percpu *)(r0)"
- Made ARM64 and x86-64 use the same code for inlining. The only difference
  that remains is the per-cpu address of the cpu_number.

Changes in v1 -> v2:
v1: https://lore.kernel.org/all/20240405091707.66675-1-puranjay12@gmail.com/
- Add a patch to inline bpf_get_smp_processor_id()
- Fix an issue in MRS instruction encoding as pointed out by Will
- Remove CONFIG_SMP check because arm64 kernel always compiles with CONFIG_SMP

This series adds the support of internal only per-CPU instructions and inlines
the bpf_get_smp_processor_id() helper call for ARM64 and RISC-V BPF JITs.

Here is an example of calls to bpf_get_smp_processor_id() and
percpu_array_map_lookup_elem() before and after this series on ARM64.

                                         BPF
                                        =====
              BEFORE                                       AFTER
             --------                                     -------

int cpu = bpf_get_smp_processor_id();           int cpu = bpf_get_smp_processor_id();
(85) call bpf_get_smp_processor_id#229032       (85) call bpf_get_smp_processor_id#8

p = bpf_map_lookup_elem(map, &zero);            p = bpf_map_lookup_elem(map, &zero);
(18) r1 = map[id:78]                            (18) r1 = map[id:153]
(18) r2 = map[id:82][0]+65536                   (18) r2 = map[id:157][0]+65536
(85) call percpu_array_map_lookup_elem#313512   (07) r1 += 496
                                                (61) r0 = *(u32 *)(r2 +0)
                                                (35) if r0 >= 0x1 goto pc+5
                                                (67) r0 <<= 3
                                                (0f) r0 += r1
                                                (79) r0 = *(u64 *)(r0 +0)
                                                (bf) r0 = &(void __percpu *)(r0)
                                                (05) goto pc+1
                                                (b7) r0 = 0

                                      ARM64 JIT
                                     ===========

              BEFORE                                       AFTER
             --------                                     -------

int cpu = bpf_get_smp_processor_id();           int cpu = bpf_get_smp_processor_id();
mov     x10, #0xfffffffffffff4d0                mrs     x10, sp_el0
movk    x10, #0x802b, lsl #16                   ldr     w7, [x10, #24]
movk    x10, #0x8000, lsl #32
blr     x10
add     x7, x0, #0x0

p = bpf_map_lookup_elem(map, &zero);            p = bpf_map_lookup_elem(map, &zero);
mov     x0, #0xffff0003ffffffff                 mov     x0, #0xffff0003ffffffff
movk    x0, #0xce5c, lsl #16                    movk    x0, #0xe0f3, lsl #16
movk    x0, #0xca00                             movk    x0, #0x7c00
mov     x1, #0xffff8000ffffffff                 mov     x1, #0xffff8000ffffffff
movk    x1, #0x8bdb, lsl #16                    movk    x1, #0xb0c7, lsl #16
movk    x1, #0x6000                             movk    x1, #0xe000
mov     x10, #0xffffffffffff3ed0                add     x0, x0, #0x1f0
movk    x10, #0x802d, lsl #16                   ldr     w7, [x1]
movk    x10, #0x8000, lsl #32                   cmp     x7, #0x1
blr     x10                                     b.cs    0x0000000000000090
add     x7, x0, #0x0                            lsl     x7, x7, #3
                                                add     x7, x7, x0
                                                ldr     x7, [x7]
                                                mrs     x10, tpidr_el1
                                                add     x7, x7, x10
                                                b       0x0000000000000094
                                                mov     x7, #0x0

              Performance improvement found using benchmark[1]

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

  +---------------+-------------------+-------------------+--------------+
  |      Name     |      Before       |        After      |   % change   |
  |---------------+-------------------+-------------------+--------------|
  | glob-arr-inc  | 23.380 ± 1.675M/s | 25.893 ± 0.026M/s |   + 10.74%   |
  | arr-inc       | 23.928 ± 0.034M/s | 25.213 ± 0.063M/s |   + 5.37%    |
  | hash-inc      | 12.352 ± 0.005M/s | 12.609 ± 0.013M/s |   + 2.08%    |
  +---------------+-------------------+-------------------+--------------+

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

             RISCV64 JIT output for `call bpf_get_smp_processor_id`
            =======================================================

                  Before                           After
                 --------                         -------

           auipc   t1,0x848c                  ld    a5,32(tp)
           jalr    604(t1)
           mv      a5,a0

  Benchmark using [1] on Qemu.

  ./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

  +---------------+------------------+------------------+--------------+
  |      Name     |     Before       |       After      |   % change   |
  |---------------+------------------+------------------+--------------|
  | glob-arr-inc  | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s |   + 24.04%   |
  | arr-inc       | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s |   + 23.56%   |
  | hash-inc      | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s |   + 32.18%   |
  +---------------+------------------+------------------+--------------+
====================

Link: https://lore.kernel.org/r/20240502151854.9810-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agobpf, arm64: inline bpf_get_smp_processor_id() helper
Puranjay Mohan [Thu, 2 May 2024 15:18:54 +0000 (15:18 +0000)]
bpf, arm64: inline bpf_get_smp_processor_id() helper

Inline calls to bpf_get_smp_processor_id() helper in the JIT by emitting
a read from struct thread_info. The SP_EL0 system register holds the
pointer to the task_struct and thread_info is the first member of this
struct. We can read the cpu number from the thread_info.

Here is how the ARM64 JITed assembly changes after this commit:

                                      ARM64 JIT
                                     ===========

              BEFORE                                    AFTER
             --------                                  -------

int cpu = bpf_get_smp_processor_id();        int cpu = bpf_get_smp_processor_id();

mov     x10, #0xfffffffffffff4d0             mrs     x10, sp_el0
movk    x10, #0x802b, lsl #16                ldr     w7, [x10, #24]
movk    x10, #0x8000, lsl #32
blr     x10
add     x7, x0, #0x0

               Performance improvement using benchmark[1]

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

+---------------+-------------------+-------------------+--------------+
|      Name     |      Before       |        After      |   % change   |
|---------------+-------------------+-------------------+--------------|
| glob-arr-inc  | 23.380 ± 1.675M/s | 25.893 ± 0.026M/s |   + 10.74%   |
| arr-inc       | 23.928 ± 0.034M/s | 25.213 ± 0.063M/s |   + 5.37%    |
| hash-inc      | 12.352 ± 0.005M/s | 12.609 ± 0.013M/s |   + 2.08%    |
+---------------+-------------------+-------------------+--------------+

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoarm64, bpf: add internal-only MOV instruction to resolve per-CPU addrs
Puranjay Mohan [Thu, 2 May 2024 15:18:53 +0000 (15:18 +0000)]
arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrs

Support an instruction for resolving absolute addresses of per-CPU
data from their per-CPU offsets. This instruction is internal-only and
users are not allowed to use them directly. They will only be used for
internal inlining optimizations for now between BPF verifier and BPF
JITs.

Since commit 7158627686f0 ("arm64: percpu: implement optimised pcpu
access using tpidr_el1"), the per-cpu offset for the CPU is stored in
the tpidr_el1/2 register of that CPU.

To support this BPF instruction in the ARM64 JIT, the following ARM64
instructions are emitted:

mov dst, src // Move src to dst, if src != dst
mrs tmp, tpidr_el1/2 // Move per-cpu offset of the current cpu in tmp.
add dst, dst, tmp // Add the per cpu offset to the dst.

To measure the performance improvement provided by this change, the
benchmark in [1] was used:

Before:
glob-arr-inc   :   23.597 ± 0.012M/s
arr-inc        :   23.173 ± 0.019M/s
hash-inc       :   12.186 ± 0.028M/s

After:
glob-arr-inc   :   23.819 ± 0.034M/s
arr-inc        :   23.285 ± 0.017M/s
hash-inc       :   12.419 ± 0.011M/s

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoriscv, bpf: inline bpf_get_smp_processor_id()
Puranjay Mohan [Thu, 2 May 2024 15:18:52 +0000 (15:18 +0000)]
riscv, bpf: inline bpf_get_smp_processor_id()

Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit.

RISCV saves the pointer to the CPU's task_struct in the TP (thread
pointer) register. This makes it trivial to get the CPU's processor id.
As thread_info is the first member of task_struct, we can read the
processor id from TP + offsetof(struct thread_info, cpu).

          RISCV64 JIT output for `call bpf_get_smp_processor_id`
  ======================================================

                Before                           After
               --------                         -------

         auipc   t1,0x848c                  ld    a5,32(tp)
         jalr    604(t1)
         mv      a5,a0

Benchmark using [1] on Qemu.

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

+---------------+------------------+------------------+--------------+
|      Name     |     Before       |       After      |   % change   |
|---------------+------------------+------------------+--------------|
| glob-arr-inc  | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s |   + 24.04%   |
| arr-inc       | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s |   + 23.56%   |
| hash-inc      | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s |   + 32.18%   |
+---------------+------------------+------------------+--------------+

NOTE: This benchmark includes changes from this patch and the previous
      patch that implemented the per-cpu insn.

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoriscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs
Puranjay Mohan [Thu, 2 May 2024 15:18:51 +0000 (15:18 +0000)]
riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs

Support an instruction for resolving absolute addresses of per-CPU
data from their per-CPU offsets. This instruction is internal-only and
users are not allowed to use them directly. They will only be used for
internal inlining optimizations for now between BPF verifier and BPF
JITs.

RISC-V uses generic per-cpu implementation where the offsets for CPUs
are kept in an array called __per_cpu_offset[cpu_number]. RISCV stores
the address of the task_struct in TP register. The first element in
task_struct is struct thread_info, and we can get the cpu number by
reading from the TP register + offsetof(struct thread_info, cpu).

Once we have the cpu number in a register we read the offset for that
cpu from address: &__per_cpu_offset + cpu_number << 3. Then we add this
offset to the destination register.

To measure the improvement from this change, the benchmark in [1] was
used on Qemu:

Before:
glob-arr-inc   :    1.127 ± 0.013M/s
arr-inc        :    1.121 ± 0.004M/s
hash-inc       :    0.681 ± 0.052M/s

After:
glob-arr-inc   :    1.138 ± 0.011M/s
arr-inc        :    1.366 ± 0.006M/s
hash-inc       :    0.676 ± 0.001M/s

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoARC: Add eBPF JIT support
Shahab Vahedi [Tue, 30 Apr 2024 14:56:04 +0000 (16:56 +0200)]
ARC: Add eBPF JIT support

This will add eBPF JIT support to the 32-bit ARCv2 processors. The
implementation is qualified by running the BPF tests on a Synopsys HSDK
board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU.

The test_bpf.ko reports 2-10 fold improvements in execution time of its
tests. For instance:

test_bpf: #33 tcpdump port 22 jited:0 704 1766 2104 PASS
test_bpf: #33 tcpdump port 22 jited:1 120  224  260 PASS

test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:0 238 PASS
test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:1  23 PASS

test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:0 2034681 PASS
test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:1 1020022 PASS

Deployment and structure
------------------------
The related codes are added to "arch/arc/net":

- bpf_jit.h       -- The interface that a back-end translator must provide
- bpf_jit_core.c  -- Knows how to handle the input eBPF byte stream
- bpf_jit_arcv2.c -- The back-end code that knows the translation logic

The bpf_int_jit_compile() at the end of bpf_jit_core.c is the entrance
to the whole process. Normally, the translation is done in one pass,
namely the "normal pass". In case some relocations are not known during
this pass, some data (arc_jit_data) is allocated for the next pass to
come. This possible next (and last) pass is called the "extra pass".

1. Normal pass       # The necessary pass
     1a. Dry run       # Get the whole JIT length, epilogue offset, etc.
     1b. Emit phase    # Allocate memory and start emitting instructions
2. Extra pass        # Only needed if there are relocations to be fixed
     2a. Patch relocations

Support status
--------------
The JIT compiler supports BPF instructions up to "cpu=v4". However, it
does not yet provide support for:

- Tail calls
- Atomic operations
- 64-bit division/remainder
- BPF_PROBE_MEM* (exception table)

The result of "test_bpf" test suite on an HSDK board is:

hsdk-lnx# insmod test_bpf.ko test_suite=test_bpf

  test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]

All the failing test cases are due to the ones that were not JIT'ed.
Categorically, they can be represented as:

  .-----------.------------.-------------.
  | test type |   opcodes  | # of cases  |
  |-----------+------------+-------------|
  | atomic    | 0xC3, 0xDB |         149 |
  | div64     | 0x37, 0x3F |          22 |
  | mod64     | 0x97, 0x9F |          15 |
  `-----------^------------+-------------|
                           | (total) 186 |
                           `-------------'

Setup: build config
-------------------
The following configs must be set to have a working JIT test:

  CONFIG_BPF_JIT=y
  CONFIG_BPF_JIT_ALWAYS_ON=y
  CONFIG_TEST_BPF=m

The following options are not necessary for the tests module,
but are good to have:

  CONFIG_DEBUG_INFO=y             # prerequisite for below
  CONFIG_DEBUG_INFO_BTF=y         # so bpftool can generate vmlinux.h

  CONFIG_FTRACE=y                 #
  CONFIG_BPF_SYSCALL=y            # all these options lead to
  CONFIG_KPROBE_EVENTS=y          # having CONFIG_BPF_EVENTS=y
  CONFIG_PERF_EVENTS=y            #

Some BPF programs provide data through /sys/kernel/debug:
  CONFIG_DEBUG_FS=y
arc# mount -t debugfs debugfs /sys/kernel/debug

Setup: elfutils
---------------
The libdw.{so,a} library that is used by pahole for processing
the final binary must come from elfutils 0.189 or newer. The
support for ARCv2 [1] has been added since that version.

[1]
https://sourceware.org/git/?p=elfutils.git;a=commit;h=de3d46b3e7

Setup: pahole
-------------
The line below in linux/scripts/Makefile.btf must be commented out:

pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats

Or else, the build will fail:

$ make V=1
  ...
  BTF     .btf.vmlinux.bin.o
pahole -J --btf_gen_floats                    \
       -j --lang_exclude=rust                 \
       --skip_encoding_btf_inconsistent_proto \
       --btf_gen_optimized .tmp_vmlinux.btf
Complex, interval and imaginary float types are not supported
Encountered error while encoding BTF.
  ...
  BTFIDS  vmlinux
./tools/bpf/resolve_btfids/resolve_btfids vmlinux
libbpf: failed to find '.BTF' ELF section in vmlinux
FAILED: load BTF from vmlinux: No data available

This is due to the fact that the ARC toolchains generate
"complex float" DIE entries in libgcc and at the moment, pahole
can't handle such entries.

Running the tests
-----------------
host$ scp /bld/linux/lib/test_bpf.ko arc:
arc # sysctl net.core.bpf_jit_enable=1
arc # insmod test_bpf.ko test_suite=test_bpf
      ...
      test_bpf: #1048 Staggered jumps: JMP32_JSLE_X jited:1 697811 PASS
      test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]

Acknowledgments
---------------
- Claudiu Zissulescu for his unwavering support
- Yuriy Kolerov for testing and troubleshooting
- Vladimir Isaev for the pahole workaround
- Sergey Matyukevich for paving the road by adding the interpreter support

Signed-off-by: Shahab Vahedi <shahab@synopsys.com>
Link: https://lore.kernel.org/r/20240430145604.38592-1-list+bpf@vahedi.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agokbuild,bpf: Switch to using --btf_features for pahole v1.26 and later
Alan Maguire [Tue, 7 May 2024 13:55:14 +0000 (14:55 +0100)]
kbuild,bpf: Switch to using --btf_features for pahole v1.26 and later

The btf_features list can be used for pahole v1.26 and later -
it is useful because if a feature is not yet implemented it will
not exit with a failure message.  This will allow us to add feature
requests to the pahole options without having to check pahole versions
in future; if the version of pahole supports the feature it will be
added.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240507135514.490467-1-alan.maguire@oracle.com
17 months agoMerge branch 'use network helpers, part 4'
Martin KaFai Lau [Thu, 9 May 2024 19:34:08 +0000 (12:34 -0700)]
Merge branch 'use network helpers, part 4'

Geliang Tang says:

====================
From: Geliang Tang <tanggeliang@kylinos.cn>

This patchset adds post_socket_cb pointer into
struct network_helper_opts to make start_server_addr() helper
more flexible. With these modifications, many duplicate codes
can be dropped.

Patches 1-3 address Martin's comments in the previous series.
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: Drop get_port in test_tcp_check_syncookie
Geliang Tang [Sun, 5 May 2024 11:35:13 +0000 (19:35 +0800)]
selftests/bpf: Drop get_port in test_tcp_check_syncookie

The arguments "addr" and "len" of run_test() have dropped. This makes
function get_port() useless. Drop it from test_tcp_check_syncookie_user.c.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/a9b5c8064ab4cbf0f68886fe0e4706428b8d0d47.1714907662.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: Use connect_to_fd in test_tcp_check_syncookie
Geliang Tang [Sun, 5 May 2024 11:35:12 +0000 (19:35 +0800)]
selftests/bpf: Use connect_to_fd in test_tcp_check_syncookie

This patch uses public helper connect_to_fd() exported in network_helpers.h
instead of the local defined function connect_to_server() in
test_tcp_check_syncookie_user.c. This can avoid duplicate code.

Then the arguments "addr" and "len" of run_test() become useless, drop them
too.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/e0ae6b790ac0abc7193aadfb2660c8c9eb0fe1f0.1714907662.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: Use connect_to_fd in sockopt_inherit
Geliang Tang [Sun, 5 May 2024 11:35:11 +0000 (19:35 +0800)]
selftests/bpf: Use connect_to_fd in sockopt_inherit

This patch uses public helper connect_to_fd() exported in network_helpers.h
instead of the local defined function connect_to_server() in
prog_tests/sockopt_inherit.c. This can avoid duplicate code.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/71db79127cc160b0643fd9a12c70ae019ae076a1.1714907662.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: Use start_server_addr in test_tcp_check_syncookie
Geliang Tang [Sun, 5 May 2024 11:35:10 +0000 (19:35 +0800)]
selftests/bpf: Use start_server_addr in test_tcp_check_syncookie

Include network_helpers.h in test_tcp_check_syncookie_user.c, use
public helper start_server_addr() in it instead of the local defined
function start_server(). This can avoid duplicate code.

Add two helpers v6only_true() and v6only_false() to set IPV6_V6ONLY
sockopt to true or false, set them to post_socket_cb pointer of struct
network_helper_opts, and pass it to start_server_setsockopt().

In order to use functions defined in network_helpers.c, Makefile needs
to be updated too.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/e0c5324f5da84f453f47543536e70f126eaa8678.1714907662.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: Use start_server_addr in sockopt_inherit
Geliang Tang [Sun, 5 May 2024 11:35:09 +0000 (19:35 +0800)]
selftests/bpf: Use start_server_addr in sockopt_inherit

Include network_helpers.h in prog_tests/sockopt_inherit.c, use public
helper start_server_addr() instead of the local defined function
start_server(). This can avoid duplicate code.

Add a helper custom_cb() to set SOL_CUSTOM sockopt looply, set it to
post_socket_cb pointer of struct network_helper_opts, and pass it to
start_server_addr().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/687af66f743a0bf15cdba372c5f71fe64863219e.1714907662.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: Add post_socket_cb for network_helper_opts
Geliang Tang [Sun, 5 May 2024 11:35:08 +0000 (19:35 +0800)]
selftests/bpf: Add post_socket_cb for network_helper_opts

__start_server() sets SO_REUSPORT through setsockopt() when the parameter
'reuseport' is set. This patch makes it more flexible by adding a function
pointer post_socket_cb into struct network_helper_opts. The
'const struct post_socket_opts *cb_opts' args in the post_socket_cb is
for the future extension.

The 'reuseport' parameter can be dropped.
Now the original start_reuseport_server() can be implemented by setting a
newly defined reuseport_cb() function pointer to post_socket_cb filed of
struct network_helper_opts.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/470cb82f209f055fc7fb39c66c6b090b5b7ed2b2.1714907662.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoMerge branch 'selftests-bpf-retire-bpf_tcp_helpers-h'
Alexei Starovoitov [Thu, 9 May 2024 18:13:12 +0000 (11:13 -0700)]
Merge branch 'selftests-bpf-retire-bpf_tcp_helpers-h'

Martin KaFai Lau says:

====================
selftests/bpf: Retire bpf_tcp_helpers.h

From: Martin KaFai Lau <martin.lau@kernel.org>

The earlier commit 8e6d9ae2e09f ("selftests/bpf: Use bpf_tracing.h instead of bpf_tcp_helpers.h")
removed the bpf_tcp_helpers.h usages from the non networking tests.

This patch set is a continuation of this effort to retire
the bpf_tcp_helpers.h from the networking tests (mostly tcp-cc related).

The main usage of the bpf_tcp_helpers.h is the partial kernel
socket definitions (e.g. sock, tcp_sock). New fields are kept adding
back to those partial socket definitions while everything is available
in the vmlinux.h. The recent bpf_cc_cubic.c test tried to extend
bpf_tcp_helpers.c but eventually used the vmlinux.h instead. To avoid
this unnecessary detour for new tests and have one consistent way
of using the kernel sockets, this patch set retires the bpf_tcp_helpers.h
usages and consolidates the tests to use vmlinux.h instead.
====================

Link: https://lore.kernel.org/r/20240509175026.3423614-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Retire bpf_tcp_helpers.h
Martin KaFai Lau [Thu, 9 May 2024 17:50:26 +0000 (10:50 -0700)]
selftests/bpf: Retire bpf_tcp_helpers.h

The previous patches have consolidated the tests to use
bpf_tracing_net.h (i.e. vmlinux.h) instead of bpf_tcp_helpers.h.

This patch can finally retire the bpf_tcp_helpers.h from
the repository.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-11-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Remove the bpf_tcp_helpers.h usages from other non tcp-cc tests
Martin KaFai Lau [Thu, 9 May 2024 17:50:25 +0000 (10:50 -0700)]
selftests/bpf: Remove the bpf_tcp_helpers.h usages from other non tcp-cc tests

The patch removes the remaining bpf_tcp_helpers.h usages in the
non tcp-cc networking tests. It either replaces it with bpf_tracing_net.h
or just removed it because the test is not actually using any
kernel sockets. For the later, the missing macro (mainly SOL_TCP) is
defined locally.

An exception is the test_sock_fields which is testing
the "struct bpf_sock" type instead of the kernel sock type.
Whenever "vmlinux.h" is used instead, it hits a verifier
error on doing arithmetic on the sock_common pointer:

; return !a6[0] && !a6[1] && !a6[2] && a6[3] == bpf_htonl(1); @ test_sock_fields.c:54
21: (61) r2 = *(u32 *)(r1 +28)        ; R1_w=sock_common() R2_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
22: (56) if w2 != 0x0 goto pc-6       ; R2_w=0
23: (b7) r3 = 28                      ; R3_w=28
24: (bf) r2 = r1                      ; R1_w=sock_common() R2_w=sock_common()
25: (0f) r2 += r3
R2 pointer arithmetic on sock_common prohibited

Hence, instead of including bpf_tracing_net.h, the test_sock_fields test
defines a tcp_sock with one lsndtime field in it.

Another highlight is, in sockopt_qos_to_cc.c, the tcp_cc_eq()
is replaced by bpf_strncmp(). tcp_cc_eq() was a workaround
in bpf_tcp_helpers.h before bpf_strncmp had been added.

The SOL_IPV6 addition to bpf_tracing_net.h is needed by the
test_tcpbpf_kern test.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-10-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Remove bpf_tcp_helpers.h usages from other misc bpf tcp-cc tests
Martin KaFai Lau [Thu, 9 May 2024 17:50:24 +0000 (10:50 -0700)]
selftests/bpf: Remove bpf_tcp_helpers.h usages from other misc bpf tcp-cc tests

This patch removed the final few bpf_tcp_helpers.h usages
in some misc bpf tcp-cc tests and replace it with
bpf_tracing_net.h (i.e. vmlinux.h)

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-9-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Use bpf_tracing_net.h in bpf_dctcp
Martin KaFai Lau [Thu, 9 May 2024 17:50:23 +0000 (10:50 -0700)]
selftests/bpf: Use bpf_tracing_net.h in bpf_dctcp

This patch uses bpf_tracing_net.h (i.e. vmlinux.h) in bpf_dctcp.
This will allow to retire the bpf_tcp_helpers.h and consolidate
tcp-cc tests to vmlinux.h.

It will have a dup on min/max macros with the bpf_cubic. It could
be further refactored in the future.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-8-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Use bpf_tracing_net.h in bpf_cubic
Martin KaFai Lau [Thu, 9 May 2024 17:50:22 +0000 (10:50 -0700)]
selftests/bpf: Use bpf_tracing_net.h in bpf_cubic

This patch uses bpf_tracing_net.h (i.e. vmlinux.h) in bpf_cubic.
This will allow to retire the bpf_tcp_helpers.h and consolidate
tcp-cc tests to vmlinux.h.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-7-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Rename tcp-cc private struct in bpf_cubic and bpf_dctcp
Martin KaFai Lau [Thu, 9 May 2024 17:50:21 +0000 (10:50 -0700)]
selftests/bpf: Rename tcp-cc private struct in bpf_cubic and bpf_dctcp

The "struct bictcp" and "struct dctcp" are private to the bpf prog
and they are stored in the private buffer in inet_csk(sk)->icsk_ca_priv.
Hence, there is no bpf CO-RE required.

The same struct name exists in the vmlinux.h. To reuse vmlinux.h,
they need to be renamed such that the bpf prog logic will be
immuned from the kernel tcp-cc changes.

This patch adds a "bpf_" prefix to them.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-6-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Sanitize the SEC and inline usages in the bpf-tcp-cc tests
Martin KaFai Lau [Thu, 9 May 2024 17:50:20 +0000 (10:50 -0700)]
selftests/bpf: Sanitize the SEC and inline usages in the bpf-tcp-cc tests

It is needed to remove the BPF_STRUCT_OPS usages from the tcp-cc tests
because it is defined in bpf_tcp_helpers.h which is going to be retired.
While at it, this patch consolidates all tcp-cc struct_ops programs to
use the SEC("struct_ops") + BPF_PROG().

It also removes the unnecessary __always_inline usages from the
tcp-cc tests.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-5-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Reuse the tcp_sk() from the bpf_tracing_net.h
Martin KaFai Lau [Thu, 9 May 2024 17:50:19 +0000 (10:50 -0700)]
selftests/bpf: Reuse the tcp_sk() from the bpf_tracing_net.h

This patch removes the individual tcp_sk implementations from the
tcp-cc tests. The tcp_sk() implementation from the bpf_tracing_net.h
is reused instead.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-4-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Add a few tcp helper functions and macros to bpf_tracing_net.h
Martin KaFai Lau [Thu, 9 May 2024 17:50:18 +0000 (10:50 -0700)]
selftests/bpf: Add a few tcp helper functions and macros to bpf_tracing_net.h

This patch adds a few tcp related helper functions to bpf_tracing_net.h.
They will be useful for both tcp-cc and network tracing related
bpf progs. They have already been in the bpf_tcp_helpers.h. This change
is needed to retire the bpf_tcp_helpers.h and consolidate all tests
to vmlinux.h (i.e. bpf_tracing_net.h).

Some of the helpers (tcp_sk and inet_csk) are also defined in
bpf_cc_cubic.c and they are removed. While at it, remove
the vmlinux.h from bpf_cc_cubic.c. bpf_tracing_net.h (which has
vmlinux.h after this patch) is enough and will be consistent
with the other tcp-cc tests in the later patches.

The other TCP_* macro additions will be needed for the bpf_dctcp
changes in the later patch.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-3-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: Remove bpf_tracing_net.h usages from two networking tests
Martin KaFai Lau [Thu, 9 May 2024 17:50:17 +0000 (10:50 -0700)]
selftests/bpf: Remove bpf_tracing_net.h usages from two networking tests

This patch removes the bpf_tracing_net.h usage from the networking tests,
fib_lookup and test_lwt_redirect. Instead of using the (copied) macro
TC_ACT_SHOT and ETH_HLEN from bpf_tracing_net.h, they can directly
use the ones defined in the network header files under linux/.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-2-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agobpf: Avoid uninitialized value in BPF_CORE_READ_BITFIELD
Jose E. Marchesi [Wed, 8 May 2024 10:13:13 +0000 (12:13 +0200)]
bpf: Avoid uninitialized value in BPF_CORE_READ_BITFIELD

[Changes from V1:
 - Use a default branch in the switch statement to initialize `val'.]

GCC warns that `val' may be used uninitialized in the
BPF_CRE_READ_BITFIELD macro, defined in bpf_core_read.h as:

[...]
unsigned long long val;       \
[...]       \
switch (__CORE_RELO(s, field, BYTE_SIZE)) {       \
case 1: val = *(const unsigned char *)p; break;       \
case 2: val = *(const unsigned short *)p; break;       \
case 4: val = *(const unsigned int *)p; break;       \
case 8: val = *(const unsigned long long *)p; break;       \
        }              \
[...]
val;       \
}       \

This patch adds a default entry in the switch statement that sets
`val' to zero in order to avoid the warning, and random values to be
used in case __builtin_preserve_field_info returns unexpected values
for BPF_FIELD_BYTE_SIZE.

Tested in bpf-next master.
No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240508101313.16662-1-jose.marchesi@oracle.com
17 months agobpf: guard BPF_NO_PRESERVE_ACCESS_INDEX in skb_pkt_end.c
Jose E. Marchesi [Wed, 8 May 2024 11:03:32 +0000 (13:03 +0200)]
bpf: guard BPF_NO_PRESERVE_ACCESS_INDEX in skb_pkt_end.c

This little patch is a follow-up to:
https://lore.kernel.org/bpf/20240507095011.15867-1-jose.marchesi@oracle.com/T/#u

The temporary workaround of passing -DBPF_NO_PRESERVE_ACCESS_INDEX
when building with GCC triggers a redefinition preprocessor error when
building progs/skb_pkt_end.c.  This patch adds a guard to avoid
redefinition.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240508110332.17332-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agobpf: avoid UB in usages of the __imm_insn macro
Jose E. Marchesi [Wed, 8 May 2024 10:35:51 +0000 (12:35 +0200)]
bpf: avoid UB in usages of the __imm_insn macro

[Changes from V2:
 - no-strict-aliasing is only applied when building with GCC.
 - cpumask_failure.c is excluded, as it doesn't use __imm_insn.]

The __imm_insn macro is defined in bpf_misc.h as:

  #define __imm_insn(name, expr) [name]"i"(*(long *)&(expr))

This may lead to type-punning and strict aliasing rules violations in
it's typical usage where the address of a struct bpf_insn is passed as
expr, like in:

  __imm_insn(st_mem,
             BPF_ST_MEM(BPF_W, BPF_REG_1, offsetof(struct __sk_buff, mark), 42))

Where:

  #define BPF_ST_MEM(SIZE, DST, OFF, IMM) \
((struct bpf_insn) { \
.code  = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM, \
.dst_reg = DST, \
.src_reg = 0, \
.off   = OFF, \
.imm   = IMM })

In all the actual instances of this in the BPF selftests the value is
fed to a volatile asm statement as soon as it gets read from memory,
and thus it is unlikely anti-aliasing rules breakage may lead to
misguided optimizations.

However, GCC detects the potential problem (indirectly) by issuing a
warning stating that a temporary <Uxxxxxx> is used uninitialized,
where the temporary corresponds to the memory read by *(long *).

This patch adds -fno-strict-aliasing to the compilation flags of the
particular selftests that do type punning via __imm_insn, only for
GCC.

Tested in master bpf-next.
No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240508103551.14955-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agobpf: avoid uninitialized warnings in verifier_global_subprogs.c
Jose E. Marchesi [Tue, 7 May 2024 18:47:56 +0000 (20:47 +0200)]
bpf: avoid uninitialized warnings in verifier_global_subprogs.c

[Changes from V1:
- The warning to disable is -Wmaybe-uninitialized, not -Wuninitialized.
- This warning is only supported in GCC.]

The BPF selftest verifier_global_subprogs.c contains code that
purposedly performs out of bounds access to memory, to check whether
the kernel verifier is able to catch them.  For example:

  __noinline int global_unsupp(const int *mem)
  {
if (!mem)
return 0;
return mem[100]; /* BOOM */
  }

With -O1 and higher and no inlining, GCC notices this fact and emits a
"maybe uninitialized" warning.  This is by design.  Note that the
emission of these warnings is highly dependent on the precise
optimizations that are performed.

This patch adds a compiler pragma to verifier_global_subprogs.c to
ignore these warnings.

Tested in bpf-next master.
No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240507184756.1772-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agobpf, arm64: Add support for lse atomics in bpf_arena
Puranjay Mohan [Fri, 26 Apr 2024 16:11:16 +0000 (16:11 +0000)]
bpf, arm64: Add support for lse atomics in bpf_arena

When LSE atomics are available, BPF atomic instructions are implemented
as single ARM64 atomic instructions, therefore it is easy to enable
these in bpf_arena using the currently available exception handling
setup.

LL_SC atomics use loops and therefore would need more work to enable in
bpf_arena.

Enable LSE atomics based instructions in bpf_arena and use the
bpf_jit_supports_insn() callback to reject atomics in bpf_arena if LSE
atomics are not available.

All atomics and arena_atomics selftests are passing:

  [root@ip-172-31-2-216 bpf]# ./test_progs -a atomics,arena_atomics
  #3/1     arena_atomics/add:OK
  #3/2     arena_atomics/sub:OK
  #3/3     arena_atomics/and:OK
  #3/4     arena_atomics/or:OK
  #3/5     arena_atomics/xor:OK
  #3/6     arena_atomics/cmpxchg:OK
  #3/7     arena_atomics/xchg:OK
  #3       arena_atomics:OK
  #10/1    atomics/add:OK
  #10/2    atomics/sub:OK
  #10/3    atomics/and:OK
  #10/4    atomics/or:OK
  #10/5    atomics/xor:OK
  #10/6    atomics/cmpxchg:OK
  #10/7    atomics/xchg:OK
  #10      atomics:OK
  Summary: 2/14 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20240426161116.441-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoMerge branch 'libbpf: further struct_ops fixes and improvements'
Martin KaFai Lau [Tue, 7 May 2024 23:22:00 +0000 (16:22 -0700)]
Merge branch 'libbpf: further struct_ops fixes and improvements'

Andrii Nakryiko says:

====================
Fix yet another case of mishandling SEC("struct_ops") programs that were
nulled out programmatically through BPF skeleton by the user.

While at it, add some improvements around detecting and reporting errors,
specifically a common case of declaring SEC("struct_ops") program, but
forgetting to actually make use of it by setting it as a callback
implementation in SEC(".struct_ops") variable (i.e., map) declaration.

A bunch of new selftests are added as well.
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: shorten subtest names for struct_ops_module test
Andrii Nakryiko [Tue, 7 May 2024 00:13:35 +0000 (17:13 -0700)]
selftests/bpf: shorten subtest names for struct_ops_module test

Drive-by clean up, we shouldn't use meaningless "test_" prefix for
subtest names.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240507001335.1445325-8-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: validate struct_ops early failure detection logic
Andrii Nakryiko [Tue, 7 May 2024 00:13:34 +0000 (17:13 -0700)]
selftests/bpf: validate struct_ops early failure detection logic

Add a simple test that validates that libbpf will reject isolated
struct_ops program early with helpful warning message.

Also validate that explicit use of such BPF program through BPF skeleton
after BPF object is open won't trigger any warnings.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240507001335.1445325-7-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agolibbpf: improve early detection of doomed-to-fail BPF program loading
Andrii Nakryiko [Tue, 7 May 2024 00:13:33 +0000 (17:13 -0700)]
libbpf: improve early detection of doomed-to-fail BPF program loading

Extend libbpf's pre-load checks for BPF programs, detecting more typical
conditions that are destinated to cause BPF program failure. This is an
opportunity to provide more helpful and actionable error message to
users, instead of potentially very confusing BPF verifier log and/or
error.

In this case, we detect struct_ops BPF program that was not referenced
anywhere, but still attempted to be loaded (according to libbpf logic).
Suggest that the program might need to be used in some struct_ops
variable. User will get a message of the following kind:

  libbpf: prog 'test_1_forgotten': SEC("struct_ops") program isn't referenced anywhere, did you forget to use it?

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240507001335.1445325-6-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agolibbpf: fix libbpf_strerror_r() handling unknown errors
Andrii Nakryiko [Tue, 7 May 2024 00:13:32 +0000 (17:13 -0700)]
libbpf: fix libbpf_strerror_r() handling unknown errors

strerror_r(), used from libbpf-specific libbpf_strerror_r() wrapper is
documented to return error in two different ways, depending on glibc
version. Take that into account when handling strerror_r()'s own errors,
which happens when we pass some non-standard (internal) kernel error to
it. Before this patch we'd have "ERROR: strerror_r(524)=22", which is
quite confusing. Now for the same situation we'll see a bit less
visually scary "unknown error (-524)".

At least we won't confuse user with irrelevant EINVAL (22).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240507001335.1445325-5-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: add another struct_ops callback use case test
Andrii Nakryiko [Tue, 7 May 2024 00:13:31 +0000 (17:13 -0700)]
selftests/bpf: add another struct_ops callback use case test

Add a test which tests the case that was just fixed. Kernel has full
type information about callback, but user explicitly nulls out the
reference to declaratively set BPF program reference.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240507001335.1445325-4-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agolibbpf: handle yet another corner case of nulling out struct_ops program
Andrii Nakryiko [Tue, 7 May 2024 00:13:30 +0000 (17:13 -0700)]
libbpf: handle yet another corner case of nulling out struct_ops program

There is yet another corner case where user can set STRUCT_OPS program
reference in STRUCT_OPS map to NULL, but libbpf will fail to disable
autoload for such BPF program. This time it's the case of "new" kernel
which has type information about callback field, but user explicitly
nulled-out program reference from user-space after opening BPF object.

Fix, hopefully, the last remaining unhandled case.

Fixes: 0737df6de946 ("libbpf: better fix for handling nulled-out struct_ops program")
Fixes: f973fccd43d3 ("libbpf: handle nulled-out program in struct_ops correctly")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240507001335.1445325-3-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agolibbpf: remove unnecessary struct_ops prog validity check
Andrii Nakryiko [Tue, 7 May 2024 00:13:29 +0000 (17:13 -0700)]
libbpf: remove unnecessary struct_ops prog validity check

libbpf ensures that BPF program references set in map->st_ops->progs[i]
during open phase are always valid STRUCT_OPS programs. This is done in
bpf_object__collect_st_ops_relos(). So there is no need to double-check
that in bpf_map__init_kern_struct_ops().

Simplify the code by removing unnecessary check. Also, we avoid using
local prog variable to keep code similar to the upcoming fix, which adds
similar logic in another part of bpf_map__init_kern_struct_ops().

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240507001335.1445325-2-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoMerge branch 'fix-number-of-arguments-in-test'
Andrii Nakryiko [Tue, 7 May 2024 21:41:00 +0000 (14:41 -0700)]
Merge branch 'fix-number-of-arguments-in-test'

Cupertino Miranda says:

====================
Fix number of arguments in test

Hi everyone,

This is a new version based on comments.

Regards,
Cupertino

Changes from v1:
 - Comment with gcc-bpf replaced by bpf_gcc.
 - Used pragma GCC optimize to disable GCC optimization in test.

Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: David Faust <david.faust@oracle.com>
Cc: Jose Marchesi <jose.marchesi@oracle.com>
Cc: Elena Zannoni <elena.zannoni@oracle.com>
====================

Link: https://lore.kernel.org/r/20240507122220.207820-1-cupertino.miranda@oracle.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
17 months agoselftests/bpf: Change functions definitions to support GCC
Cupertino Miranda [Tue, 7 May 2024 12:22:20 +0000 (13:22 +0100)]
selftests/bpf: Change functions definitions to support GCC

The test_xdp_noinline.c contains 2 functions that use more then 5
arguments. This patch collapses the 2 last arguments in an array.
Also in GCC and ipa_sra optimization increases the number of arguments
used in function encap_v4. This pass disables the optimization for that
particular file.

Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240507122220.207820-3-cupertino.miranda@oracle.com
17 months agoselftests/bpf: Add CFLAGS per source file and runner
Cupertino Miranda [Tue, 7 May 2024 12:22:19 +0000 (13:22 +0100)]
selftests/bpf: Add CFLAGS per source file and runner

This patch adds support to specify CFLAGS per source file and per test
runner.

Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240507122220.207820-2-cupertino.miranda@oracle.com
17 months agobpf: Temporarily define BPF_NO_PRESEVE_ACCESS_INDEX for GCC
Jose E. Marchesi [Tue, 7 May 2024 09:50:11 +0000 (11:50 +0200)]
bpf: Temporarily define BPF_NO_PRESEVE_ACCESS_INDEX for GCC

The vmlinux.h file generated by bpftool makes use of compiler pragmas
in order to install the CO-RE preserve_access_index in all the struct
types derived from the BTF info:

  #ifndef __VMLINUX_H__
  #define __VMLINUX_H__

  #ifndef BPF_NO_PRESERVE_ACCESS_INDEX
  #pragma clang attribute push (__attribute__((preserve_access_index)), apply_t = record
  #endif

  [... type definitions generated from kernel BTF ... ]

  #ifndef BPF_NO_PRESERVE_ACCESS_INDEX
  #pragma clang attribute pop
  #endif

The `clang attribute push/pop' pragmas are specific to clang/llvm and
are not supported by GCC.

At the moment the BTF dumping services in libbpf do not support
dicriminating between types dumped because they are directly referred
and types dumped because they are dependencies.  A suitable API is
being worked now. See [1] and [2].

In the interim, this patch changes the selftests/bpf Makefile so it
passes -DBPF_NO_PRESERVE_ACCESS_INDEX to GCC when it builds the
selftests.  This workaround is temporary, and may have an impact on
the results of the GCC-built tests.

[1] https://lore.kernel.org/bpf/20240503111836.25275-1-jose.marchesi@oracle.com/T/#u
[2] https://lore.kernel.org/bpf/20240504205510.24785-1-jose.marchesi@oracle.com/T/#u

Tested in bpf-next master.
No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240507095011.15867-1-jose.marchesi@oracle.com
17 months agoMerge branch 'bpf-avoid-attribute-ignored-warnings-in-gcc'
Andrii Nakryiko [Tue, 7 May 2024 21:31:20 +0000 (14:31 -0700)]
Merge branch 'bpf-avoid-attribute-ignored-warnings-in-gcc'

Jose E. Marchesi says:

====================
bpf: avoid `attribute ignored' warnings in GCC

These two patches avoid warnings (turned into errors) when building
the BPF selftests with GCC.

[Changes from V1:
- As requested by reviewer, an additional patch has been added in
  order to remove __hidden from the `private' macro in
  cpumask_common.h.
- Typo bening -> benign fixed in the commit message of the second
  patch.]
====================

Link: https://lore.kernel.org/r/20240507074227.4523-1-jose.marchesi@oracle.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
17 months agobpf: Disable some `attribute ignored' warnings in GCC
Jose E. Marchesi [Tue, 7 May 2024 07:42:27 +0000 (09:42 +0200)]
bpf: Disable some `attribute ignored' warnings in GCC

This patch modifies selftests/bpf/Makefile to pass -Wno-attributes to
GCC.  This is because of the following attributes which are ignored:

- btf_decl_tag
- btf_type_tag

  There are many of these.  At the moment none of these are
  recognized/handled by gcc-bpf.

  We are aware that btf_decl_tag is necessary for some of the
  selftest harness to communicate test failure/success.  Support for
  it is in progress in GCC upstream:

  https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650482.html

  However, the GCC master branch is not yet open, so the series
  above (currently under review upstream) wont be able to make it
  there until 14.1 gets released, probably mid next week.

  As for btf_type_tag, more extensive work will be needed in GCC
  upstream to support it in both BTF and DWARF.  We have a WIP big
  patch for that, but that is not needed to compile/build the
  selftests.

- used

  There are SEC macros defined in the selftests as:

  #define SEC(N) __attribute__((section(N),used))

  The SEC macro is used for both functions and global variables.
  According to the GCC documentation `used' attribute is really only
  meaningful for functions, and it warns when the attribute is used
  for other global objects, like for example ctl_array in
  test_xdp_noinline.c.

  Ignoring this is benign.

- align_value

  In progs/test_cls_redirect.c:127 there is:

  typedef uint8_t *net_ptr __attribute__((align_value(8)));

  GCC warns that it is ignoring this attribute, because it is not
  implemented by GCC.

  I think ignoring this attribute in GCC is benign, because according
  to the clang documentation [1] its purpose seems to be merely
  declarative and doesn't seem to translate into extra checks at
  run-time, only to perhaps better optimized code ("runtime behavior
  is undefined if the pointed memory object is not aligned to the
  specified alignment").

  [1] https://clang.llvm.org/docs/AttributeReference.html#align-value

Tested in bpf-next master.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240507074227.4523-3-jose.marchesi@oracle.com
17 months agobpf: Avoid __hidden__ attribute in static object
Jose E. Marchesi [Tue, 7 May 2024 07:42:26 +0000 (09:42 +0200)]
bpf: Avoid __hidden__ attribute in static object

An object defined as `static' defaults to hidden visibility.  If
additionally the visibility(__weak__) compiler attribute is applied to
the declaration of the object, GCC warns that the attribute gets
ignored.

This patch removes the only instance of this problem among the BPF
selftests.

Tested in bpf-next master.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240507074227.4523-2-jose.marchesi@oracle.com
17 months agobpf: Remove redundant page mask of vmf->address
Haiyue Wang [Tue, 7 May 2024 06:33:39 +0000 (14:33 +0800)]
bpf: Remove redundant page mask of vmf->address

As the comment described in "struct vm_fault":
".address"      : 'Faulting virtual address - masked'
".real_address" : 'Faulting virtual address - unmasked'

The link [1] said: "Whatever the routes, all architectures end up to the
invocation of handle_mm_fault() which, in turn, (likely) ends up calling
__handle_mm_fault() to carry out the actual work of allocating the page
tables."

  __handle_mm_fault() does address assignment:
.address = address & PAGE_MASK,
.real_address = address,

This is debug dump by running `./test_progs -a "*arena*"`:

[   69.767494] arena fault: vmf->address = 10000001d000, vmf->real_address = 10000001d008
[   69.767496] arena fault: vmf->address = 10000001c000, vmf->real_address = 10000001c008
[   69.767499] arena fault: vmf->address = 10000001b000, vmf->real_address = 10000001b008
[   69.767501] arena fault: vmf->address = 10000001a000, vmf->real_address = 10000001a008
[   69.767504] arena fault: vmf->address = 100000019000, vmf->real_address = 100000019008
[   69.769388] arena fault: vmf->address = 10000001e000, vmf->real_address = 10000001e1e8

So we can use the value of 'vmf->address' to do BPF arena kernel address
space cast directly.

[1] https://docs.kernel.org/mm/page_tables.html

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Link: https://lore.kernel.org/r/20240507063358.8048-1-haiyue.wang@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoMerge branch 'bpf-verifier-range-computation-improvements'
Alexei Starovoitov [Tue, 7 May 2024 00:09:12 +0000 (17:09 -0700)]
Merge branch 'bpf-verifier-range-computation-improvements'

Cupertino Miranda says:

====================
bpf/verifier: range computation improvements

Hi everyone,

This is what I hope to be the last version. :)

Regards,
Cupertino

Changes from v1:
 - Reordered patches in the series.
 - Fix refactor to be acurate with original code.
 - Fixed other mentioned small problems.

Changes from v2:
 - Added a patch to replace mark_reg_unknowon for __mark_reg_unknown in
   the context of range computation.
 - Reverted implementation of refactor to v1 which used a simpler
   boolean return value in check function.
 - Further relaxed MUL to allow it to still compute a range when neither
   of its registers is a known value.
 - Simplified tests based on Eduards example.
 - Added messages in selftest commits.

Changes from v3:
 - Improved commit message of patch nr 1.
 - Coding style fixes.
 - Improve XOR and OR tests.
 - Made function calls to pass struct bpf_reg_state pointer instead.
 - Improved final code as a last patch.

Changes from v4:
 - Merged patch nr 7 in 2.

====================

Link: https://lore.kernel.org/r/20240506141849.185293-1-cupertino.miranda@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: MUL range computation tests.
Cupertino Miranda [Mon, 6 May 2024 14:18:49 +0000 (15:18 +0100)]
selftests/bpf: MUL range computation tests.

Added a test for bound computation in MUL when non constant
values are used and both registers have bounded ranges.

Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: David Faust <david.faust@oracle.com>
Cc: Jose Marchesi <jose.marchesi@oracle.com>
Cc: Elena Zannoni <elena.zannoni@oracle.com>
Link: https://lore.kernel.org/r/20240506141849.185293-7-cupertino.miranda@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agobpf/verifier: relax MUL range computation check
Cupertino Miranda [Mon, 6 May 2024 14:18:48 +0000 (15:18 +0100)]
bpf/verifier: relax MUL range computation check

MUL instruction required that src_reg would be a known value (i.e.
src_reg would be a const value). The condition in this case can be
relaxed, since the range computation algorithm used in current code
already supports a proper range computation for any valid range value on
its operands.

Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: David Faust <david.faust@oracle.com>
Cc: Jose Marchesi <jose.marchesi@oracle.com>
Cc: Elena Zannoni <elena.zannoni@oracle.com>
Link: https://lore.kernel.org/r/20240506141849.185293-6-cupertino.miranda@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agoselftests/bpf: XOR and OR range computation tests.
Cupertino Miranda [Mon, 6 May 2024 14:18:47 +0000 (15:18 +0100)]
selftests/bpf: XOR and OR range computation tests.

Added a test for bound computation in XOR and OR when non constant
values are used and both registers have bounded ranges.

Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: David Faust <david.faust@oracle.com>
Cc: Jose Marchesi <jose.marchesi@oracle.com>
Cc: Elena Zannoni <elena.zannoni@oracle.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Link: https://lore.kernel.org/r/20240506141849.185293-5-cupertino.miranda@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agobpf/verifier: improve XOR and OR range computation
Cupertino Miranda [Mon, 6 May 2024 14:18:46 +0000 (15:18 +0100)]
bpf/verifier: improve XOR and OR range computation

Range for XOR and OR operators would not be attempted unless src_reg
would resolve to a single value, i.e. a known constant value.
This condition is unnecessary, and the following XOR/OR operator
handling could compute a possible better range.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: David Faust <david.faust@oracle.com>
Cc: Jose Marchesi <jose.marchesi@oracle.com>
Cc: Elena Zannoni <elena.zannoni@oracle.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Link: https://lore.kernel.org/r/20240506141849.185293-4-cupertino.miranda@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agobpf/verifier: refactor checks for range computation
Cupertino Miranda [Mon, 6 May 2024 14:18:45 +0000 (15:18 +0100)]
bpf/verifier: refactor checks for range computation

Split range computation checks in its own function, isolating pessimitic
range set for dst_reg and failing return to a single point.

Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: David Faust <david.faust@oracle.com>
Cc: Jose Marchesi <jose.marchesi@oracle.com>
Cc: Elena Zannoni <elena.zannoni@oracle.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
bpf/verifier: improve code after range computation recent changes.
Link: https://lore.kernel.org/r/20240506141849.185293-3-cupertino.miranda@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agobpf/verifier: replace calls to mark_reg_unknown.
Cupertino Miranda [Mon, 6 May 2024 14:18:44 +0000 (15:18 +0100)]
bpf/verifier: replace calls to mark_reg_unknown.

In order to further simplify the code in adjust_scalar_min_max_vals all
the calls to mark_reg_unknown are replaced by __mark_reg_unknown.

static void mark_reg_unknown(struct bpf_verifier_env *env,
        struct bpf_reg_state *regs, u32 regno)
{
if (WARN_ON(regno >= MAX_BPF_REG)) {
... mark all regs not init ...
return;
    }
__mark_reg_unknown(env, regs + regno);
}

The 'regno >= MAX_BPF_REG' does not apply to
adjust_scalar_min_max_vals(), because it is only called from the
following stack:
  - check_alu_op
    - adjust_reg_min_max_vals
      - adjust_scalar_min_max_vals

The check_alu_op() does check_reg_arg() which verifies that both src and
dst register numbers are within bounds.

Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: David Faust <david.faust@oracle.com>
Cc: Jose Marchesi <jose.marchesi@oracle.com>
Cc: Elena Zannoni <elena.zannoni@oracle.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Link: https://lore.kernel.org/r/20240506141849.185293-2-cupertino.miranda@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
17 months agobpftool, selftests/hid/bpf: Fix 29 clang warnings
John Hubbard [Sun, 5 May 2024 23:00:54 +0000 (16:00 -0700)]
bpftool, selftests/hid/bpf: Fix 29 clang warnings

When building either tools/bpf/bpftool, or tools/testing/selftests/hid,
(the same Makefile is used for these), clang generates many instances of
the following:

    "clang: warning: -lLLVM-17: 'linker' input unused"

Quentin points out that the LLVM version is only required in $(LIBS),
not in $(CFLAGS), so the fix is to remove it from CFLAGS.

Suggested-by: Quentin Monnet <qmo@kernel.org>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20240505230054.13813-1-jhubbard@nvidia.com
17 months agoselftests/bpf: Fix pointer arithmetic in test_xdp_do_redirect
Michal Schmidt [Mon, 6 May 2024 14:50:22 +0000 (16:50 +0200)]
selftests/bpf: Fix pointer arithmetic in test_xdp_do_redirect

Cast operation has a higher precedence than addition. The code here
wants to zero the 2nd half of the 64-bit metadata, but due to a pointer
arithmetic mistake, it writes the zero at offset 16 instead.

Just adding parentheses around "data + 4" would fix this, but I think
this will be slightly better readable with array syntax.

I was unable to test this with tools/testing/selftests/bpf/vmtest.sh,
because my glibc is newer than glibc in the provided VM image.
So I just checked the difference in the compiled code.
objdump -S tools/testing/selftests/bpf/xdp_do_redirect.test.o:
  - *((__u32 *)data) = 0x42; /* metadata test value */
  + ((__u32 *)data)[0] = 0x42; /* metadata test value */
        be7: 48 8d 85 30 fc ff ff  lea    -0x3d0(%rbp),%rax
        bee: c7 00 42 00 00 00     movl   $0x42,(%rax)
  - *((__u32 *)data + 4) = 0;
  + ((__u32 *)data)[1] = 0;
        bf4: 48 8d 85 30 fc ff ff  lea    -0x3d0(%rbp),%rax
  -     bfb: 48 83 c0 10           add    $0x10,%rax
  +     bfb: 48 83 c0 04           add    $0x4,%rax
        bff: c7 00 00 00 00 00     movl   $0x0,(%rax)

Fixes: 5640b6d89434 ("selftests/bpf: fix "metadata marker" getting overwritten by the netstack")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20240506145023.214248-1-mschmidt@redhat.com
17 months agoselftests/bpf: Use bpf_tracing.h instead of bpf_tcp_helpers.h
Martin KaFai Lau [Sat, 4 May 2024 00:50:45 +0000 (17:50 -0700)]
selftests/bpf: Use bpf_tracing.h instead of bpf_tcp_helpers.h

The bpf programs that this patch changes require the BPF_PROG macro.
The BPF_PROG macro is defined in the libbpf's bpf_tracing.h.
Some tests include bpf_tcp_helpers.h which includes bpf_tracing.h.
They don't need other things from bpf_tcp_helpers.h other than
bpf_tracing.h. This patch simplifies it by directly including
the bpf_tracing.h.

The motivation of this unnecessary code churn is to retire
the bpf_tcp_helpers.h by directly using vmlinux.h. Right now,
the main usage of the bpf_tcp_helpers.h is the partial kernel
socket definitions (e.g. socket, sock, tcp_sock). While the test
cases continue to grow, fields are kept adding to those partial
socket definitions (e.g. the recent bpf_cc_cubic.c test which
tried to extend bpf_tcp_helpers.c but eventually used the
vmlinux.h instead).

The idea is to retire bpf_tcp_helpers.c and consistently use
vmlinux.h for the tests that require the kernel sockets. This
patch tackles the obvious tests that can directly use bpf_tracing.h
instead of bpf_tcp_helpers.h.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240504005045.848376-1-martin.lau@linux.dev
17 months agolibbpf: Avoid casts from pointers to enums in bpf_tracing.h
Jose E. Marchesi [Thu, 2 May 2024 17:09:25 +0000 (19:09 +0200)]
libbpf: Avoid casts from pointers to enums in bpf_tracing.h

[Differences from V1:
  - Do not introduce a global typedef, as this is a public header.
  - Keep the void* casts in BPF_KPROBE_READ_RET_IP and
    BPF_KRETPROBE_READ_RET_IP, as these are necessary
    for converting to a const void* argument of
    bpf_probe_read_kernel.]

The BPF_PROG, BPF_KPROBE and BPF_KSYSCALL macros defined in
tools/lib/bpf/bpf_tracing.h use a clever hack in order to provide a
convenient way to define entry points for BPF programs as if they were
normal C functions that get typed actual arguments, instead of as
elements in a single "context" array argument.

For example, PPF_PROGS allows writing:

  SEC("struct_ops/cwnd_event")
  void BPF_PROG(cwnd_event, struct sock *sk, enum tcp_ca_event event)
  {
        bbr_cwnd_event(sk, event);
        dctcp_cwnd_event(sk, event);
        cubictcp_cwnd_event(sk, event);
  }

That expands into a pair of functions:

  void ____cwnd_event (unsigned long long *ctx, struct sock *sk, enum tcp_ca_event event)
  {
        bbr_cwnd_event(sk, event);
        dctcp_cwnd_event(sk, event);
        cubictcp_cwnd_event(sk, event);
  }

  void cwnd_event (unsigned long long *ctx)
  {
        _Pragma("GCC diagnostic push")
        _Pragma("GCC diagnostic ignored \"-Wint-conversion\"")
        return ____cwnd_event(ctx, (void*)ctx[0], (void*)ctx[1]);
        _Pragma("GCC diagnostic pop")
  }

Note how the 64-bit unsigned integers in the incoming CTX get casted
to a void pointer, and then implicitly converted to whatever type of
the actual argument in the wrapped function.  In this case:

  Arg1: unsigned long long -> void * -> struct sock *
  Arg2: unsigned long long -> void * -> enum tcp_ca_event

The behavior of GCC and clang when facing such conversions differ:

  pointer -> pointer

    Allowed by the C standard.
    GCC: no warning nor error.
    clang: no warning nor error.

  pointer -> integer type

    [C standard says the result of this conversion is implementation
     defined, and it may lead to unaligned pointer etc.]

    GCC: error: integer from pointer without a cast [-Wint-conversion]
    clang: error: incompatible pointer to integer conversion [-Wint-conversion]

  pointer -> enumerated type

    GCC: error: incompatible types in assigment (*)
    clang: error: incompatible pointer to integer conversion [-Wint-conversion]

These macros work because converting pointers to pointers is allowed,
and converting pointers to integers also works provided a suitable
integer type even if it is implementation defined, much like casting a
pointer to uintptr_t is guaranteed to work by the C standard.  The
conversion errors emitted by both compilers by default are silenced by
the pragmas.

However, the GCC error marked with (*) above when assigning a pointer
to an enumerated value is not associated with the -Wint-conversion
warning, and it is not possible to turn it off.

This is preventing building the BPF kernel selftests with GCC.

This patch fixes this by avoiding intermediate casts to void*,
replaced with casts to `unsigned long long', which is an integer type
capable of safely store a BPF pointer, much like the standard
uintptr_t.

Testing performed in bpf-next master:
  - vmtest.sh -- ./test_verifier
  - vmtest.sh -- ./test_progs
  - make M=samples/bpf
No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240502170925.3194-1-jose.marchesi@oracle.com
17 months agolibbpf: Fix bpf_ksym_exists() in GCC
Jose E. Marchesi [Sun, 28 Apr 2024 11:25:59 +0000 (13:25 +0200)]
libbpf: Fix bpf_ksym_exists() in GCC

The macro bpf_ksym_exists is defined in bpf_helpers.h as:

  #define bpf_ksym_exists(sym) ({ \
   _Static_assert(!__builtin_constant_p(!!sym), #sym " should be marked as __weak"); \
   !!sym; \
  })

The purpose of the macro is to determine whether a given symbol has
been defined, given the address of the object associated with the
symbol.  It also has a compile-time check to make sure the object
whose address is passed to the macro has been declared as weak, which
makes the check on `sym' meaningful.

As it happens, the check for weak doesn't work in GCC in all cases,
because __builtin_constant_p not always folds at parse time when
optimizing.  This is because optimizations that happen later in the
compilation process, like inlining, may make a previously non-constant
expression a constant.  This results in errors like the following when
building the selftests with GCC:

  bpf_helpers.h:190:24: error: expression in static assertion is not constant
  190 |         _Static_assert(!__builtin_constant_p(!!sym), #sym " should be marked as __weak");       \
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fortunately recent versions of GCC support a __builtin_has_attribute
that can be used to directly check for the __weak__ attribute.  This
patch changes bpf_helpers.h to use that builtin when building with a
recent enough GCC, and to omit the check if GCC is too old to support
the builtin.

The macro used for GCC becomes:

  #define bpf_ksym_exists(sym) ({ \
_Static_assert(__builtin_has_attribute (*sym, __weak__), #sym " should be marked as __weak"); \
!!sym; \
  })

Note that since bpf_ksym_exists is designed to get the address of the
object associated with symbol SYM, we pass *sym to
__builtin_has_attribute instead of sym.  When an expression is passed
to __builtin_has_attribute then it is the type of the passed
expression that is checked for the specified attribute.  The
expression itself is not evaluated.  This accommodates well with the
existing usages of the macro:

- For function objects:

  struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym __weak;
  [...]
  bpf_ksym_exists(bpf_task_acquire)

- For variable objects:

  extern const struct rq runqueues __ksym __weak; /* typed */
  [...]
  bpf_ksym_exists(&runqueues)

Note also that BPF support was added in GCC 10 and support for
__builtin_has_attribute in GCC 9.

Locally tested in bpf-next master branch.
No regressions.

Signed-of-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240428112559.10518-1-jose.marchesi@oracle.com
17 months agolibbpf: fix ring_buffer__consume_n() return result logic
Andrii Nakryiko [Tue, 30 Apr 2024 20:19:52 +0000 (13:19 -0700)]
libbpf: fix ring_buffer__consume_n() return result logic

Add INT_MAX check to ring_buffer__consume_n(). We do the similar check
to handle int return result of all these ring buffer APIs in other APIs
and ring_buffer__consume_n() is missing one. This patch fixes this
omission.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240430201952.888293-2-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agolibbpf: fix potential overflow in ring__consume_n()
Andrii Nakryiko [Tue, 30 Apr 2024 20:19:51 +0000 (13:19 -0700)]
libbpf: fix potential overflow in ring__consume_n()

ringbuf_process_ring() return int64_t, while ring__consume_n() assigns
it to int. It's highly unlikely, but possible for ringbuf_process_ring()
to return value larger than INT_MAX, so use int64_t. ring__consume_n()
does check INT_MAX before returning int result to the user.

Fixes: 4d22ea94ea33 ("libbpf: Add ring__consume_n / ring_buffer__consume_n")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240430201952.888293-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoMerge branch 'Add new args into tcp_congestion_ops' cong_control'
Martin KaFai Lau [Thu, 2 May 2024 22:39:50 +0000 (15:39 -0700)]
Merge branch 'Add new args into tcp_congestion_ops' cong_control'

Miao Xu says:

====================
This patchset attempts to add two new arguments into the hookpoint
cong_control in tcp_congestion_ops. The new arguments are inherited
from the caller tcp_cong_control and can be used by any bpf cc prog
that implements its own logic inside this hookpoint.

Please review. Thanks a lot!

Changelog
=====
v2->v3:
  - Fixed the broken selftest caused by the new arguments.
  - Renamed the selftest file name and bpf prog name.

v1->v2:
  - Split the patchset into 3 separate patches.
  - Added highlights in the selftest prog.
  - Removed the dependency on bpf_tcp_helpers.h.
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: Add test for the use of new args in cong_control
Miao Xu [Thu, 2 May 2024 04:23:18 +0000 (21:23 -0700)]
selftests/bpf: Add test for the use of new args in cong_control

This patch adds a selftest to show the usage of the new arguments in
cong_control. For simplicity's sake, the testing example reuses cubic's
kernel functions.

Signed-off-by: Miao Xu <miaxu@meta.com>
Link: https://lore.kernel.org/r/20240502042318.801932-4-miaxu@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agobpf: tcp: Allow to write tp->snd_cwnd_stamp in bpf_tcp_ca
Miao Xu [Thu, 2 May 2024 04:23:17 +0000 (21:23 -0700)]
bpf: tcp: Allow to write tp->snd_cwnd_stamp in bpf_tcp_ca

This patch allows the write of tp->snd_cwnd_stamp in a bpf tcp
ca program. An use case of writing this field is to keep track
of the time whenever tp->snd_cwnd is raised or reduced inside
the `cong_control` callback.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Miao Xu <miaxu@meta.com>
Link: https://lore.kernel.org/r/20240502042318.801932-3-miaxu@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agotcp: Add new args for cong_control in tcp_congestion_ops
Miao Xu [Thu, 2 May 2024 04:23:16 +0000 (21:23 -0700)]
tcp: Add new args for cong_control in tcp_congestion_ops

This patch adds two new arguments for cong_control of struct
tcp_congestion_ops:
 - ack
 - flag
These two arguments are inherited from the caller tcp_cong_control in
tcp_intput.c. One use case of them is to update cwnd and pacing rate
inside cong_control based on the info they provide. For example, the
flag can be used to decide if it is the right time to raise or reduce a
sender's cwnd.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Miao Xu <miaxu@meta.com>
Link: https://lore.kernel.org/r/20240502042318.801932-2-miaxu@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoMerge branch 'selftests/bpf: Add sockaddr tests for kernel networking'
Martin KaFai Lau [Thu, 2 May 2024 19:09:23 +0000 (12:09 -0700)]
Merge branch 'selftests/bpf: Add sockaddr tests for kernel networking'

Jordan Rife says:

====================
This patch series adds test coverage for BPF sockaddr hooks and their
interactions with kernel socket functions (i.e. kernel_bind(),
kernel_connect(), kernel_sendmsg(), sock_sendmsg(),
kernel_getpeername(), and kernel_getsockname()) while also rounding out
IPv4 and IPv6 sockaddr hook coverage in prog_tests/sock_addr.c.

As with v1 of this patch series, we add regression coverage for the
issues addressed by these patches,

- commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect")
- commit 86a7e0b69bd5("net: prevent rewrite of msg_name in sock_sendmsg()")
- commit c889a99a21bf("net: prevent address rewrite in kernel_bind()")
- commit 01b2885d9415("net: Save and restore msg_namelen in sock_sendmsg")

but broaden the focus a bit.

In order to extend prog_tests/sock_addr.c to test these kernel
functions, we add a set of new kfuncs that wrap individual socket
operations to bpf_testmod and invoke them through set of corresponding
SYSCALL programs (progs/sock_addr_kern.c). Each test case can be
configured to use a different set of "sock_ops" depending on whether it
is testing kernel calls (kernel_bind(), kernel_connect(), etc.) or
system calls (bind(), connect(), etc.).

=======
Patches
=======
* Patch 1 fixes the sock_addr bind test program to work for big endian
  architectures such as s390x.
* Patch 2 introduces the new kfuncs to bpf_testmod.
* Patch 3 introduces the BPF program which allows us to invoke these
  kfuncs invividually from the test program.
* Patch 4 lays the groundwork for IPv4 and IPv6 sockaddr hook coverage
  by migrating much of the environment setup logic from
  bpf/test_sock_addr.sh into prog_tests/sock_addr.c and moves test cases
  to cover bind4/6, connect4/6, sendmsg4/6 and recvmsg4/6 hooks.
* Patch 5 makes the set of socket operations for each test case
  configurable, laying the groundwork for Patch 6.
* Patch 6 introduces two sets of sock_ops that invoke the kernel
  equivalents of connect(), bind(), etc. and uses these to add coverage
  for the kernel socket functions.

=======
Changes
=======
v2->v3
------
* Renamed bind helpers. Dropped "_ntoh" suffix.
* Added guards to kfuncs to make sure addrlen and msglen do not exceed
  the buffer capacity.
* Added KF_SLEEPABLE flag to kfuncs.
* Added a mutex (sock_lock) to kfuncs to serialize access to sock.
* Added NULL check for sock to each kfunc.
* Use the "sock_addr" networking namespace for all network interface
  setup and testing.
* Use "nodad" when calling "ip -6 addr add" during interface setup to
  avoid delays and remove ping loop.
* Removed test cases from test_sock_addr.c to make it clear what remains
  to be migrated.
* Removed unused parameter (expect_change) from sock_addr_op().

Link: https://lore.kernel.org/bpf/20240412165230.2009746-1-jrife@google.com/T/#u
v1->v2
------
* Dropped test_progs/sock_addr_kern.c and the sock_addr_kern test module
  in favor of simply expanding bpf_testmod and test_progs/sock_addr.c.
* Migrated environment setup logic from bpf/test_sock_addr.sh into
  prog_tests/sock_addr.c rather than invoking the script from the test
  program.
* Added kfuncs to bpf_testmod as well as the sock_addr_kern BPF program
  to enable us to invoke kernel socket functions from
  test_progs/sock_addr.c.
* Added test coverage for kernel socket functions to
  test_progs/sock_addr.c.

Link: https://lore.kernel.org/bpf/20240329191907.1808635-1-jrife@google.com/T/#u
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: Add kernel socket operation tests
Jordan Rife [Mon, 29 Apr 2024 21:45:23 +0000 (16:45 -0500)]
selftests/bpf: Add kernel socket operation tests

This patch creates two sets of sock_ops that call out to the SYSCALL
hooks in the sock_addr_kern BPF program and uses them to construct
test cases for the range of supported operations (kernel_connect(),
kernel_bind(), kernel_sendms(), sock_sendmsg(), kernel_getsockname(),
kenel_getpeername()). This ensures that these interact with BPF sockaddr
hooks as intended.

Beyond this it also ensures that these operations do not modify their
address parameter, providing regression coverage for the issues
addressed by this set of patches:

- commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect")
- commit 86a7e0b69bd5("net: prevent rewrite of msg_name in sock_sendmsg()")
- commit c889a99a21bf("net: prevent address rewrite in kernel_bind()")
- commit 01b2885d9415("net: Save and restore msg_namelen in sock_sendmsg")

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240429214529.2644801-7-jrife@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: Make sock configurable for each test case
Jordan Rife [Mon, 29 Apr 2024 21:45:22 +0000 (16:45 -0500)]
selftests/bpf: Make sock configurable for each test case

In order to reuse the same test code for both socket system calls (e.g.
connect(), bind(), etc.) and kernel socket functions (e.g.
kernel_connect(), kernel_bind(), etc.), this patch introduces the "ops"
field to sock_addr_test. This field allows each test cases to configure
the set of functions used in the test case to create, manipulate, and
tear down a socket.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240429214529.2644801-6-jrife@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: Move IPv4 and IPv6 sockaddr test cases
Jordan Rife [Mon, 29 Apr 2024 21:45:21 +0000 (16:45 -0500)]
selftests/bpf: Move IPv4 and IPv6 sockaddr test cases

This patch lays the groundwork for testing IPv4 and IPv6 sockaddr hooks
and their interaction with both socket syscalls and kernel functions
(e.g. kernel_connect, kernel_bind, etc.). It moves some of the test
cases from the old-style bpf/test_sock_addr.c self test into the
sock_addr prog_test in a step towards fully retiring
bpf/test_sock_addr.c. We will expand the test dimensions in the
sock_addr prog_test in a later patch series in order to migrate the
remaining test cases.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240429214529.2644801-5-jrife@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: Implement BPF programs for kernel socket operations
Jordan Rife [Mon, 29 Apr 2024 21:45:20 +0000 (16:45 -0500)]
selftests/bpf: Implement BPF programs for kernel socket operations

This patch lays out a set of SYSCALL programs that can be used to invoke
the socket operation kfuncs in bpf_testmod, allowing a test program to
manipulate kernel socket operations from userspace.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240429214529.2644801-4-jrife@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: Implement socket kfuncs for bpf_testmod
Jordan Rife [Mon, 29 Apr 2024 21:45:19 +0000 (16:45 -0500)]
selftests/bpf: Implement socket kfuncs for bpf_testmod

This patch adds a set of kfuncs to bpf_testmod that can be used to
manipulate a socket from kernel space.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240429214529.2644801-3-jrife@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: Fix bind program for big endian systems
Jordan Rife [Mon, 29 Apr 2024 21:45:18 +0000 (16:45 -0500)]
selftests/bpf: Fix bind program for big endian systems

Without this fix, the bind4 and bind6 programs will reject bind attempts
on big endian systems. This patch ensures that CI tests pass for the
s390x architecture.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240429214529.2644801-2-jrife@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agobpf: Missing trailing slash in tools/testing/selftests/bpf/Makefile
Jose E. Marchesi [Thu, 2 May 2024 14:08:31 +0000 (16:08 +0200)]
bpf: Missing trailing slash in tools/testing/selftests/bpf/Makefile

tools/lib/bpf/Makefile assumes that the patch in OUTPUT is a directory
and that it includes a trailing slash.  This seems to be a common
expectation for OUTPUT among all the Makefiles.

In the rule for runqslower in tools/testing/selftests/bpf/Makefile the
variable BPFTOOL_OUTPUT is set to a directory name that lacks a
trailing slash.  This results in a malformed BPF_HELPER_DEFS being
defined in lib/bpf/Makefile.

This problem becomes evident when a file like
tools/lib/bpf/bpf_tracing.h gets updated.

This patch fixes the problem by adding the missing slash in the value
for BPFTOOL_OUTPUT in the $(OUTPUT)/runqslower rule.

Regtested by running selftests in bpf-next master and building
samples/bpf programs.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240502140831.23915-1-jose.marchesi@oracle.com
17 months agolibbpf: Fix error message in attach_kprobe_multi
Jiri Olsa [Thu, 2 May 2024 07:55:41 +0000 (09:55 +0200)]
libbpf: Fix error message in attach_kprobe_multi

We just failed to retrieve pattern, so we need to print spec instead.

Fixes: ddc6b04989eb ("libbpf: Add bpf_program__attach_kprobe_multi_opts function")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240502075541.1425761-2-jolsa@kernel.org
17 months agolibbpf: Fix error message in attach_kprobe_session
Jiri Olsa [Thu, 2 May 2024 07:55:40 +0000 (09:55 +0200)]
libbpf: Fix error message in attach_kprobe_session

We just failed to retrieve pattern, so we need to print spec instead.

Fixes: 2ca178f02b2f ("libbpf: Add support for kprobe session attach")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240502075541.1425761-1-jolsa@kernel.org
17 months agobpf: crypto: fix build when CONFIG_CRYPTO=m
Vadim Fedorenko [Wed, 1 May 2024 17:01:30 +0000 (10:01 -0700)]
bpf: crypto: fix build when CONFIG_CRYPTO=m

Crypto subsytem can be build as a module. In this case we still have to
build BPF crypto framework otherwise the build will fail.

Fixes: 3e1c6f35409f ("bpf: make common crypto API for TC/XDP programs")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405011634.4JK40epY-lkp@intel.com/
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://lore.kernel.org/r/20240501170130.1682309-1-vadfed@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agolibbpf: better fix for handling nulled-out struct_ops program
Andrii Nakryiko [Wed, 1 May 2024 04:17:06 +0000 (21:17 -0700)]
libbpf: better fix for handling nulled-out struct_ops program

Previous attempt to fix the handling of nulled-out (from skeleton)
struct_ops program is working well only if struct_ops program is defined
as non-autoloaded by default (i.e., has SEC("?struct_ops") annotation,
with question mark).

Unfortunately, that fix is incomplete due to how
bpf_object_adjust_struct_ops_autoload() is marking referenced or
non-referenced struct_ops program as autoloaded (or not). Because
bpf_object_adjust_struct_ops_autoload() is run after
bpf_map__init_kern_struct_ops() step, which sets program slot to NULL,
such programs won't be considered "referenced", and so its autoload
property won't be changed.

This all sounds convoluted and it is, but the desire is to have as
natural behavior (as far as struct_ops usage is concerned) as possible.

This fix is redoing the original fix but makes it work for
autoloaded-by-default struct_ops programs as well. We achieve this by
forcing prog->autoload to false if prog was declaratively set for some
struct_ops map, but then nulled-out from skeleton (programmatically).
This achieves desired effect of not autoloading it. If such program is
still referenced somewhere else (different struct_ops map or different
callback field), it will get its autoload property adjusted by
bpf_object_adjust_struct_ops_autoload() later.

We also fix selftest, which accidentally used SEC("?struct_ops")
annotation. It was meant to use autoload-by-default program from the
very beginning.

Fixes: f973fccd43d3 ("libbpf: handle nulled-out program in struct_ops correctly")
Cc: Kui-Feng Lee <thinker.li@gmail.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240501041706.3712608-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoMerge branch 'libbpf-support-module-function-syntax-for-tracing-programs'
Andrii Nakryiko [Wed, 1 May 2024 16:53:48 +0000 (09:53 -0700)]
Merge branch 'libbpf-support-module-function-syntax-for-tracing-programs'

Viktor Malik says:

====================
libbpf: support "module:function" syntax for tracing programs

In some situations, it is useful to explicitly specify a kernel module
to search for a tracing program target (e.g. when a function of the same
name exists in multiple modules or in vmlinux).

This change enables that by allowing the "module:function" syntax for
the find_kernel_btf_id function. Thanks to this, the syntax can be used
both from a SEC macro (i.e. `SEC(fentry/module:function)`) and via the
bpf_program__set_attach_target API call.
---

Changes in v2:
- stylistic changes (suggested by Andrii)
- added Andrii's ack to the second patch
====================

Link: https://lore.kernel.org/r/cover.1714469650.git.vmalik@redhat.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
17 months agoselftests/bpf: add tests for the "module: Function" syntax
Viktor Malik [Tue, 30 Apr 2024 09:38:07 +0000 (11:38 +0200)]
selftests/bpf: add tests for the "module: Function" syntax

The previous patch added support for the "module:function" syntax for
tracing programs. This adds tests for explicitly specifying the module
name via the SEC macro and via the bpf_program__set_attach_target call.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/8a076168ed847f7c8a6c25715737b1fea84e38be.1714469650.git.vmalik@redhat.com
17 months agolibbpf: support "module: Function" syntax for tracing programs
Viktor Malik [Tue, 30 Apr 2024 09:38:06 +0000 (11:38 +0200)]
libbpf: support "module: Function" syntax for tracing programs

In some situations, it is useful to explicitly specify a kernel module
to search for a tracing program target (e.g. when a function of the same
name exists in multiple modules or in vmlinux).

This patch enables that by allowing the "module:function" syntax for the
find_kernel_btf_id function. Thanks to this, the syntax can be used both
from a SEC macro (i.e. `SEC(fentry/module:function)`) and via the
bpf_program__set_attach_target API call.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/9085a8cb9a552de98e554deb22ff7e977d025440.1714469650.git.vmalik@redhat.com
17 months agoMerge branch 'use network helpers, part 3'
Martin KaFai Lau [Tue, 30 Apr 2024 18:26:24 +0000 (11:26 -0700)]
Merge branch 'use network helpers, part 3'

Geliang Tang says:

====================
This patchset adds opts argument for __start_server.
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: Drop start_server_proto helper
Geliang Tang [Thu, 25 Apr 2024 03:23:43 +0000 (11:23 +0800)]
selftests/bpf: Drop start_server_proto helper

Protocol can be set by __start_server() helper directly now, this makes
the heler start_server_proto() useless.

This patch drops it, and implenments start_server() using make_sockaddr()
and __start_server().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/55d8a04e0bb8240a5fda2da3e9bdffe6fc8547b2.1714014697.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: Make start_mptcp_server static
Geliang Tang [Thu, 25 Apr 2024 03:23:42 +0000 (11:23 +0800)]
selftests/bpf: Make start_mptcp_server static

start_mptcp_server() shouldn't be a public helper, it only be used in
MPTCP tests. This patch moves it into prog_tests/mptcp.c, and implenments
it using make_sockaddr() and start_server_addr() instead of using
start_server_proto().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/50ec7049e280c60a2924937940851f8fee2b73b8.1714014697.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
17 months agoselftests/bpf: Add opts argument for __start_server
Geliang Tang [Thu, 25 Apr 2024 03:23:41 +0000 (11:23 +0800)]
selftests/bpf: Add opts argument for __start_server

This patch adds network_helper_opts parameter for __start_server()
instead of "int protocol" and "int timeout_ms". This not only reduces
the number of parameters, but also makes it more flexible.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/127d2f0929980b41f757dcfebe1b667e6bfb43f1.1714014697.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>