linux.git
Daniel Xu [Mon, 29 Jan 2024 01:24:07 +0000 (18:24 -0700)]
bpf: btf: Add BTF_KFUNCS_START/END macro pair

This macro pair is functionally equivalent to BTF_SET8_START/END, except
that the BTF_SET8_KFUNCS flag is set in the btf_id_set8 flags field. The next
commit will codemod all kfunc set8s to this new variant so that all
kfuncs are tagged as such in the .BTF_ids section.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/d536c57c7c2af428686853cc7396b7a44faa53b7.1706491398.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Xu [Mon, 29 Jan 2024 01:24:06 +0000 (18:24 -0700)]
bpf: btf: Support flags for BTF_SET8 sets

This commit adds support for flags on BTF_SET8s. struct btf_id_set8
already had 32 bits worth of flags, but they were only used for
alignment purposes before.

We now use these bits to encode flags. The first use case is tagging
kfunc sets with a flag so that pahole can recognize which
BTF_ID_FLAGS(func, ..) are actual kfuncs.
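
As a rough user-space sketch of the idea, assuming a set layout of a count, a set-wide flags word, and (id, flags) pairs: all `demo_` names and the flag value below are made up for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag value, for illustration only. */
#define DEMO_SET8_KFUNCS (1u << 0)

/* User-space mirror of a set8: a count, 32 bits of set-wide flags
 * (previously only padding), then (id, flags) pairs.
 */
struct demo_id_set8 {
	uint32_t cnt;
	uint32_t flags;
	struct {
		uint32_t id;
		uint32_t flags;
	} pairs[4];
};

/* Returns 1 if the whole set is tagged as a kfunc set, 0 otherwise. */
int demo_set_is_kfuncs(const struct demo_id_set8 *set)
{
	assert(set != NULL);
	return (set->flags & DEMO_SET8_KFUNCS) != 0;
}
```

A tool such as pahole could then tell kfunc sets apart from other sets by checking the set-wide flag rather than guessing from the entries.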

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/7bb152ec76d6c2c930daec88e995bf18484a5ebb.1706491398.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Manu Bretelle [Wed, 31 Jan 2024 05:32:12 +0000 (21:32 -0800)]
selftests/bpf: Disable IPv6 for lwt_redirect test

After a recent change in the vmtest runner, this test started failing
sporadically.

Investigation showed that this test was subject to a race condition which
got exacerbated after the vm runner change. The problem is that the
logic that waits for an ICMPv4 packet is naive and will break if 5 or
more non-ICMPv4 packets make it to tap0.
When ICMPv6 is enabled, the kernel will generate traffic such as ICMPv6
router solicitation...
On a system with good performance, the expected ICMPv4 packet would very
likely make it to the network interface promptly, but on a system with
poor performance, those "guarantees" do not hold true anymore.

Given that the test is IPv4 only, this change disables IPv6 in the test
netns by setting `net.ipv6.conf.all.disable_ipv6` to 1.
This essentially leaves "ping" as the sole generator of traffic in the
network namespace.
If this test were to be made IPv6 compatible, the logic in
`wait_for_packet` would need to be modified.
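
For illustration only, setting such a sysctl from C amounts to writing "1" to the matching file under /proc/sys. The helper name below is hypothetical and the test itself may set the knob differently; the path is a parameter so the sketch can be tried on an ordinary file.

```c
#include <assert.h>
#include <stdio.h>

/* Writes "1" to a sysctl-style file. Returns 0 on success, -1 on failure.
 * For the real knob the path would be
 * /proc/sys/net/ipv6/conf/all/disable_ipv6, written from inside the test
 * netns with sufficient privileges.
 */
int demo_write_sysctl_one(const char *path)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs("1\n", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```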

In more detail...

At a high level, the test does:
- create a new namespace
- in `setup_redirect_target` set up lo, tap0, and link_err interfaces as
  well as add 2 routes that attach ingress/egress sections of
  `test_lwt_redirect.bpf.o` to the xmit path.
- in `send_and_capture_test_packets` send an ICMP packet and read off
  the tap interface (using `wait_for_packet`) to check that an ICMP packet
  with the right size is read.

`wait_for_packet` will try to read `max_retry` (5) times from the tap0
fd looking for an ICMPv4 packet matching some criteria.
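
The retry limit can be modelled with a simplified sketch. This is not the selftest's code: the real function reads from a file descriptor, while here the tap0 traffic is an in-memory array and all `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum demo_pkt { DEMO_ICMPV4, DEMO_ICMPV6, DEMO_OTHER };

/* Looks at no more than max_retry packets from a simulated queue.
 * Returns 1 if a matching ICMPv4 packet is among them, 0 otherwise.
 */
int demo_wait_for_packet(const enum demo_pkt *queue, size_t len, int max_retry)
{
	size_t i;

	assert(queue != NULL || len == 0);
	for (i = 0; i < len && (int)i < max_retry; i++) {
		if (queue[i] == DEMO_ICMPV4)
			return 1;
	}
	return 0;
}
```

With a limit of 5, a queue that starts with five ICMPv6 packets makes the function give up before it ever reaches the ICMPv4 echo request.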

The problem is that when we set up the `tap0` interface, because IPv6 is
enabled by default, traffic such as Router solicitation is sent through
tap0, as in:

  # tcpdump -r /tmp/lwt_redirect.pc
  reading from file /tmp/lwt_redirect.pcap, link-type EN10MB (Ethernet)
  04:46:23.578352 IP6 :: > ff02::1:ffc0:4427: ICMP6, neighbor solicitation, who has fe80::fcba:dff:fec0:4427, length 32
  04:46:23.659522 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
  04:46:24.389169 IP 10.0.0.1 > 20.0.0.9: ICMP echo request, id 122, seq 1, length 108
  04:46:24.618599 IP6 fe80::fcba:dff:fec0:4427 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
  04:46:24.619985 IP6 fe80::fcba:dff:fec0:4427 > ff02::2: ICMP6, router solicitation, length 16
  04:46:24.767326 IP6 fe80::fcba:dff:fec0:4427 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
  04:46:28.936402 IP6 fe80::fcba:dff:fec0:4427 > ff02::2: ICMP6, router solicitation, length 16

If `wait_for_packet` sees 5 non-ICMPv4 packets, it will return 0, which is what we see in:

  2024-01-31T03:51:25.0336992Z test_lwt_redirect_run:PASS:netns_create 0 nsec
  2024-01-31T03:51:25.0341309Z open_netns:PASS:malloc token 0 nsec
  2024-01-31T03:51:25.0344844Z open_netns:PASS:open /proc/self/ns/net 0 nsec
  2024-01-31T03:51:25.0350071Z open_netns:PASS:open netns fd 0 nsec
  2024-01-31T03:51:25.0353516Z open_netns:PASS:setns 0 nsec
  2024-01-31T03:51:25.0356560Z test_lwt_redirect_run:PASS:setns 0 nsec
  2024-01-31T03:51:25.0360140Z open_tuntap:PASS:open(/dev/net/tun) 0 nsec
  2024-01-31T03:51:25.0363822Z open_tuntap:PASS:ioctl(TUNSETIFF) 0 nsec
  2024-01-31T03:51:25.0367402Z open_tuntap:PASS:fcntl(O_NONBLOCK) 0 nsec
  2024-01-31T03:51:25.0371167Z setup_redirect_target:PASS:open_tuntap 0 nsec
  2024-01-31T03:51:25.0375180Z setup_redirect_target:PASS:if_nametoindex 0 nsec
  2024-01-31T03:51:25.0379929Z setup_redirect_target:PASS:ip link add link_err type dummy 0 nsec
  2024-01-31T03:51:25.0384874Z setup_redirect_target:PASS:ip link set lo up 0 nsec
  2024-01-31T03:51:25.0389678Z setup_redirect_target:PASS:ip addr add dev lo 10.0.0.1/32 0 nsec
  2024-01-31T03:51:25.0394814Z setup_redirect_target:PASS:ip link set link_err up 0 nsec
  2024-01-31T03:51:25.0399874Z setup_redirect_target:PASS:ip link set tap0 up 0 nsec
  2024-01-31T03:51:25.0407731Z setup_redirect_target:PASS:ip route add 10.0.0.0/24 dev link_err encap bpf xmit obj test_lwt_redirect.bpf.o sec redir_ingress 0 nsec
  2024-01-31T03:51:25.0419105Z setup_redirect_target:PASS:ip route add 20.0.0.0/24 dev link_err encap bpf xmit obj test_lwt_redirect.bpf.o sec redir_egress 0 nsec
  2024-01-31T03:51:25.0427209Z test_lwt_redirect_normal:PASS:setup_redirect_target 0 nsec
  2024-01-31T03:51:25.0431424Z ping_dev:PASS:if_nametoindex 0 nsec
  2024-01-31T03:51:25.0437222Z send_and_capture_test_packets:FAIL:wait_for_epacket unexpected wait_for_epacket: actual 0 != expected 1
  2024-01-31T03:51:25.0448298Z (/tmp/work/bpf/bpf/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c:175: errno: Success) test_lwt_redirect_normal egress test fails
  2024-01-31T03:51:25.0457124Z close_netns:PASS:setns 0 nsec

When running in a VM with potential resource constraints, the odds that
`ping` is not scheduled very soon after bringing `tap0` up increase,
and with them the chances of our ICMP packet being pushed to position 6+
in the network trace.

To confirm this indeed solves the issue, I ran the test 100 times in a
row with:

  errors=0
  successes=0
  for i in `seq 1 100`
  do
    ./test_progs -t lwt_redirect/lwt_redirect_normal
    if [ $? -eq 0 ]; then
      successes=$((successes+1))
    else
      errors=$((errors+1))
    fi
  done
  echo "successes: $successes/errors: $errors"

While this test previously failed at least a couple of times every 10
runs, here it ran 100 times with no error.

Fixes: 43a7c3ef8a15 ("selftests/bpf: Add lwt_xmit tests for BPF_REDIRECT")
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240131053212.2247527-1-chantr4@gmail.com
Martin KaFai Lau [Tue, 30 Jan 2024 23:55:50 +0000 (15:55 -0800)]
Merge branch 'libbpf: add bpf_core_cast() helper'

Andrii Nakryiko says:

====================
Add bpf_core_cast(<ptr>, <type>) macro wrapper around bpf_rdonly_cast() kfunc
to make it easier to use this functionality in BPF code. See patch #2 for
BPF selftests conversions demonstrating improvements in code succinctness.
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Andrii Nakryiko [Tue, 30 Jan 2024 21:20:23 +0000 (13:20 -0800)]
selftests/bpf: convert bpf_rdonly_cast() uses to bpf_core_cast() macro

Use more ergonomic bpf_core_cast() macro instead of bpf_rdonly_cast() in
selftests code.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130212023.183765-3-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Andrii Nakryiko [Tue, 30 Jan 2024 21:20:22 +0000 (13:20 -0800)]
libbpf: add bpf_core_cast() macro

Add bpf_core_cast() macro that wraps bpf_rdonly_cast() kfunc. It's more
ergonomic than the kfunc, as it automatically extracts btf_id with
bpf_core_type_id_kernel(), and works with type names. It also casts the
result to a (T *) pointer. See the definition of the macro; it's
self-explanatory.

libbpf declares bpf_rdonly_cast() extern as __weak __ksym, so it should
not conflict with other possible declarations in user code.

But we do have a conflict with current BPF selftests that declare their
externs with first argument as `void *obj`, while libbpf opts into more
permissive `const void *obj`. This causes conflict, so we fix up BPF
selftests uses in the same patch.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130212023.183765-2-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Alexei Starovoitov [Tue, 30 Jan 2024 17:41:51 +0000 (09:41 -0800)]
Merge branch 'trusted-ptr_to_btf_id-arg-support-in-global-subprogs'

Andrii Nakryiko says:

====================
Trusted PTR_TO_BTF_ID arg support in global subprogs

This patch set follows recent changes that added btf_decl_tag-based argument
annotation support for global subprogs. This time we add ability to pass
PTR_TO_BTF_ID (BTF-aware kernel pointers) arguments into global subprograms.
We support explicitly trusted arguments only, for now.

Patch #1 adds logic for arg:trusted tag support on the verifier side. Default
semantic of such arguments is non-NULL, enforced on caller side. But patch #2
adds arg:nullable tag that can be combined with arg:trusted to make callee
explicitly do the NULL check, which helps implement "optional" PTR_TO_BTF_ID
arguments.

Patch #3 adds libbpf-side __arg_trusted and __arg_nullable macros.

Patch #4 adds a bunch of tests validating __arg_trusted in combination with
__arg_nullable.

v2->v3:
  - went back to arg:nullable and __arg_nullable naming;
  - rebased on latest bpf-next after preparatory patches landed;
v1->v2:
  - added fix up to type enforcement changes, landed earlier;
  - dropped bpf_core_cast() changes, will post them separately, as they now
    are not used in added tests;
  - dropped arg:untrusted support (Alexei);
  - renamed arg:nullable to arg:maybe_null (Alexei);
  - and also added task_struct___local flavor tests (Alexei).
====================

Link: https://lore.kernel.org/r/20240130000648.2144827-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Tue, 30 Jan 2024 00:06:48 +0000 (16:06 -0800)]
selftests/bpf: add trusted global subprog arg tests

Add a bunch of test cases validating behavior of __arg_trusted and its
combination with __arg_nullable tag. We also validate CO-RE flavor
support by kernel for __arg_trusted args.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130000648.2144827-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Tue, 30 Jan 2024 00:06:47 +0000 (16:06 -0800)]
libbpf: add __arg_trusted and __arg_nullable tag macros

Add __arg_trusted to annotate global func args that accept trusted
PTR_TO_BTF_ID arguments.

Also add __arg_nullable to combine with __arg_trusted (and maybe other
tags in the future) to force global subprog itself (i.e., callee) to do
NULL checks, as opposed to default non-NULL semantics (and thus caller's
responsibility to ensure non-NULL values).

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130000648.2144827-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Tue, 30 Jan 2024 00:06:46 +0000 (16:06 -0800)]
bpf: add arg:nullable tag to be combined with trusted pointers

Add ability to mark arg:trusted arguments with optional arg:nullable
tag to mark it as PTR_TO_BTF_ID_OR_NULL variant, which will allow
callers to pass NULL, and subsequently will force global subprog's code
to do NULL check. This allows "optional" PTR_TO_BTF_ID values to be
passed into global subprogs.

For now arg:nullable cannot be combined with anything else.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130000648.2144827-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Tue, 30 Jan 2024 00:06:45 +0000 (16:06 -0800)]
bpf: add __arg_trusted global func arg tag

Add support for passing PTR_TO_BTF_ID registers to global subprogs.
Currently only PTR_TRUSTED flavor of PTR_TO_BTF_ID is supported.
Non-NULL semantics is assumed, so caller will be forced to prove
PTR_TO_BTF_ID can't be NULL.

Note, we disallow global subprogs to destroy passed in PTR_TO_BTF_ID
arguments, even the trusted one. We achieve that by not setting
ref_obj_id when validating subprog code. This basically enforces (in
Rust terms) borrowing semantics vs move semantics. Borrowing semantics
seems to be a better fit for isolated global subprog validation
approach.

Implementation-wise, we utilize existing logic for matching
user-provided BTF type to kernel-side BTF type, used by BPF CO-RE logic
and following same matching rules. We enforce a unique match for types.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130000648.2144827-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jose E. Marchesi [Tue, 30 Jan 2024 11:36:24 +0000 (12:36 +0100)]
bpf: Move -Wno-compare-distinct-pointer-types to BPF_CFLAGS

Clang supports enabling/disabling certain conversion diagnostics via
the -W[no-]compare-distinct-pointer-types command line options.
Disabling this warning is required by some BPF selftests due to
-Werror.  Until very recently GCC would emit these warnings
unconditionally, which was a problem for gcc-bpf, but we added support
for the command-line options to GCC upstream [1].

This patch moves the -Wno-compare-distinct-pointer-types from
CLANG_CFLAGS to BPF_CFLAGS in selftests/bpf/Makefile so the option
is also used in gcc-bpf builds, not just in clang builds.

Tested in bpf-next master.
No regressions.

  [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627769.html

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240130113624.24940-1-jose.marchesi@oracle.com
Jose E. Marchesi [Tue, 30 Jan 2024 11:03:43 +0000 (12:03 +0100)]
bpf: Build type-punning BPF selftests with -fno-strict-aliasing

A few BPF selftests perform type punning and they may break strict
aliasing rules, which are exploited by both GCC and clang by default
while optimizing.  This can lead to broken compiled programs.

This patch disables strict aliasing for these particular tests, by
means of the -fno-strict-aliasing command line option.  This will make
sure these tests are optimized properly even if some strict aliasing
rule gets violated.
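
For readers unfamiliar with the issue: reading an object through a pointer of an unrelated type, as in `*(uint32_t *)&f` for a float `f`, is the kind of type punning that strict aliasing rules forbid. The sketch below shows the portable alternative using memcpy(); it is a generic illustration with a hypothetical function name, not code from the selftests, which keep their punning and rely on the new flag instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterprets the bytes of a float as a 32-bit integer without breaking
 * strict aliasing: memcpy() may always copy an object representation.
 */
uint32_t demo_float_bits(float f)
{
	uint32_t bits;

	assert(sizeof(bits) == sizeof(f));
	memcpy(&bits, &f, sizeof(bits));
	return bits;
}
```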

After this patch, GCC is able to build all the selftests without
warning about potential strict aliasing issues.

bpf@vger discussion on strict aliasing and BPF selftests:
https://lore.kernel.org/bpf/bae1205a-b6e5-4e46-8e20-520d7c327f7a@linux.dev/T/#t

Tested in bpf-next master.
No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/bae1205a-b6e5-4e46-8e20-520d7c327f7a@linux.dev
Link: https://lore.kernel.org/bpf/20240130110343.11217-1-jose.marchesi@oracle.com
Haiyue Wang [Sat, 27 Jan 2024 13:48:56 +0000 (21:48 +0800)]
bpf,token: Use BIT_ULL() to convert the bit mask

Replace the '(1ULL << *)' with the macro BIT_ULL(nr).
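
A minimal user-space sketch of the pattern, using a local stand-in macro (`DEMO_BIT_ULL` and the helper are hypothetical names, not the kernel header's):

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel macro: a 64-bit wide single-bit mask. */
#define DEMO_BIT_ULL(nr) (1ULL << (nr))

/* Returns 1 if bit nr is set in mask, 0 otherwise. */
int demo_mask_has_bit(uint64_t mask, unsigned int nr)
{
	assert(nr < 64);
	return (mask & DEMO_BIT_ULL(nr)) != 0;
}
```

The unsigned long long constant matters for bit numbers of 32 and above, where shifting a plain int `1` would be undefined.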

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240127134901.3698613-1-haiyue.wang@intel.com
Jose E. Marchesi [Sat, 27 Jan 2024 18:50:31 +0000 (19:50 +0100)]
bpf: Generate const static pointers for kernel helpers

The generated bpf_helper_defs.h file currently contains definitions
like this for the kernel helpers, which are static objects:

  static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1;

These work well in both clang and GCC because both compilers do
constant propagation with -O1 and higher optimization, resulting in
`call 1' BPF instructions being generated, which are calls to kernel
helpers.

However, there is a discrepancy in how the -Wunused-variable
warning (activated by -Wall) is handled in these compilers:

- clang will not emit -Wunused-variable warnings for static variables
  defined in C header files, be they constant or not.

- GCC will not emit -Wunused-variable warnings for _constant_ static
  variables defined in header files, but it will emit warnings for
  non-constant static variables defined in header files.

There is no reason for these bpf_helper_defs.h pointers to not be
declared constant, and it is actually desirable to do so, since their
values are not to be changed.  So this patch modifies bpf_doc.py to
generate prototypes like:

  static void *(* const bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1;

This allows GCC to not error while compiling BPF programs with `-Wall
-Werror', while still being able to detect and error on legitimate
unused variables in the programs themselves.

This change doesn't impact the desired constant propagation in either
Clang or GCC with -O1 and higher.  On the contrary, being declared as
constant may increase the odds they get constant folded when
used/referred to in certain circumstances.
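
The placement of `const` in such declarations can be illustrated in plain user-space C. This sketch points at an ordinary function rather than a helper ID, and every name in it is hypothetical:

```c
#include <assert.h>
#include <limits.h>

int demo_add_one(int x)
{
	assert(x < INT_MAX);
	return x + 1;
}

/* "* const" makes the pointer itself constant, so it cannot be re-pointed.
 * The signature of the function it points to is unchanged.
 */
static int (* const demo_helper)(int) = demo_add_one;

/* Calls through the constant pointer, which a compiler can fold into a
 * direct call.
 */
int demo_call_helper(int x)
{
	return demo_helper(x);
}
```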

Tested in bpf-next master.
No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240127185031.29854-1-jose.marchesi@oracle.com
Ian Rogers [Thu, 25 Jan 2024 23:18:40 +0000 (15:18 -0800)]
libbpf: Add some details for BTF parsing failures

As CONFIG_DEBUG_INFO_BTF is off by default, the existing "failed to find
valid kernel BTF" message makes diagnosing the kernel build issue somewhat
cryptic. Add a little more detail with the hope of helping users.

Before:
```
libbpf: failed to find valid kernel BTF
libbpf: Error loading vmlinux BTF: -3
```

After not accessible:
```
libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
libbpf: failed to find valid kernel BTF
libbpf: Error loading vmlinux BTF: -3
```

After not readable:
```
libbpf: failed to read kernel BTF from (/sys/kernel/btf/vmlinux): -1
```
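
A rough sketch of how a caller might tell the "missing" case apart from other access failures. It is not libbpf's implementation: the function name is hypothetical, it uses plain access(), and the second message is invented for the demo.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 0 if path is readable, otherwise a negative errno value, and
 * prints a hint that distinguishes a missing file from other failures.
 */
int demo_check_btf_path(const char *path)
{
	int err;

	assert(path != NULL);
	if (access(path, R_OK) == 0)
		return 0;
	err = errno; /* save before stdio can overwrite it */
	if (err == ENOENT)
		fprintf(stderr, "kernel BTF is missing at '%s', was CONFIG_DEBUG_INFO_BTF enabled?\n", path);
	else
		fprintf(stderr, "demo: cannot read '%s': %d\n", path, -err);
	return -err;
}
```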

Closes: https://lore.kernel.org/bpf/CAP-5=fU+DN_+Y=Y4gtELUsJxKNDDCOvJzPHvjUVaUoeFAzNnig@mail.gmail.com/
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240125231840.1647951-1-irogers@google.com
Florian Lehner [Sat, 20 Jan 2024 15:09:20 +0000 (16:09 +0100)]
perf/bpf: Fix duplicate type check

Remove the duplicate check on type and unify result.

Signed-off-by: Florian Lehner <dev@der-flo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20240120150920.3370-1-dev@der-flo.net
Jose E. Marchesi [Sat, 27 Jan 2024 10:07:02 +0000 (11:07 +0100)]
bpf: Use -Wno-error in certain tests when building with GCC

Certain BPF selftests contain code that, albeit being legal C, triggers
warnings in GCC that cannot be disabled.  This is the case for example
for the tests

  progs/btf_dump_test_case_bitfields.c
  progs/btf_dump_test_case_namespacing.c
  progs/btf_dump_test_case_packing.c
  progs/btf_dump_test_case_padding.c
  progs/btf_dump_test_case_syntax.c

which contain struct type declarations inside function parameter
lists.  This is problematic, because:

- The BPF selftests are built with -Werror.

- The Clang and GCC compilers sometimes differ in the handling of
  warnings.  One compiler may emit warnings for code that the other
  compiles silently, and one compiler may offer the possibility to
  disable certain warnings, while the other doesn't.

In order to overcome this problem, this patch modifies the
tools/testing/selftests/bpf/Makefile in order to:

1. Enable the possibility of specifying per-source-file extra CFLAGS.
   This is done by defining a make variable like:

   <source-filename>-CFLAGS := <whateverflags>

   And then modifying the proper Make rule in order to use these flags
   when compiling <source-filename>.

2. Use the mechanism above to add -Wno-error to CFLAGS for the
   following selftests:

   progs/btf_dump_test_case_bitfields.c
   progs/btf_dump_test_case_namespacing.c
   progs/btf_dump_test_case_packing.c
   progs/btf_dump_test_case_padding.c
   progs/btf_dump_test_case_syntax.c

   Note the corresponding -CFLAGS variables for these files are
   defined only if the selftests are being built with GCC.

Note that, while compiler pragmas can generally be used to disable
particular warnings per file, this 1) is only possible for warnings
that actually can be disabled in the command line, i.e. that have
-Wno-FOO options, and 2) doesn't apply to -Wno-error.

Tested in bpf-next master branch.
No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240127100702.21549-1-jose.marchesi@oracle.com
Martin KaFai Lau [Sat, 27 Jan 2024 02:50:17 +0000 (18:50 -0800)]
selftests/bpf: Remove "&>" usage in the selftests

On s390, CI reported that the sock_iter_batch selftest
hits this error very often:

2024-01-26T16:56:49.3091804Z Bind /proc/self/ns/net -> /run/netns/sock_iter_batch_netns failed: No such file or directory
2024-01-26T16:56:49.3149524Z Cannot remove namespace file "/run/netns/sock_iter_batch_netns": No such file or directory
2024-01-26T16:56:49.3772213Z test_sock_iter_batch:FAIL:ip netns add sock_iter_batch_netns unexpected error: 256 (errno 0)

It happens very often on s390, but Manu also noticed that it happens
occasionally on other architectures as well.

It turns out the default dash shell does not recognize "&>"
as a redirection operator, so the command went to the background.
In the sock_iter_batch selftest, the "ip netns delete" went
into the background and then raced with the following "ip netns add"
command.

This patch replaces the "&> /dev/null" usage with ">/dev/null 2>&1"
and does this redirection in the SYS_NOFAIL macro instead of doing
it individually in each caller. The SYS_NOFAIL callers do not care
about failure, so there is no harm in doing this redirection even if
some of the existing callers do not redirect to /dev/null now.

It touches different test files, so I skipped the Fixes tags
in this patch. Some of the changed tests do not use "&>"
but they use SYS_NOFAIL, so these tests are also
changed to avoid doing their own redirection because
SYS_NOFAIL does it internally now.
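
The idea of centralizing the redirection can be sketched as a small function. This is an illustration, not the selftests' SYS_NOFAIL macro; `demo_sys_nofail` and `DEMO_ALL_TO_DEV_NULL` are hypothetical names.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* POSIX sh form of "discard stdout and stderr"; unlike "&>", dash
 * understands it.
 */
#define DEMO_ALL_TO_DEV_NULL " >/dev/null 2>&1"

/* Formats a shell command, appends the redirection and runs it.
 * Returns the status from system(), or -1 if the command does not fit.
 */
int demo_sys_nofail(const char *fmt, ...)
{
	char cmd[1024];
	va_list ap;
	int n;

	assert(fmt != NULL);
	va_start(ap, fmt);
	n = vsnprintf(cmd, sizeof(cmd), fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n + sizeof(DEMO_ALL_TO_DEV_NULL) > sizeof(cmd))
		return -1;
	strcat(cmd, DEMO_ALL_TO_DEV_NULL);
	return system(cmd);
}
```

Because the redirection is appended in one place, callers pass only the command itself, for example `demo_sys_nofail("ip netns delete %s", name)`.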

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240127025017.950825-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Thu, 25 Jan 2024 20:55:06 +0000 (12:55 -0800)]
bpf: move arg:ctx type enforcement check inside the main logic loop

Now that bpf and bpf-next trees converged and we don't run the risk of
merge conflicts, move btf_validate_prog_ctx_type() into its most logical
place inside the main logic loop.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240125205510.3642094-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Thu, 25 Jan 2024 20:55:05 +0000 (12:55 -0800)]
libbpf: fix __arg_ctx type enforcement for perf_event programs

Adjust PERF_EVENT type enforcement around __arg_ctx to match exactly
what kernel is doing.

Fixes: 76ec90a996e3 ("libbpf: warn on unexpected __arg_ctx type when rewriting BTF")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240125205510.3642094-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Thu, 25 Jan 2024 20:55:04 +0000 (12:55 -0800)]
libbpf: integrate __arg_ctx feature detector into kernel_supports()

Now that feature detection code is in bpf-next tree, integrate __arg_ctx
kernel-side support into kernel_supports() framework.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240125205510.3642094-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Yonghong Song [Sat, 27 Jan 2024 19:46:29 +0000 (11:46 -0800)]
docs/bpf: Improve documentation of 64-bit immediate instructions

For 64-bit immediate instruction, 'BPF_IMM | BPF_DW | BPF_LD' and
src_reg=[0-6], the current documentation describes the 64-bit
immediate is constructed by:

  imm64 = (next_imm << 32) | imm

But actually imm64 is only used when src_reg=0. For all other
variants (src_reg != 0), 'imm' and 'next_imm' have separate special
encoding requirement and imm64 cannot be easily used to describe
instruction semantics.

This patch clarifies that 64-bit immediate instructions use
two 32-bit immediate values instead of a 64-bit immediate value,
so later describing individual 64-bit immediate instructions
becomes less confusing.
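
For the src_reg=0 case only, the combination of the two 32-bit immediates can be sketched as follows. Function names are hypothetical, and the other src_reg variants give 'imm' and 'next_imm' their own meanings that this sketch does not cover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* imm64 = (next_imm << 32) | imm. Going through uint32_t keeps a negative
 * imm from sign-extending into the upper half.
 */
uint64_t demo_ld_imm64(int32_t imm, int32_t next_imm)
{
	return ((uint64_t)(uint32_t)next_imm << 32) | (uint32_t)imm;
}

/* Inverse operation: splits a 64-bit value into the two 32-bit immediates. */
void demo_split_imm64(uint64_t imm64, int32_t *imm, int32_t *next_imm)
{
	assert(imm != NULL && next_imm != NULL);
	*imm = (int32_t)(uint32_t)imm64;
	*next_imm = (int32_t)(uint32_t)(imm64 >> 32);
}
```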

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Dave Thaler <dthaler1968@gmail.com>
Link: https://lore.kernel.org/bpf/20240127194629.737589-1-yonghong.song@linux.dev
Menglong Dong [Sun, 28 Jan 2024 05:54:43 +0000 (13:54 +0800)]
bpf: Remove unused field "mod" in struct bpf_trampoline

It seems that the field "mod" in struct bpf_trampoline is not used
anywhere after the commit 31bf1dbccfb0 ("bpf: Fix attaching
fentry/fexit/fmod_ret/lsm to modules"). So we can just remove it now.

Fixes: 31bf1dbccfb0 ("bpf: Fix attaching fentry/fexit/fmod_ret/lsm to modules")
Signed-off-by: Menglong Dong <dongmenglong.8@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240128055443.413291-1-dongmenglong.8@bytedance.com
Geliang Tang [Sun, 28 Jan 2024 11:43:57 +0000 (19:43 +0800)]
selftests/bpf: Drop return in bpf_testmod_exit

bpf_testmod_exit() does not need to have a return value (given the void),
so this patch drops this useless 'return' in it.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/5765b287ea088f0c820f2a834faf9b20fb2f8215.1706442113.git.tanggeliang@kylinos.cn
Pu Lehui [Mon, 15 Jan 2024 13:12:35 +0000 (13:12 +0000)]
riscv, bpf: Optimize bswap insns with Zbb support

Optimize bswap instructions by using the Zbb rev8 instruction combined
with the srli instruction. Also optimize 16-bit zero-extension with Zbb
support.
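
The rev8 plus srli combination can be emulated in user-space C to see why it works. This is only a model of the arithmetic with hypothetical function names, not the JIT's emit code.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates rev8 on RV64: reverses the order of all eight bytes. */
uint64_t demo_rev8(uint64_t x)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++) {
		r = (r << 8) | (x & 0xff);
		x >>= 8;
	}
	return r;
}

/* Byte swap of the low 16, 32 or 64 bits: rev8 leaves the swapped bytes at
 * the top of the register and a logical right shift brings them down.
 */
uint64_t demo_bswap(uint64_t x, unsigned int bits)
{
	assert(bits == 16 || bits == 32 || bits == 64);
	x = demo_rev8(x);
	if (bits < 64)
		x >>= 64 - bits;
	return x;
}
```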

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240115131235.2914289-7-pulehui@huaweicloud.com
Pu Lehui [Mon, 15 Jan 2024 13:12:34 +0000 (13:12 +0000)]
riscv, bpf: Optimize sign-extention mov insns with Zbb support

Add 8-bit and 16-bit sign-extension wrappers with Zbb support to optimize
sign-extension mov instructions.
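
For comparison, a sign extension without dedicated instructions is usually done with a shift pair, which Zbb's sext.b and sext.h replace with a single instruction. The user-space sketch below models that shift pair; the function name is hypothetical, and it relies on the arithmetic right shift of signed values that GCC and Clang provide.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extends the low `bits` bits of x to 64 bits: shift the field to the
 * top of the register, then shift it back down arithmetically.
 */
int64_t demo_sext(uint64_t x, unsigned int bits)
{
	unsigned int sh;

	assert(bits >= 1 && bits <= 64);
	sh = 64 - bits;
	return (int64_t)(x << sh) >> sh;
}
```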

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240115131235.2914289-6-pulehui@huaweicloud.com
Pu Lehui [Mon, 15 Jan 2024 13:12:33 +0000 (13:12 +0000)]
riscv, bpf: Add necessary Zbb instructions

Add necessary Zbb instructions introduced by [0] to reduce code size and
improve performance of RV64 JIT. Meanwhile, a runtime detection helper is
added to check whether the CPU supports Zbb instructions.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-38-g865e7a7.pdf
Link: https://lore.kernel.org/bpf/20240115131235.2914289-5-pulehui@huaweicloud.com
Pu Lehui [Mon, 15 Jan 2024 13:12:32 +0000 (13:12 +0000)]
riscv, bpf: Simplify sext and zext logics in branch instructions

There are many extension helpers in the current branch instructions, and
the implementation is a bit complicated. We simplify this logic through
two simple extension helpers with an alternate register.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240115131235.2914289-4-pulehui@huaweicloud.com
Pu Lehui [Mon, 15 Jan 2024 13:12:31 +0000 (13:12 +0000)]
riscv, bpf: Unify 32-bit zero-extension to emit_zextw

For code unification, add emit_zextw wrapper to unify all the 32-bit
zero-extension operations.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240115131235.2914289-3-pulehui@huaweicloud.com
Pu Lehui [Mon, 15 Jan 2024 13:12:30 +0000 (13:12 +0000)]
riscv, bpf: Unify 32-bit sign-extension to emit_sextw

For code unification, add emit_sextw wrapper to unify all the 32-bit
sign-extension operations.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240115131235.2914289-2-pulehui@huaweicloud.com
20 months agolibbpf: Fix faccessat() usage on Android
Andrii Nakryiko [Fri, 26 Jan 2024 22:09:44 +0000 (14:09 -0800)]
libbpf: Fix faccessat() usage on Android

The Android implementation of libc errors out with -EINVAL in faccessat() if
passed AT_EACCESS ([0]). This leads to a ridiculous issue with libbpf
refusing to load /sys/kernel/btf/vmlinux on Android ([1]). Fix this by
detecting Android and redefining AT_EACCESS to 0, which is equivalent on
Android.

  [0] https://android.googlesource.com/platform/bionic/+/refs/heads/android13-release/libc/bionic/faccessat.cpp#50
  [1] https://github.com/libbpf/libbpf-bootstrap/issues/250#issuecomment-1911324250
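
A minimal sketch of the approach described above, assuming the
compiler-provided __ANDROID__ macro is used for detection. The helper name
path_readable() is hypothetical, and the exact placement of the redefinition
in libbpf may differ.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for faccessat() and AT_EACCESS */
#endif

#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Assumption: __ANDROID__ identifies a bionic build. */
#ifdef __ANDROID__
#undef AT_EACCESS
#define AT_EACCESS 0
#endif

/* Hypothetical helper: can the calling process read this path? */
static bool path_readable(const char *path)
{
	assert(path != NULL);
	/* AT_EACCESS asks for a check against the effective IDs. */
	return faccessat(AT_FDCWD, path, R_OK, AT_EACCESS) == 0;
}
```

With the redefinition in place, the same call site compiles unchanged on
both Android and other libc implementations.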

Fixes: 6a4ab8869d0b ("libbpf: Fix the case of running as non-root with capabilities")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240126220944.2497665-1-andrii@kernel.org
20 months agobpftool: Be more portable by using POSIX's basename()
Arnaldo Carvalho de Melo [Mon, 29 Jan 2024 14:33:26 +0000 (11:33 -0300)]
bpftool: Be more portable by using POSIX's basename()

musl libc had the basename() prototype in string.h, but this is a
glibc-ism. They have now removed the _GNU_SOURCE bits in their devel
distro, Alpine Linux edge:

  https://git.musl-libc.org/cgit/musl/commit/?id=725e17ed6dff4d0cd22487bb64470881e86a92e7

So let's use the POSIX version. The whole rationale is spelled out at:

  https://gitlab.alpinelinux.org/alpine/aports/-/issues/15643
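
For illustration, a small sketch of calling the POSIX basename() from
<libgen.h>. Unlike the GNU variant, the POSIX one may modify its argument,
so the sketch works on a copy. The helper name path_basename() and the
buffer sizes are assumptions made for this example, not bpftool code.

```c
#include <assert.h>
#include <libgen.h>   /* POSIX basename(); may modify its argument */
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the final component of path into out.
 * Returns 0 on success, -1 if an input or result does not fit.
 */
static int path_basename(const char *path, char *out, size_t out_sz)
{
	char tmp[4096];
	const char *base;

	assert(path != NULL && out != NULL);
	if (strlen(path) >= sizeof(tmp))
		return -1;
	strcpy(tmp, path);          /* basename() may write into tmp */

	base = basename(tmp);
	if (strlen(base) >= out_sz)
		return -1;
	strcpy(out, base);          /* result may point into tmp, so copy it out */
	return 0;
}
```

Copying the result out also avoids depending on storage that a later
basename() call is allowed to overwrite.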

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <olsajiri@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/lkml/ZZhsPs00TI75RdAr@kernel.org
Link: https://lore.kernel.org/bpf/Zbe3NuOgaupvUcpF@kernel.org
20 months agonet: free altname using an RCU callback
Jakub Kicinski [Fri, 26 Jan 2024 20:14:49 +0000 (12:14 -0800)]
net: free altname using an RCU callback

We had to add another synchronize_rcu() in a recent fix.
Bite the bullet, add an rcu_head to netdev_name_node,
and free it from an RCU callback.

Note that name_node does not hold any reference on the dev
to which it points, but there must be a synchronize_rcu()
on the device removal path, so we should be fine.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agodt-bindings: nfc: ti,trf7970a: fix usage example
Tobias Schramm [Thu, 25 Jan 2024 20:15:05 +0000 (21:15 +0100)]
dt-bindings: nfc: ti,trf7970a: fix usage example

The TRF7970A is a SPI device, not I2C.

Signed-off-by: Tobias Schramm <t.schramm@manjaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoptp: add FemtoClock3 Wireless as ptp hardware clock
Min Li [Wed, 24 Jan 2024 18:49:47 +0000 (13:49 -0500)]
ptp: add FemtoClock3 Wireless as ptp hardware clock

The RENESAS FemtoClock3 Wireless is a high-performance jitter attenuator,
frequency translator, and clock synthesizer. The device is comprised of 3
digital PLLs (DPLL) to track CLKIN inputs and three independent low phase
noise fractional output dividers (FOD) that output low phase noise clocks.

FemtoClock3 supports one Time Synchronization (Time Sync) channel to enable
an external processor to control the phase and frequency of the Time Sync
channel and to take phase measurements using the TDC. Intended applications
are synchronization using the precision time protocol (PTP) and
synchronization with 0.5 Hz and 1 Hz signals from GNSS.

Signed-off-by: Min Li <min.li.xe@renesas.com>
Acked-by: Lee Jones <lee@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoptp: introduce PTP_CLOCK_EXTOFF event for the measured external offset
Min Li [Wed, 24 Jan 2024 18:49:46 +0000 (13:49 -0500)]
ptp: introduce PTP_CLOCK_EXTOFF event for the measured external offset

This change is for PHC devices that can measure the phase offset
between the PHC signal and an external signal, such as the 1PPS signal of
GNSS. Reporting PTP_CLOCK_EXTOFF to user space is piggy-backed on
the existing ptp_extts_event so that applications such as ts2phc can
poll the external offset the same way as extts. Hence, ts2phc can use
the offset to align the PHC with the external signal
with the help of either SW or HW filters.

Signed-off-by: Min Li <min.li.xe@renesas.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoMerge branch 'net-module-description'
David S. Miller [Mon, 29 Jan 2024 12:12:51 +0000 (12:12 +0000)]
Merge branch 'net-module-description'

Breno Leitao says:

====================
Fix MODULE_DESCRIPTION() for net (p3)

There are hundreds of network modules that miss MODULE_DESCRIPTION(),
causing a warning when compiling with W=1. Example:

        WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com90io.o
        WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/arc-rimi.o
        WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com20020.o

Part 3 of this patchset focuses on the missing ethernet drivers, which
are now warning free. It also fixes net/pcs and ieee802154.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: fill in MODULE_DESCRIPTION()s for arcnet
Breno Leitao [Thu, 25 Jan 2024 19:34:20 +0000 (11:34 -0800)]
net: fill in MODULE_DESCRIPTION()s for arcnet

W=1 builds now warn if a module is built without a MODULE_DESCRIPTION().
Add descriptions to the arcnet modules.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: fill in MODULE_DESCRIPTION()s for ieee802154
Breno Leitao [Thu, 25 Jan 2024 19:34:19 +0000 (11:34 -0800)]
net: fill in MODULE_DESCRIPTION()s for ieee802154

W=1 builds now warn if a module is built without a MODULE_DESCRIPTION().
Add descriptions to ieee802154 modules.

Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: fill in MODULE_DESCRIPTION()s for PCS drivers
Breno Leitao [Thu, 25 Jan 2024 19:34:18 +0000 (11:34 -0800)]
net: fill in MODULE_DESCRIPTION()s for PCS drivers

W=1 builds now warn if a module is built without a MODULE_DESCRIPTION().
Add descriptions to the Lynx, XPCS and LynxI PCS drivers.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: fill in MODULE_DESCRIPTION()s for ec_bhf
Breno Leitao [Thu, 25 Jan 2024 19:34:17 +0000 (11:34 -0800)]
net: fill in MODULE_DESCRIPTION()s for ec_bhf

W=1 builds now warn if a module is built without a MODULE_DESCRIPTION().
Add descriptions to the Beckhoff CX5020 EtherCAT Ethernet driver.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: fill in MODULE_DESCRIPTION()s for cpsw-common
Breno Leitao [Thu, 25 Jan 2024 19:34:16 +0000 (11:34 -0800)]
net: fill in MODULE_DESCRIPTION()s for cpsw-common

W=1 builds now warn if a module is built without a MODULE_DESCRIPTION().
Add descriptions to the TI CPSW switch module.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: fill in MODULE_DESCRIPTION()s for dwmac-socfpga
Breno Leitao [Thu, 25 Jan 2024 19:34:15 +0000 (11:34 -0800)]
net: fill in MODULE_DESCRIPTION()s for dwmac-socfpga

W=1 builds now warn if a module is built without a MODULE_DESCRIPTION().
Add descriptions to the STMicro DWMAC for Altera SOCs.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: fill in MODULE_DESCRIPTION()s for Qualcom drivers
Breno Leitao [Thu, 25 Jan 2024 19:34:14 +0000 (11:34 -0800)]
net: fill in MODULE_DESCRIPTION()s for Qualcom drivers

W=1 builds now warn if a module is built without a MODULE_DESCRIPTION().
Add descriptions to the Qualcomm rmnet and emac drivers.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: fill in MODULE_DESCRIPTION()s for SMSC drivers
Breno Leitao [Thu, 25 Jan 2024 19:34:13 +0000 (11:34 -0800)]
net: fill in MODULE_DESCRIPTION()s for SMSC drivers

W=1 builds now warn if a module is built without a MODULE_DESCRIPTION().
Add descriptions to the SMSC 91x/911x/9420 Ethernet drivers.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: fill in MODULE_DESCRIPTION()s for ocelot
Breno Leitao [Thu, 25 Jan 2024 19:34:12 +0000 (11:34 -0800)]
net: fill in MODULE_DESCRIPTION()s for ocelot

W=1 builds now warn if a module is built without a MODULE_DESCRIPTION().
Add descriptions to the Ocelot SoCs (VSC7514) helpers driver.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: fill in MODULE_DESCRIPTION()s for encx24j600
Breno Leitao [Thu, 25 Jan 2024 19:34:11 +0000 (11:34 -0800)]
net: fill in MODULE_DESCRIPTION()s for encx24j600

W=1 builds now warn if a module is built without a MODULE_DESCRIPTION().
Add descriptions to the Microchip ENCX24J600 helpers driver.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agotaprio: validate TCA_TAPRIO_ATTR_FLAGS through policy instead of open-coding
Alessandro Marcolini [Thu, 25 Jan 2024 16:59:42 +0000 (17:59 +0100)]
taprio: validate TCA_TAPRIO_ATTR_FLAGS through policy instead of open-coding

As of now, the field TCA_TAPRIO_ATTR_FLAGS is being validated by manually
checking its value, using the function taprio_flags_valid().

With this patch, the field will be validated through the netlink policy
NLA_POLICY_MASK, where the mask is defined by TAPRIO_SUPPORTED_FLAGS.
The mutual exclusivity of the two flags TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD
and TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST is still checked manually.
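
The split between the two checks can be sketched in plain C as follows.
This is a user-space model, not the qdisc code: the FLAG_* names and their
bit values are assumptions chosen for the example, and flags_pass_mask()
only imitates what a mask-based policy verifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values for the two flags; treat them as illustrative. */
#define FLAG_TXTIME_ASSIST  (1u << 0)
#define FLAG_FULL_OFFLOAD   (1u << 1)
#define SUPPORTED_FLAGS     (FLAG_TXTIME_ASSIST | FLAG_FULL_OFFLOAD)

static_assert((FLAG_TXTIME_ASSIST & FLAG_FULL_OFFLOAD) == 0,
	      "flags must occupy distinct bits");

/* What a mask policy checks: no bit outside the supported mask is set. */
static bool flags_pass_mask(uint32_t flags)
{
	return (flags & ~SUPPORTED_FLAGS) == 0;
}

/* The part that stays manual: the two flags are mutually exclusive. */
static bool flags_valid(uint32_t flags)
{
	if (!flags_pass_mask(flags))
		return false;
	return (flags & SUPPORTED_FLAGS) != SUPPORTED_FLAGS;
}
```

A mask can only express "which bits may be set", so a relation between
bits, such as mutual exclusivity, still needs its own check.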

Changes since RFC:
- fixed reversed xmas tree
- use NL_SET_ERR_MSG_MOD() for both invalid configuration

Changes since v1:
- Changed NL_SET_ERR_MSG_MOD to NL_SET_ERR_MSG_ATTR when wrong flags
  issued
- Changed __u32 to u32

Changes since v2:
- Added the missing parameter for NL_SET_ERR_MSG_ATTR (sorry again for
  the noise)

Signed-off-by: Alessandro Marcolini <alessandromarcolini99@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoocteontx2-af: Add filter profiles in hardware to extract packet headers
Suman Ghosh [Wed, 24 Jan 2024 09:53:38 +0000 (15:23 +0530)]
octeontx2-af: Add filter profiles in hardware to extract packet headers

This patch adds hardware profile support for extracting packet headers.
It makes sure that the hardware is capable of extracting ICMP, CPT, and
ERSPAN headers.

Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoMerge branch 'txgbe-irq_domain'
David S. Miller [Sat, 27 Jan 2024 14:31:55 +0000 (14:31 +0000)]
Merge branch 'txgbe-irq_domain'

Jiawen Wu says:

====================
Implement irq_domain for TXGBE

Implement irq_domain for the MAC interrupt and handle the sub-irqs.

v3 -> v4:
- fix build error

v2 -> v3:
- use macro defines instead of magic number

v1 -> v2:
- move interrupt codes to txgbe_irq.c
- add txgbe-link-irq to misc irq domain
- remove functions that are not needed
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: txgbe: use irq_domain for interrupt controller
Jiawen Wu [Thu, 25 Jan 2024 06:22:13 +0000 (14:22 +0800)]
net: txgbe: use irq_domain for interrupt controller

In the current interrupt controller, the MAC interrupt acts as the
parent interrupt in the GPIO IRQ chip. But when the number of Rx/Tx
rings changes, the PCI IRQ vector needs to be reallocated, and this
interrupt controller would then be corrupted. So use the irq_domain
structure to avoid this problem.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agonet: txgbe: move interrupt codes to a separate file
Jiawen Wu [Thu, 25 Jan 2024 06:22:12 +0000 (14:22 +0800)]
net: txgbe: move interrupt codes to a separate file

Changing the interrupt response structure will add a lot of code.
Move the interrupt code to a new file to keep the code
cleaner.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoDocumentation: mlx5.rst: Add note for eswitch MD
William Tu [Thu, 25 Jan 2024 04:00:41 +0000 (20:00 -0800)]
Documentation: mlx5.rst: Add note for eswitch MD

Add a note on using esw_port_metadata. The parameter supports runtime
mode, but setting it does not take effect immediately. It must be set
in legacy mode, and the port metadata takes effect when
switchdev mode is enabled.

Disable eswitch port metadata::
  $ devlink dev param set pci/0000:06:00.0 name esw_port_metadata value \
    false cmode runtime
Change eswitch mode to switchdev mode after choosing the metadata value::
  $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

Note that other mlx5 devlink runtime parameters, esw_multiport and
flow_steering_mode, do not have this limitation.

Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agorust: phy: use VTABLE_DEFAULT_ERROR
FUJITA Tomonori [Thu, 25 Jan 2024 01:45:02 +0000 (10:45 +0900)]
rust: phy: use VTABLE_DEFAULT_ERROR

Since 6.8-rc1, using VTABLE_DEFAULT_ERROR for optional functions
(never called) in #[vtable] is the recommended way.

Note that there are no functional changes in this patch.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agorust: phy: use `srctree`-relative links
FUJITA Tomonori [Thu, 25 Jan 2024 01:45:01 +0000 (10:45 +0900)]
rust: phy: use `srctree`-relative links

Relative paths like the following are bothersome and don't work
with `O=` builds:

//! C headers: [`include/linux/phy.h`](../../../../../../../include/linux/phy.h).

This updates such links by using the `srctree`-relative link feature
introduced in 6.8-rc1 like:

//! C headers: [`include/linux/phy.h`](srctree/include/linux/phy.h).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
20 months agoMerge branch 'net-dsa-microchip-implement-phy-loopback'
Jakub Kicinski [Sat, 27 Jan 2024 05:25:29 +0000 (21:25 -0800)]
Merge branch 'net-dsa-microchip-implement-phy-loopback'

Oleksij Rempel says:

====================
net: dsa: microchip: implement PHY loopback
====================

Link: https://lore.kernel.org/r/20240124123314.734815-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: dsa: microchip: implement PHY loopback configuration for KSZ8794 and KSZ8873
Oleksij Rempel [Wed, 24 Jan 2024 12:33:14 +0000 (13:33 +0100)]
net: dsa: microchip: implement PHY loopback configuration for KSZ8794 and KSZ8873

Correct the PHY loopback bit handling in the ksz8_w_phy_bmcr and
ksz8_r_phy_bmcr functions for KSZ8794 and KSZ8873 variants in the ksz8795
driver. Previously, the code erroneously used Bit 7 of port register 0xD
for both chip variants, which is actually for LED configuration. This
update ensures the correct registers and bits are used for the PHY
loopback feature:

- For KSZ8794: Use 0xF / Bit 7.
- For KSZ8873: Use 0xD / Bit 0.

The lack of loopback support was observed on a KSZ8873 system when running
"ethtool -t lanX". After this patch, the ethtool selftest will work,
but only if the port is not part of a bridge.
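
The per-variant register and bit selection listed above can be modelled
with a small lookup, sketched here in user-space C. The enum, struct and
function names are hypothetical; only the register offsets and bit numbers
come from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical chip identifiers for this sketch. */
enum ksz_chip { CHIP_KSZ8794, CHIP_KSZ8873 };

struct loopback_loc {
	uint8_t reg;   /* port register offset */
	uint8_t mask;  /* bit that enables PHY loopback */
};

static struct loopback_loc phy_loopback_loc(enum ksz_chip chip)
{
	struct loopback_loc loc = { 0, 0 };

	switch (chip) {
	case CHIP_KSZ8794:
		loc.reg = 0x0F;
		loc.mask = 1u << 7;
		break;
	case CHIP_KSZ8873:
		loc.reg = 0x0D;
		loc.mask = 1u << 0;
		break;
	}
	assert(loc.mask != 0);   /* unknown chip variants are a caller bug */
	return loc;
}

/* Return the register value with the loopback bit set or cleared. */
static uint8_t apply_loopback(enum ksz_chip chip, uint8_t reg_val, bool enable)
{
	uint8_t mask = phy_loopback_loc(chip).mask;

	return enable ? (uint8_t)(reg_val | mask) : (uint8_t)(reg_val & ~mask);
}
```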

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Link: https://lore.kernel.org/r/20240124123314.734815-4-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: dsa: microchip: Remove redundant optimization in ksz8_w_phy_bmcr
Oleksij Rempel [Wed, 24 Jan 2024 12:33:13 +0000 (13:33 +0100)]
net: dsa: microchip: Remove redundant optimization in ksz8_w_phy_bmcr

Remove the manual checks for register value changes in the
ksz8_w_phy_bmcr function. Instead, rely on regmap_update_bits() for
optimizing register updates.
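
To show why the manual checks are redundant, here is a user-space model of
a read-modify-write helper that skips the write when nothing changes, which
is the behaviour relied on from regmap_update_bits(). The struct fake_reg
type and update_bits() function are hypothetical stand-ins, not the regmap
API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register model that counts how often it is written. */
struct fake_reg {
	uint8_t val;
	unsigned int writes;
};

/* Update only the bits in mask; write back only if the value changed. */
static void update_bits(struct fake_reg *r, uint8_t mask, uint8_t val)
{
	uint8_t new_val;

	assert(r != NULL);
	new_val = (uint8_t)((r->val & ~mask) | (val & mask));
	if (new_val != r->val) {
		r->val = new_val;
		r->writes++;
	}
}
```

Calling update_bits() twice with the same arguments performs one write, so
a caller gains nothing by comparing old and new values itself.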

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Link: https://lore.kernel.org/r/20240124123314.734815-3-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: dsa: microchip: ksz8: move BMCR specific code to separate function
Oleksij Rempel [Wed, 24 Jan 2024 12:33:12 +0000 (13:33 +0100)]
net: dsa: microchip: ksz8: move BMCR specific code to separate function

Isolate the Basic Mode Control Register (BMCR) operations in the ksz8795
driver by moving the BMCR-related code segments from the ksz8_r_phy()
and ksz8_w_phy() functions to newly created ksz8_r_phy_bmcr() and
ksz8_w_phy_bmcr() functions.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Link: https://lore.kernel.org/r/20240124123314.734815-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf...
Jakub Kicinski [Sat, 27 Jan 2024 05:08:21 +0000 (21:08 -0800)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-01-26

We've added 107 non-merge commits during the last 4 day(s) which contain
a total of 101 files changed, 6009 insertions(+), 1260 deletions(-).

The main changes are:

1) Add BPF token support to delegate a subset of BPF subsystem
   functionality from privileged system-wide daemons such as systemd
   through special mount options for userns-bound BPF fs to a trusted
   & unprivileged application. With addressed changes from Christian
   and Linus' reviews, from Andrii Nakryiko.

2) Support registration of struct_ops types from modules which helps
   projects like fuse-bpf that seek to implement a new struct_ops type,
   from Kui-Feng Lee.

3) Add support for retrieval of cookies for perf/kprobe multi links,
   from Jiri Olsa.

4) Bigger batch of prep-work for the BPF verifier to eventually support
   preserving boundaries and tracking scalars on narrowing fills,
   from Maxim Mikityanskiy.

5) Extend the tc BPF flavor to support arbitrary TCP SYN cookies to help
   with the scenario of SYN floods, from Kuniyuki Iwashima.

6) Add code generation to inline the bpf_kptr_xchg() helper which
   improves performance when stashing/popping the allocated BPF objects,
   from Hou Tao.

7) Extend BPF verifier to track aligned ST stores as imprecise spilled
   registers, from Yonghong Song.

8) Several fixes to BPF selftests around inline asm constraints and
   unsupported VLA code generation, from Jose E. Marchesi.

9) Various updates to the BPF IETF instruction set draft document such
   as the introduction of conformance groups for instructions,
   from Dave Thaler.

10) Fix BPF verifier to make infinite loop detection in is_state_visited()
    exact to catch some too lax spill/fill corner cases,
    from Eduard Zingerman.

11) Refactor the BPF verifier pointer ALU check to allow ALU explicitly
    instead of implicitly for various register types, from Hao Sun.

12) Fix the flaky tc_redirect_dtime BPF selftest due to slowness
    in neighbor advertisement at setup time, from Martin KaFai Lau.

13) Change BPF selftests to skip callback tests for the case when the
    JIT is disabled, from Tiezhu Yang.

14) Add a small extension to libbpf which allows auto-creating
    a map-in-map's inner map, from Andrey Grafin.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (107 commits)
  selftests/bpf: Add missing line break in test_verifier
  bpf, docs: Clarify definitions of various instructions
  bpf: Fix error checks against bpf_get_btf_vmlinux().
  bpf: One more maintainer for libbpf and BPF selftests
  selftests/bpf: Incorporate LSM policy to token-based tests
  selftests/bpf: Add tests for LIBBPF_BPF_TOKEN_PATH envvar
  libbpf: Support BPF token path setting through LIBBPF_BPF_TOKEN_PATH envvar
  selftests/bpf: Add tests for BPF object load with implicit token
  selftests/bpf: Add BPF object loading tests with explicit token passing
  libbpf: Wire up BPF token support at BPF object level
  libbpf: Wire up token_fd into feature probing logic
  libbpf: Move feature detection code into its own file
  libbpf: Further decouple feature checking logic from bpf_object
  libbpf: Split feature detectors definitions from cached results
  selftests/bpf: Utilize string values for delegate_xxx mount options
  bpf: Support symbolic BPF FS delegation mount options
  bpf: Fail BPF_TOKEN_CREATE if no delegation option was set on BPF FS
  bpf,selinux: Allocate bpf_security_struct per BPF token
  selftests/bpf: Add BPF token-enabled tests
  libbpf: Add BPF token support to bpf_prog_load() API
  ...
====================

Link: https://lore.kernel.org/r/20240126215710.19855-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge branch 'net-phy-generic-polarity-led-support-for-qca808x'
Jakub Kicinski [Sat, 27 Jan 2024 05:03:43 +0000 (21:03 -0800)]
Merge branch 'net-phy-generic-polarity-led-support-for-qca808x'

Christian Marangi says:

====================
net: phy: generic polarity + LED support for qca808x

This small series adds LED support for qca808x.

On PHY reset, QCA808x applies unusual polarity settings and requires
some tweaks to apply the more common configuration found on devices.
While adding support for it, it was pointed out that a similar
feature is also being implemented for a Marvell PHY, where
LED polarity is set per LED (not globally) and which also has
a special mode where the LED is tristated.

The first 3 patches generalize this, as we expect more PHYs
in the future to have a similar configuration.

The implementation is extensible to support additional special
modes in the future with minimal changes and doesn't create regressions
in already implemented PHY drivers.
====================

Link: https://lore.kernel.org/r/20240125203702.4552-1-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: phy: at803x: add LED support for qca808x
Christian Marangi [Thu, 25 Jan 2024 20:37:01 +0000 (21:37 +0100)]
net: phy: at803x: add LED support for qca808x

Add LED support for QCA8081 PHY.

Documentation for this PHY's LEDs is very scarce, even with NDA access
to the documentation for OEMs. Only the blink patterns are documented, and
they are very confusing most of the time. No documentation is present about
forcing the LED on/off or to always blink.

Those settings were reverse-engineered by poking the regs and trying to
find the correct bits to trigger these modes. Some mode bits are not clear,
and the documented options may not be 100% correct. For the sake of LED
support, the reverse-engineered options are enough to add support for the
current LED APIs.

Supported HW control modes are:
- tx
- rx
- link_10
- link_100
- link_1000
- link_2500
- half_duplex
- full_duplex

Also add support for setting the LED polarity to active high or low.
QSDK sets this value to high by default, but the PHY reset value
doesn't have this enabled by default.

QSDK also sets 2 additional bits, but their usage is not clear; info about
this is added in the header. It was verified that if active high is needed,
only BIT 6 is required for the LED to function correctly.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240125203702.4552-6-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agodt-bindings: net: Document QCA808x PHYs
Christian Marangi [Thu, 25 Jan 2024 20:37:00 +0000 (21:37 +0100)]
dt-bindings: net: Document QCA808x PHYs

Add Documentation for QCA808x PHYs for the additional LED configuration
for this PHY.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20240125203702.4552-5-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: phy: add support for PHY LEDs polarity modes
Christian Marangi [Thu, 25 Jan 2024 20:36:59 +0000 (21:36 +0100)]
net: phy: add support for PHY LEDs polarity modes

Add support for PHY LED polarity modes. Some PHYs require the LED to be set
to active low to be turned ON. Add support for this by declaring the
active-low property in DT.

A PHY driver needs to declare .led_polarity_set() to configure LED
polarity modes. The function is passed the LED index and a
bitmap with all the required modes to set.

Currently supported modes are:
- active-low with the flag PHY_LED_ACTIVE_LOW. LED is set to active-low
  to turn it ON.
- inactive-high-impedance with the flag PHY_LED_INACTIVE_HIGH_IMPEDANCE.
  LED is set to high impedance to turn it OFF.
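
As a rough model of how a driver callback might consume the mode bitmap,
the user-space sketch below walks the bits and rejects modes it does not
support. All names here (enum led_mode, struct demo_led,
demo_led_polarity_set()) and the bit positions are assumptions for
illustration; the real callback takes a PHY device and programs hardware
registers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed bit positions for the two modes named in the text. */
enum led_mode {
	LED_ACTIVE_LOW = 0,
	LED_INACTIVE_HIGH_IMPEDANCE = 1,
	LED_MODES_NUM
};

/* Hypothetical per-LED state for a PHY that only supports active-low. */
struct demo_led {
	bool active_low;
};

static int demo_led_polarity_set(struct demo_led *leds, int nleds,
				 int index, unsigned long modes)
{
	bool active_low = false;
	int mode;

	assert(leds != NULL);
	if (index < 0 || index >= nleds)
		return -EINVAL;

	for (mode = 0; mode < LED_MODES_NUM; mode++) {
		if (!(modes & (1UL << mode)))
			continue;
		switch (mode) {
		case LED_ACTIVE_LOW:
			active_low = true;
			break;
		default:
			return -EINVAL;   /* mode not supported by this PHY */
		}
	}

	leds[index].active_low = active_low;
	return 0;
}
```

Passing a bitmap rather than a single value lets new modes be added later
without changing the callback signature.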

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240125203702.4552-4-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agodt-bindings: net: phy: Document LED inactive high impedance mode
Christian Marangi [Thu, 25 Jan 2024 20:36:58 +0000 (21:36 +0100)]
dt-bindings: net: phy: Document LED inactive high impedance mode

Document the LED inactive high impedance mode, which indicates that the
LED requires a high impedance configuration to be turned OFF.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Lee Jones <lee@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20240125203702.4552-3-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agodt-bindings: net: phy: Make LED active-low property common
Christian Marangi [Thu, 25 Jan 2024 20:36:57 +0000 (21:36 +0100)]
dt-bindings: net: phy: Make LED active-low property common

Move LED active-low property to common.yaml. This property is currently
defined multiple times by bcm LEDs. This property will now be supported
in a generic way for PHY LEDs with the use of a generic function.

When the active-low bool property is not defined, active-high is always
assumed.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Lee Jones <lee@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20240125203702.4552-2-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agobnx2x: Fix firmware version string character counts
Kees Cook [Fri, 26 Jan 2024 04:10:48 +0000 (20:10 -0800)]
bnx2x: Fix firmware version string character counts

A potential string truncation was reported in bnx2x_fill_fw_str(),
when a long bp->fw_ver and a long phy_fw_ver might coexist, but seems
unlikely with real-world hardware.

Use scnprintf() to indicate the intent that truncations are tolerated.

While reading this code, I found a collection of various buffer size
counting issues. None looked like they might lead to a buffer overflow
with current code (the small buffers are 20 bytes and might only ever
consume 10 bytes twice with a trailing %NUL). However, early truncation
(due to a %NUL in the middle of the string) might be happening under
likely rare conditions. Regardless, fix the formatters and related
functions:

- Switch from a separate strscpy() to just adding an additional "%s" to
  the format string that immediately follows it in bnx2x_fill_fw_str().
- Use sizeof() universally instead of using unbound defines.
- Fix bnx2x_7101_format_ver() and bnx2x_null_format_ver() to report the
  number of characters written, not including the trailing %NUL (as
  already done with the other firmware formatting functions).
- Require space for at least 1 byte in bnx2x_get_ext_phy_fw_version()
  for the trailing %NUL.
- Correct the needed buffer size in bnx2x_3_seq_format_ver().
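
The difference in return value that makes scnprintf() suitable when
truncation is tolerated can be sketched in user space as below. The
function scnprintf_sketch() is a hypothetical stand-in built on the C
library's vsnprintf(); it is not the kernel implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Like snprintf(), but return the number of characters actually stored
 * in buf (excluding the trailing NUL) instead of the number that would
 * have been written with unlimited space.
 */
static int scnprintf_sketch(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(buf != NULL && fmt != NULL);
	if (size == 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;                 /* formatting error: report nothing */
	if ((size_t)n >= size)
		return (int)(size - 1);   /* truncated: report what fits */
	return n;
}
```

For an 8-byte buffer and the format "%s %s" with "7.13.21" and "1.2.3",
snprintf() would return 13 while this helper returns 7, so a caller that
advances a write offset by the return value cannot run past the buffer.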

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202401260858.jZN6vD1k-lkp@intel.com/
Cc: Ariel Elior <aelior@marvell.com>
Cc: Sudarsana Kalluru <skalluru@marvell.com>
Cc: Manish Chopra <manishc@marvell.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240126041044.work.220-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agodrivers/ptp: Convert snprintf to sysfs_emit
Li Zhijian [Thu, 25 Jan 2024 01:53:29 +0000 (09:53 +0800)]
drivers/ptp: Convert snprintf to sysfs_emit

Per filesystems/sysfs.rst, show() should only use sysfs_emit()
or sysfs_emit_at() when formatting the value to be returned to user space.

coccinelle complains that there are still a couple of functions that use
snprintf(). Convert them to sysfs_emit().

> ./drivers/ptp/ptp_sysfs.c:27:8-16: WARNING: please use sysfs_emit

No functional change intended.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20240125015329.123023-1-lizhijian@fujitsu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge branch 'af_unix-random-improvements-for-gc'
Jakub Kicinski [Sat, 27 Jan 2024 04:34:28 +0000 (20:34 -0800)]
Merge branch 'af_unix-random-improvements-for-gc'

Kuniyuki Iwashima says:

====================
af_unix: Random improvements for GC.

If more than 16000 inflight AF_UNIX sockets exist on a host, each
sendmsg() will be forced to wait for unix_gc() even if a process
is not sending any FD.

This series tries not to impose such a penalty on sane users who
do not send AF_UNIX FDs or do not have more than SCM_MAX_FD * 8
inflight sockets.

The first patch can be backported to -stable.

Cleanup patches for commit 69db702c8387 ("io_uring/af_unix: disable
sending io_uring over sockets") and a large refactoring of GC will
follow later.

v4: https://lore.kernel.org/netdev/20231219030102.27509-1-kuniyu@amazon.com/
v3: https://lore.kernel.org/netdev/20231218075020.60826-1-kuniyu@amazon.com/
v2: https://lore.kernel.org/netdev/20231123014747.66063-1-kuniyu@amazon.com/
v1: https://lore.kernel.org/netdev/20231122013629.28554-1-kuniyu@amazon.com/
====================

Link: https://lore.kernel.org/r/20240123170856.41348-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoaf_unix: Try to run GC async.
Kuniyuki Iwashima [Tue, 23 Jan 2024 17:08:56 +0000 (09:08 -0800)]
af_unix: Try to run GC async.

If more than 16000 inflight AF_UNIX sockets exist and the garbage
collector is not running, unix_(dgram|stream)_sendmsg() call unix_gc().
Also, they wait for unix_gc() to complete.

In unix_gc(), all inflight AF_UNIX sockets are traversed at least once,
and more if they are GC candidates.  Thus, sendmsg() significantly
slows down with too many inflight AF_UNIX sockets.

However, if a process sends data with no AF_UNIX FD, the sendmsg() call
does not need to wait for GC.  After this change, only a process that
meets both conditions below will be blocked in such a situation.

  1) cmsg contains AF_UNIX socket
  2) more than 32 AF_UNIX sent by the same user are still inflight

Note that even a sendmsg() call that does not meet the condition but has
AF_UNIX FD will be blocked later in unix_scm_to_skb() by the spinlock,
but we allow that as a bonus for sane users.
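
The blocking rule described above can be modelled in a few lines. This
is a user-space sketch in Python, not the kernel implementation: the
function and parameter names are hypothetical, and the per-user limit is
passed in by the caller because the series description gives it as
SCM_MAX_FD * 8 while this message gives it as 32.

```python
# Toy model of the rule described in this commit message; not kernel code.
# All names are hypothetical and the limit is supplied by the caller.

def must_wait_for_gc(gc_running, cmsg_has_unix_fd, user_inflight, sane_user_limit):
    """Return True if a sendmsg() caller would block until GC completes."""
    if not gc_running:
        return False              # nothing to wait for
    if not cmsg_has_unix_fd:
        return False              # condition 1 not met: no AF_UNIX socket in cmsg
    return user_inflight > sane_user_limit   # condition 2


# A sender that passes no AF_UNIX FD never waits, however many are inflight.
print(must_wait_for_gc(True, False, 10_000, 32))
```

Only a caller that satisfies both conditions while GC is running pays
the cost; everyone else returns immediately.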

The results below are the time spent in unix_dgram_sendmsg() sending 1
byte of data with no FD 4096 times on a host where 32K inflight AF_UNIX
sockets exist.

Without series: the sane sendmsg() needs to wait for GC unreasonably.

  $ sudo /usr/share/bcc/tools/funclatency -p 11165 unix_dgram_sendmsg
  Tracing 1 functions for "unix_dgram_sendmsg"... Hit Ctrl-C to end.
  ^C
       nsecs               : count     distribution
  [...]
      524288 -> 1048575    : 0        |                                        |
     1048576 -> 2097151    : 3881     |****************************************|
     2097152 -> 4194303    : 214      |**                                      |
     4194304 -> 8388607    : 1        |                                        |

  avg = 1825567 nsecs, total: 7477526027 nsecs, count: 4096

With series: the sane sendmsg() can finish much faster.

  $ sudo /usr/share/bcc/tools/funclatency -p 8702  unix_dgram_sendmsg
  Tracing 1 functions for "unix_dgram_sendmsg"... Hit Ctrl-C to end.
  ^C
       nsecs               : count     distribution
  [...]
         128 -> 255        : 0        |                                        |
         256 -> 511        : 4092     |****************************************|
         512 -> 1023       : 2        |                                        |
        1024 -> 2047       : 0        |                                        |
        2048 -> 4095       : 0        |                                        |
        4096 -> 8191       : 1        |                                        |
        8192 -> 16383      : 1        |                                        |

  avg = 410 nsecs, total: 1680510 nsecs, count: 4096
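
As a cross-check of the two summaries, the averages can be recomputed
from the reported totals and counts. The short Python snippet below uses
only the numbers printed above; the variable names are illustrative.

```python
# Figures copied from the two funclatency summaries above.
COUNT = 4096
total_without_ns = 7477526027
total_with_ns = 1680510

avg_without_ns = total_without_ns // COUNT   # whole nanoseconds per call
avg_with_ns = total_with_ns // COUNT
speedup = total_without_ns / total_with_ns   # ratio of total time spent

print(avg_without_ns, avg_with_ns, round(speedup))
```

The recomputed averages match the reported 1825567 and 410 nsecs, which
puts the reduction in time spent in unix_dgram_sendmsg() at a factor of
roughly 4450 for this workload.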

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240123170856.41348-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoaf_unix: Run GC on only one CPU.
Kuniyuki Iwashima [Tue, 23 Jan 2024 17:08:55 +0000 (09:08 -0800)]
af_unix: Run GC on only one CPU.

If more than 16000 inflight AF_UNIX sockets exist and the garbage
collector is not running, unix_(dgram|stream)_sendmsg() call unix_gc().
Also, they wait for unix_gc() to complete.

In unix_gc(), all inflight AF_UNIX sockets are traversed at least once,
and more than once if they are GC candidates.  Thus, sendmsg() slows
down significantly with too many inflight AF_UNIX sockets.

There is a small window in which multiple unix_gc() instances can be
invoked; all but one of them will then block on the same spinlock.

Let's convert unix_gc() to use struct work_struct so that it will not
consume CPUs unnecessarily.

Note WRITE_ONCE(gc_in_progress, true) is moved before running GC.
If we leave the WRITE_ONCE() as is and use the following test to
call flush_work(), a process might not call it.

    CPU 0                                     CPU 1
    ---                                       ---
                                              start work and call __unix_gc()
    if (work_pending(&unix_gc_work) ||        <-- false
        READ_ONCE(gc_in_progress))            <-- false
            flush_work();                     <-- missed!
                                      WRITE_ONCE(gc_in_progress, true)
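
The window in the diagram can be illustrated with a small timeline
model. This is a Python sketch under simplifying assumptions, not kernel
code: each tuple is a (work_pending, gc_in_progress) pair that a waiter
could observe, and the function names are hypothetical.

```python
# Toy timeline model of the race described above; not kernel code.
# Each state is (work_pending, gc_in_progress) as a waiter could observe it.

def timeline(set_flag_before_queueing):
    if set_flag_before_queueing:
        return [
            (True, True),     # flag set, then work queued
            (False, True),    # worker started, __unix_gc() running
        ]
    return [
        (True, False),    # work queued, flag not yet set
        (False, False),   # worker started, flag still not set
        (False, True),    # worker finally sets the flag
    ]


def waiter_can_miss_gc(set_flag_before_queueing):
    """True if some observable state makes the waiter skip flush_work()."""
    return any(not (pending or in_progress)
               for pending, in_progress in timeline(set_flag_before_queueing))


print(waiter_can_miss_gc(False), waiter_can_miss_gc(True))
```

With the flag written inside the worker there is a state in which both
tests are false, which is the missed flush_work() shown in the diagram.
Writing the flag before the work is queued removes that state.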

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240123170856.41348-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoaf_unix: Return struct unix_sock from unix_get_socket().
Kuniyuki Iwashima [Tue, 23 Jan 2024 17:08:54 +0000 (09:08 -0800)]
af_unix: Return struct unix_sock from unix_get_socket().

Currently, unix_get_socket() returns struct sock, but after calling
it, we always cast it to unix_sk().

Let's return struct unix_sock from unix_get_socket().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240123170856.41348-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoaf_unix: Do not use atomic ops for unix_sk(sk)->inflight.
Kuniyuki Iwashima [Tue, 23 Jan 2024 17:08:53 +0000 (09:08 -0800)]
af_unix: Do not use atomic ops for unix_sk(sk)->inflight.

When touching unix_sk(sk)->inflight, we are always under
spin_lock(&unix_gc_lock).

Let's convert unix_sk(sk)->inflight to the normal unsigned long.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240123170856.41348-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoaf_unix: Annotate data-race of gc_in_progress in wait_for_unix_gc().
Kuniyuki Iwashima [Tue, 23 Jan 2024 17:08:52 +0000 (09:08 -0800)]
af_unix: Annotate data-race of gc_in_progress in wait_for_unix_gc().

gc_in_progress is changed under spin_lock(&unix_gc_lock),
but wait_for_unix_gc() reads it locklessly.

Let's use READ_ONCE().

Fixes: 5f23b734963e ("net: Fix soft lockups/OOM issues w/ unix garbage collector")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240123170856.41348-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agonet: dsa: mt7530: select MEDIATEK_GE_PHY for NET_DSA_MT7530_MDIO
Arınç ÜNAL [Mon, 22 Jan 2024 05:34:51 +0000 (08:34 +0300)]
net: dsa: mt7530: select MEDIATEK_GE_PHY for NET_DSA_MT7530_MDIO

Quoting from commit 4223f8651287 ("net: dsa: mt7530: make NET_DSA_MT7530
select MEDIATEK_GE_PHY"):

Make MediaTek MT753x DSA driver enable MediaTek Gigabit PHYs driver to
properly control MT7530 and MT7531 switch PHYs.

A noticeable change is that the behaviour of switchport interfaces going
up-down-up-down is no longer there.

Now, the switch can be used without the PHYs but, at the moment, every
hardware design out there that I have seen uses them. For that, it would
make the most sense to force the selection of MEDIATEK_GE_PHY for the MDIO
interface which currently controls the MT7530 and MT7531 switches.

Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240122053451.8004-1-arinc.unal@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoselftests/bpf: Add missing line break in test_verifier
Tiezhu Yang [Fri, 26 Jan 2024 01:57:36 +0000 (09:57 +0800)]
selftests/bpf: Add missing line break in test_verifier

There are no line breaks in the test log for test_verifier #106 ~ #111
if JIT is disabled. Add the missing line break at the end of printf()
to fix it.

Without this patch:

  [root@linux bpf]# echo 0 > /proc/sys/net/core/bpf_jit_enable
  [root@linux bpf]# ./test_verifier 106
  #106/p inline simple bpf_loop call SKIP (requires BPF JIT)Summary: 0 PASSED, 1 SKIPPED, 0 FAILED

With this patch:

  [root@linux bpf]# echo 0 > /proc/sys/net/core/bpf_jit_enable
  [root@linux bpf]# ./test_verifier 106
  #106/p inline simple bpf_loop call SKIP (requires BPF JIT)
  Summary: 0 PASSED, 1 SKIPPED, 0 FAILED

Fixes: 0b50478fd877 ("selftests/bpf: Skip callback tests if jit is disabled in test_verifier")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240126015736.655-1-yangtiezhu@loongson.cn
20 months agobpf, docs: Clarify definitions of various instructions
Dave Thaler [Fri, 26 Jan 2024 04:00:50 +0000 (20:00 -0800)]
bpf, docs: Clarify definitions of various instructions

Clarify definitions of several instructions:

* BPF_NEG does not support BPF_X
* BPF_CALL does not support BPF_JMP32 or BPF_X
* BPF_EXIT does not support BPF_X
* BPF_JA does not support BPF_X (was implied but not explicitly stated)

Also fix a typo in the wide instruction figure where the field is
actually named "opcode" not "code".
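
To make the four restrictions concrete, the Python sketch below decodes
an 8-bit opcode into its class, source bit and operation, and flags
only the combinations listed above. The constant values are the ones
from the Linux UAPI BPF headers; the checker function itself is a
hypothetical illustration and does not validate anything else about an
instruction.

```python
# Opcode field values as defined in the Linux UAPI BPF headers.
BPF_ALU, BPF_JMP, BPF_JMP32, BPF_ALU64 = 0x04, 0x05, 0x06, 0x07
BPF_K, BPF_X = 0x00, 0x08
BPF_NEG = 0x80                                   # ALU operation
BPF_JA, BPF_CALL, BPF_EXIT = 0x00, 0x80, 0x90    # jump operations


def violates_listed_rules(opcode):
    """Check an 8-bit opcode against the four restrictions listed above only."""
    cls = opcode & 0x07             # instruction class (low 3 bits)
    uses_x = bool(opcode & 0x08)    # source bit: BPF_X if set, BPF_K if clear
    op = opcode & 0xF0              # operation code (high 4 bits)
    if cls in (BPF_ALU, BPF_ALU64):
        return op == BPF_NEG and uses_x
    if cls in (BPF_JMP, BPF_JMP32):
        if op == BPF_CALL:
            return uses_x or cls == BPF_JMP32
        if op in (BPF_EXIT, BPF_JA):
            return uses_x
    return False


print(violates_listed_rules(BPF_ALU64 | BPF_X | BPF_NEG))
```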

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240126040050.8464-1-dthaler1968@gmail.com
20 months agobpf: Fix error checks against bpf_get_btf_vmlinux().
Kui-Feng Lee [Fri, 26 Jan 2024 02:31:13 +0000 (18:31 -0800)]
bpf: Fix error checks against bpf_get_btf_vmlinux().

bpf_struct_ops_map_alloc() needs to check for NULL in the pointer
returned by bpf_get_btf_vmlinux() when CONFIG_DEBUG_INFO_BTF is not set.
ENOTSUPP is used to preserve the behavior from before the
struct_ops kmod support.

In the function check_struct_ops_btf_id(), instead of redoing the
bpf_get_btf_vmlinux() that has already been done in syscall.c, the fix
here is to check for prog->aux->attach_btf_id.
BPF_PROG_TYPE_STRUCT_OPS must require attach_btf_id and syscall.c
guarantees a valid attach_btf as long as attach_btf_id is set.
When attach_btf_id is not set, this patch returns -ENOTSUPP
because that is what the selftest in test_libbpf_probe_prog_types()
and libbpf_probes.c expect for feature probing purposes.

Changes from v1:

 - Remove an unnecessary NULL check in check_struct_ops_btf_id()

Reported-by: syzbot+88f0aafe5f950d7489d7@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/bpf/00000000000040d68a060fc8db8c@google.com/
Reported-by: syzbot+1336f3d4b10bcda75b89@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/bpf/00000000000026353b060fc21c07@google.com/
Fixes: fcc2c1fb0651 ("bpf: pass attached BTF to the bpf_struct_ops subsystem")
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240126023113.1379504-1-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
20 months agobpf: One more maintainer for libbpf and BPF selftests
Eduard Zingerman [Fri, 26 Jan 2024 03:25:54 +0000 (05:25 +0200)]
bpf: One more maintainer for libbpf and BPF selftests

I've been working on the BPF verifier, BPF selftests and, to some extent,
libbpf, for some time. As suggested by Andrii and Alexei,
I humbly ask to be added to the maintainers list:
- As reviewer   for BPF [GENERAL]
- As maintainer for BPF [LIBRARY]
- As maintainer for BPF [SELFTESTS]

This patch adds dedicated entries to MAINTAINERS.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240126032554.9697-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
20 months agotsnep: Add link down PHY loopback support
Gerhard Engleder [Tue, 23 Jan 2024 20:01:51 +0000 (21:01 +0100)]
tsnep: Add link down PHY loopback support

PHY loopback turns off link state change signalling. Therefore, the
loopback only works if the link is already up before the PHY loopback is
activated.

Ensure that PHY loopback works even if the link is not already up during
activation by calling netif_carrier_on() explicitly.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Link: https://lore.kernel.org/r/20240123200151.60848-1-gerhard@engleder-embedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agogve: Modify rx_buf_alloc_fail counter centrally and closer to failure
Ankit Garg [Wed, 24 Jan 2024 20:54:35 +0000 (20:54 +0000)]
gve: Modify rx_buf_alloc_fail counter centrally and closer to failure

Previously, each caller of gve_rx_alloc_buffer() had to increment the
counter itself, and as a result one caller was not tracking those
failures. Increment the counter at a common location so that callers do
not have to duplicate code or risk missing counter management.
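
The idea can be shown with a small model. This is a Python sketch, not
the gve driver: the class and function names are hypothetical, and the
allocator is a stand-in that returns None on failure.

```python
# Toy illustration of counting a failure where it happens; not the gve driver.
# Every name here is hypothetical.

class RxStats:
    def __init__(self):
        self.rx_buf_alloc_fail = 0


def alloc_buffer(stats, allocator):
    """Allocate one buffer and record a failure centrally."""
    buf = allocator()
    if buf is None:
        stats.rx_buf_alloc_fail += 1   # the only place that touches the counter
    return buf


def refill(stats, allocator, n):
    """A caller: it needs no counter handling of its own."""
    attempts = (alloc_buffer(stats, allocator) for _ in range(n))
    return [buf for buf in attempts if buf is not None]


stats = RxStats()
results = iter([b"a", None, b"b", None])
bufs = refill(stats, lambda: next(results), 4)
print(len(bufs), stats.rx_buf_alloc_fail)
```

Because the increment lives next to the failure, a newly added caller
cannot forget it.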

Signed-off-by: Ankit Garg <nktgrg@google.com>
Link: https://lore.kernel.org/r/20240124205435.1021490-1-nktgrg@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge branch 'selftests-updates-to-fcnal-test-for-autoamted-environment'
Jakub Kicinski [Fri, 26 Jan 2024 01:14:24 +0000 (17:14 -0800)]
Merge branch 'selftests-updates-to-fcnal-test-for-autoamted-environment'

David Ahern says:

====================
selftests: Updates to fcnal-test for automated environment

The first patch updates the PATH for fcnal-test.sh to find the nettest
binary when invoked at the top-level directory via
   make -C tools/testing/selftests TARGETS=net run_tests

Second patch fixes a bug setting the ping_group; it has a compound value
and that value is not traversing the various helper functions intact.
Fix by creating a helper specific to setting it.

Third patch adds more output when a test fails - e.g., to catch a change
in the return code of some test.

With these 3 patches, the entire suite completes successfully when
run on Ubuntu 23.10 with 6.5 kernel - 914 tests pass, 0 fail.
====================

Link: https://lore.kernel.org/r/20240124214117.24687-1-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoselftest: Show expected and actual return codes for test failures in fcnal-test
David Ahern [Wed, 24 Jan 2024 21:41:17 +0000 (14:41 -0700)]
selftest: Show expected and actual return codes for test failures in fcnal-test

Capture expected and actual return codes for a test that fails in
the fcnal-test suite.

Signed-off-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240124214117.24687-4-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoselftest: Fix set of ping_group_range in fcnal-test
David Ahern [Wed, 24 Jan 2024 21:41:16 +0000 (14:41 -0700)]
selftest: Fix set of ping_group_range in fcnal-test

The ping_group_range sysctl has a compound value which does not go
through the various function layers intact. Create a helper
function to bypass the layers and correctly set the value.

Signed-off-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240124214117.24687-3-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoselftest: Update PATH for nettest in fcnal-test
David Ahern [Wed, 24 Jan 2024 21:41:15 +0000 (14:41 -0700)]
selftest: Update PATH for nettest in fcnal-test

Allow fcnal-test.sh to be run from the top-level directory of the
kernel repo as well as from tools/testing/selftests/net by
setting the PATH to find the in-tree nettest.

Signed-off-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240124214117.24687-2-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge tag 'wireless-next-2024-01-25' of git://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Fri, 26 Jan 2024 00:49:55 +0000 (16:49 -0800)]
Merge tag 'wireless-next-2024-01-25' of git://git./linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.9

The first "new features" pull request for v6.9. We have only driver
changes this time and most of them are for Realtek drivers. Really
nice to see activity in Broadcom drivers again.

Major changes:

rtl8xxxu
 * RTL8188F: concurrent interface support
 * Channel Switch Announcement (CSA) support in AP mode

brcmfmac
 * per-vendor feature support
 * per-vendor SAE password setup

rtlwifi
 * speed up USB firmware initialisation

* tag 'wireless-next-2024-01-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (105 commits)
  wifi: iwlegacy: Use kcalloc() instead of kzalloc()
  wifi: rtw89: fix disabling concurrent mode TX hang issue
  wifi: rtw89: fix HW scan timeout due to TSF sync issue
  wifi: rtw89: add wait/completion for abort scan
  wifi: rtw89: fix null pointer access when abort scan
  wifi: rtw89: disable RTS when broadcast/multicast
  wifi: rtw89: Set default CQM config if not present
  wifi: rtw89: refine hardware scan C2H events
  wifi: rtw89: refine add_chan H2C command to encode_bits
  wifi: rtw89: 8922a: add BTG functions to assist BT coexistence to control TX/RX
  wifi: rtw89: 8922a: add TX power related ops
  wifi: rtw89: 8922a: add register definitions of H2C, C2H, page, RRSR and EDCCA
  wifi: rtw89: 8922a: add chip_ops related to BB init
  wifi: rtw89: 8922a: add chip_ops::{enable,disable}_bb_rf
  wifi: rtw89: add mlo_dbcc_mode for WiFi 7 chips
  wifi: rtlwifi: Speed up firmware loading for USB
  wifi: rtl8xxxu: add missing number of sec cam entries for all variants
  wifi: brcmfmac: allow per-vendor event handling
  wifi: brcmfmac: avoid invalid list operation when vendor attach fails
  wifi: brcmfmac: Demote vendor-specific attach/detach messages to info
  ...
====================

Link: https://lore.kernel.org/r/20240125104030.B6CA6C433C7@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agovsock/test: print type for SOCK_SEQPACKET
Arseniy Krasnov [Wed, 24 Jan 2024 19:32:55 +0000 (22:32 +0300)]
vsock/test: print type for SOCK_SEQPACKET

SOCK_SEQPACKET is supported by the virtio transport, so do not report
this socket type as unknown.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20240124193255.3417803-1-avkrasnov@salutedevices.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge branch 'selftests-tc-testing-misc-changes-for-tdc'
Jakub Kicinski [Fri, 26 Jan 2024 00:38:19 +0000 (16:38 -0800)]
Merge branch 'selftests-tc-testing-misc-changes-for-tdc'

Pedro Tammela says:

====================
selftests: tc-testing: misc changes for tdc

Patches 1 and 3 are fixes for tdc that were discovered when running it
using defconfig + tc-testing config and against the latest iproute2.

Patch 2 improves the taprio tests.

Patch 4 enables all tdc tests.

Patch 5 fixes the return code of tdc for when a test fails
setup/teardown.
====================

Link: https://lore.kernel.org/r/20240124181933.75724-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoselftests: tc-testing: return fail if a test fails in setup/teardown
Pedro Tammela [Wed, 24 Jan 2024 18:19:33 +0000 (15:19 -0300)]
selftests: tc-testing: return fail if a test fails in setup/teardown

As of today, tests throwing exceptions in the setup/teardown phase are
treated as skipped, but they should really be failures.
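
The intended policy can be sketched as follows. This is a hypothetical
Python illustration of the rule stated above, not tdc's actual code; the
function name and the "ok"/"fail" result strings are assumptions.

```python
# Hypothetical sketch of the policy described above; not tdc's actual code.

def run_case(setup, body, teardown):
    """Return "ok" or "fail"; exceptions in setup/teardown count as failures."""
    try:
        setup()
    except Exception:
        return "fail"        # before this change the outcome would be a skip
    passed = body()
    try:
        teardown()
    except Exception:
        return "fail"        # a broken teardown also fails the test
    return "ok" if passed else "fail"


def boom():
    raise RuntimeError("setup or teardown step failed")


print(run_case(boom, lambda: True, lambda: None))
```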

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20240124181933.75724-6-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoselftests: tc-testing: enable all tdc tests
Pedro Tammela [Wed, 24 Jan 2024 18:19:32 +0000 (15:19 -0300)]
selftests: tc-testing: enable all tdc tests

For the longest time tdc ran only actions and qdiscs tests.
It's time to enable all the remaining tests so every user visible
piece of TC is tested by the downstream CIs.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20240124181933.75724-5-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoselftests: tc-testing: adjust fq test to latest iproute2
Pedro Tammela [Wed, 24 Jan 2024 18:19:31 +0000 (15:19 -0300)]
selftests: tc-testing: adjust fq test to latest iproute2

Adjust the fq verify regex to match the output of the latest iproute2.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20240124181933.75724-4-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoselftests: tc-testing: check if 'jq' is available in taprio tests
Pedro Tammela [Wed, 24 Jan 2024 18:19:30 +0000 (15:19 -0300)]
selftests: tc-testing: check if 'jq' is available in taprio tests

If 'jq' is not available, the taprio tests might enter an infinite loop.
Use the "dependsOn" feature from tdc to check whether jq is present. If
it is not, the test is skipped.
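
An analogous availability check can be written in a few lines of
Python. This is only an illustration of the idea of skipping when a
required tool is absent; it does not show how tdc evaluates "dependsOn",
and the function name is hypothetical.

```python
import shutil

# Hypothetical helper: decide whether a test should be skipped because a
# required command-line tool cannot be found on PATH.
def should_skip(required_tool):
    return shutil.which(required_tool) is None


print(should_skip("jq"))
```

Checking up front turns a potential hang into an explicit skip.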

Suggested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20240124181933.75724-3-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoselftests: tc-testing: add missing netfilter config
Pedro Tammela [Wed, 24 Jan 2024 18:19:29 +0000 (15:19 -0300)]
selftests: tc-testing: add missing netfilter config

On a default config + tc-testing config build, tdc will miss
all the netfilter related tests because it's missing:
   CONFIG_NETFILTER=y

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20240124181933.75724-2-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 25 Jan 2024 22:00:54 +0000 (14:00 -0800)]
Merge git://git./linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
20 months agoMerge tag 'net-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 25 Jan 2024 18:58:35 +0000 (10:58 -0800)]
Merge tag 'net-6.8-rc2' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf, netfilter and WiFi.

  Jakub is doing a lot of work to include the self-tests in our CI, as a
  result a significant amount of self-tests related fixes is flowing in
  (and will likely continue in the next few weeks).

  Current release - regressions:

   - bpf: fix a kernel crash for the riscv 64 JIT

   - bnxt_en: fix memory leak in bnxt_hwrm_get_rings()

   - revert "net: macsec: use skb_ensure_writable_head_tail to expand
     the skb"

  Previous releases - regressions:

   - core: fix removing a namespace with conflicting altnames

   - tc/flower: fix chain template offload memory leak

   - tcp:
      - make sure init the accept_queue's spinlocks once
      - fix autocork on CPUs with weak memory model

   - udp: fix busy polling

   - mlx5e:
      - fix out-of-bound read in port timestamping
      - fix peer flow lists corruption

   - iwlwifi: fix a memory corruption

  Previous releases - always broken:

   - netfilter:
      - nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress
        basechain
      - nft_limit: reject configurations that cause integer overflow

   - bpf: fix bpf_xdp_adjust_tail() with XSK zero-copy mbuf, avoiding a
     NULL pointer dereference upon shrinking

   - llc: make llc_ui_sendmsg() more robust against bonding changes

   - smc: fix illegal rmb_desc access in SMC-D connection dump

   - dpll: fix pin dump crash for rebound module

   - bnxt_en: fix possible crash after creating sw mqprio TCs

   - hv_netvsc: calculate correct ring size when PAGE_SIZE is not 4kB

  Misc:

   - several self-tests fixes for better integration with the netdev CI

   - added several missing modules descriptions"

* tag 'net-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
  tsnep: Fix XDP_RING_NEED_WAKEUP for empty fill ring
  tsnep: Remove FCS for XDP data path
  net: fec: fix the unhandled context fault from smmu
  selftests: bonding: do not test arp/ns target with mode balance-alb/tlb
  fjes: fix memleaks in fjes_hw_setup
  i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue
  i40e: set xdp_rxq_info::frag_size
  xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL
  ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue
  intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers
  ice: remove redundant xdp_rxq_info registration
  i40e: handle multi-buffer packets that are shrunk by xdp prog
  ice: work on pre-XDP prog frag count
  xsk: fix usage of multi-buffer BPF helpers for ZC XDP
  xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags
  xsk: recycle buffer in case Rx queue was full
  net: fill in MODULE_DESCRIPTION()s for rvu_mbox
  net: fill in MODULE_DESCRIPTION()s for litex
  net: fill in MODULE_DESCRIPTION()s for fsl_pq_mdio
  net: fill in MODULE_DESCRIPTION()s for fec
  ...

20 months agoMerge tag 'ovl-fixes-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overla...
Linus Torvalds [Thu, 25 Jan 2024 18:52:30 +0000 (10:52 -0800)]
Merge tag 'ovl-fixes-6.8-rc2' of git://git./linux/kernel/git/overlayfs/vfs

Pull overlayfs fix from Amir Goldstein:
 "Change the on-disk format for the new "xwhiteouts" feature introduced
  in v6.7

  The change reduces unneeded overhead of an extra getxattr per readdir.
  The only user of the "xwhiteout" feature is the external composefs
  tool, which has been updated to support the new on-disk format.

  This change is also designated for 6.7.y"

* tag 'ovl-fixes-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: mark xwhiteouts directory with overlay.opaque='x'

20 months agoMerge tag 'vfs-6.8-rc2.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Linus Torvalds [Thu, 25 Jan 2024 18:41:29 +0000 (10:41 -0800)]
Merge tag 'vfs-6.8-rc2.netfs' of git://git./linux/kernel/git/vfs/vfs

Pull netfs fixes from Christian Brauner:
 "This contains various fixes for the netfs work merged earlier this
  cycle:

  afs:
   - Fix locking imbalance in afs_proc_addr_prefs_show()
   - Remove afs_dynroot_d_revalidate() which is redundant
   - Fix error handling during lookup
   - Hide sillyrenames from userspace. This fixes a race between
     silly-rename files being created/removed and userspace iterating
     over directory entries
   - Don't use unnecessary folio_*() functions

  cifs:
   - Don't use unnecessary folio_*() functions

  cachefiles:
   - erofs: Fix NULL dereference when cachefiles is not doing
     ondemand-mode
   - Update mailing list

  netfs library:
   - Add Jeff Layton as reviewer
   - Update mailing list
   - Fix an error check in netfs_perform_write()
   - fscache: Check error before dereferencing
   - Don't use unnecessary folio_*() functions"

* tag 'vfs-6.8-rc2.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  afs: Fix missing/incorrect unlocking of RCU read lock
  afs: Remove afs_dynroot_d_revalidate() as it is redundant
  afs: Fix error handling with lookup via FS.InlineBulkStatus
  afs: Hide silly-rename files from userspace
  cachefiles, erofs: Fix NULL deref in when cachefiles is not doing ondemand-mode
  netfs: Fix a NULL vs IS_ERR() check in netfs_perform_write()
  netfs, fscache: Prevent Oops in fscache_put_cache()
  cifs: Don't use certain unnecessary folio_*() functions
  afs: Don't use certain unnecessary folio_*() functions
  netfs: Don't use certain unnecessary folio_*() functions
  netfs: Add Jeff Layton as reviewer
  netfs, cachefiles: Change mailing list

20 months agoMerge tag 'nfsd-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Thu, 25 Jan 2024 18:26:52 +0000 (10:26 -0800)]
Merge tag 'nfsd-6.8-1' of git://git./linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Fix in-kernel RPC UDP transport

 - Fix NFSv4.0 RELEASE_LOCKOWNER

* tag 'nfsd-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: fix RELEASE_LOCKOWNER
  SUNRPC: use request size to initialize bio_vec in svc_udp_sendto()

20 months agoMerge tag 'urgent-rcu.2024.01.24a' of https://github.com/neeraju/linux
Linus Torvalds [Thu, 25 Jan 2024 18:21:21 +0000 (10:21 -0800)]
Merge tag 'urgent-rcu.2024.01.24a' of https://github.com/neeraju/linux

Pull RCU fix from Neeraj Upadhyay:
 "This fixes RCU grace period stalls, which are observed when an
  outgoing CPU's quiescent state reporting results in wakeup of one of
  the grace period kthreads, to complete the grace period.

  If those kthreads have SCHED_FIFO policy, the wake up can indirectly
  arm the RT bandwidth timer to the local offline CPU.

  Earlier migration of the hrtimers from the CPU introduced in commit
  5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU
  earlier") results in this timer getting ignored.

  If the RCU grace period kthreads are waiting for RT bandwidth to be
  available, they may never be actually scheduled, resulting in RCU
  stall warnings"

* tag 'urgent-rcu.2024.01.24a' of https://github.com/neeraju/linux:
  rcu: Defer RCU kthreads wakeup when CPU is dying