linux.git
18 months agodoc/netlink/specs: Add draft nftables spec
Donald Hunter [Thu, 18 Apr 2024 10:47:34 +0000 (11:47 +0100)]
doc/netlink/specs: Add draft nftables spec

Add a spec for nftables that has nearly complete coverage of the ops,
but limited coverage of rule types and subexpressions.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240418104737.77914-2-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge branch 'for-uring-ubufops' into HEAD
Jakub Kicinski [Mon, 22 Apr 2024 23:33:10 +0000 (16:33 -0700)]
Merge branch 'for-uring-ubufops' into HEAD

Pavel Begunkov says:

====================
implement io_uring notification (ubuf_info) stacking (net part)

To have per request buffer notifications each zerocopy io_uring send
request allocates a new ubuf_info. However, as an skb can carry only
one uarg, it may force the stack to create many small skbs hurting
performance in many ways.

The patchset implements notification, i.e. an io_uring's ubuf_info
extension, stacking. It attempts to link ubuf_info's into a list,
allowing to have multiple of them per skb.

liburing/examples/send-zerocopy shows up 6 times performance improvement
for TCP with 4KB bytes per send, and levels it with MSG_ZEROCOPY. Without
the patchset it requires much larger sends to utilise all potential.

bytes  | before | after (Kqps)
1200   | 195    | 1023
4000   | 193    | 1386
8000   | 154    | 1058
====================

Link: https://lore.kernel.org/all/cover.1713369317.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: add callback for setting a ubuf_info to skb
Pavel Begunkov [Fri, 19 Apr 2024 11:08:40 +0000 (12:08 +0100)]
net: add callback for setting a ubuf_info to skb

At the moment an skb can only have one ubuf_info associated with it,
which might be a performance problem for zerocopy sends in cases like
TCP via io_uring. Add a callback for assigning ubuf_info to skb, this
way we will implement smarter assignment later like linking ubuf_info
together.

Note, it's an optional callback, which should be compatible with
skb_zcopy_set(), that's because the net stack might potentially decide
to clone an skb and take another reference to ubuf_info whenever it
wishes. Also, a correct implementation should always be able to bind to
an skb without prior ubuf_info, otherwise we could end up in a situation
when the send would not be able to progress.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/all/b7918aadffeb787c84c9e72e34c729dc04f3a45d.1713369317.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: extend ubuf_info callback to ops structure
Pavel Begunkov [Fri, 19 Apr 2024 11:08:39 +0000 (12:08 +0100)]
net: extend ubuf_info callback to ops structure

We'll need to associate additional callbacks with ubuf_info, introduce
a structure holding ubuf_info callbacks. Apart from a more smarter
io_uring notification management introduced in next patches, it can be
used to generalise msg_zerocopy_put_abort() and also store
->sg_from_iter, which is currently passed in struct msghdr.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/all/a62015541de49c0e2a8a0377a1d5d0a5aeb07016.1713369317.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge branch 'tcp-avoid-sending-too-small-packets'
Jakub Kicinski [Mon, 22 Apr 2024 21:25:32 +0000 (14:25 -0700)]
Merge branch 'tcp-avoid-sending-too-small-packets'

Eric Dumazet says:

====================
tcp: avoid sending too small packets

tcp_sendmsg() cooks 'large' skbs, that are later split
if needed from tcp_write_xmit().

After a split, the leftover skb size is smaller than the optimal
size, and this causes a performance drop.

In this series, tcp_grow_skb() helper is added to shift
payload from the second skb in the write queue to the first
skb to always send optimal sized skbs.

This increases TSO efficiency, and decreases number of ACK
packets.
====================

Link: https://lore.kernel.org/r/20240418214600.1291486-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agotcp: try to send bigger TSO packets
Eric Dumazet [Thu, 18 Apr 2024 21:46:00 +0000 (21:46 +0000)]
tcp: try to send bigger TSO packets

While investigating TCP performance, I found that TCP would
sometimes send big skbs followed by a single MSS skb,
in a 'locked' pattern.

For instance, BIG TCP is enabled, MSS is set to have 4096 bytes
of payload per segment. gso_max_size is set to 181000.

This means that an optimal TCP packet size should contain
44 * 4096 = 180224 bytes of payload,

However, I was seeing packets sizes interleaved in this pattern:

172032, 8192, 172032, 8192, 172032, 8192, <repeat>

tcp_tso_should_defer() heuristic is defeated, because after a split of
a packet in write queue for whatever reason (this might be a too small
CWND or a small enough pacing_rate),
the leftover packet in the queue is smaller than the optimal size.

It is time to try to make 'leftover packets' bigger so that
tcp_tso_should_defer() can give its full potential.

After this patch, we can see the following output:

14:13:34.009273 IP6 sender > receiver: Flags [P.], seq 4048380:4098360, ack 1, win 256, options [nop,nop,TS val 3425678144 ecr 1561784500], length 49980
14:13:34.010272 IP6 sender > receiver: Flags [P.], seq 4098360:4148340, ack 1, win 256, options [nop,nop,TS val 3425678145 ecr 1561784501], length 49980
14:13:34.011271 IP6 sender > receiver: Flags [P.], seq 4148340:4198320, ack 1, win 256, options [nop,nop,TS val 3425678146 ecr 1561784502], length 49980
14:13:34.012271 IP6 sender > receiver: Flags [P.], seq 4198320:4248300, ack 1, win 256, options [nop,nop,TS val 3425678147 ecr 1561784503], length 49980
14:13:34.013272 IP6 sender > receiver: Flags [P.], seq 4248300:4298280, ack 1, win 256, options [nop,nop,TS val 3425678148 ecr 1561784504], length 49980
14:13:34.014271 IP6 sender > receiver: Flags [P.], seq 4298280:4348260, ack 1, win 256, options [nop,nop,TS val 3425678149 ecr 1561784505], length 49980
14:13:34.015272 IP6 sender > receiver: Flags [P.], seq 4348260:4398240, ack 1, win 256, options [nop,nop,TS val 3425678150 ecr 1561784506], length 49980
14:13:34.016270 IP6 sender > receiver: Flags [P.], seq 4398240:4448220, ack 1, win 256, options [nop,nop,TS val 3425678151 ecr 1561784507], length 49980
14:13:34.017269 IP6 sender > receiver: Flags [P.], seq 4448220:4498200, ack 1, win 256, options [nop,nop,TS val 3425678152 ecr 1561784508], length 49980
14:13:34.018276 IP6 sender > receiver: Flags [P.], seq 4498200:4548180, ack 1, win 256, options [nop,nop,TS val 3425678153 ecr 1561784509], length 49980
14:13:34.019259 IP6 sender > receiver: Flags [P.], seq 4548180:4598160, ack 1, win 256, options [nop,nop,TS val 3425678154 ecr 1561784510], length 49980

With 200 concurrent flows on a 100Gbit NIC, we can see a reduction
of TSO packets (and ACK packets) of about 30 %.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240418214600.1291486-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agotcp: call tcp_set_skb_tso_segs() from tcp_write_xmit()
Eric Dumazet [Thu, 18 Apr 2024 21:45:59 +0000 (21:45 +0000)]
tcp: call tcp_set_skb_tso_segs() from tcp_write_xmit()

tcp_write_xmit() calls tcp_init_tso_segs()
to set gso_size and gso_segs on the packet.

tcp_init_tso_segs() requires the stack to maintain
an up to date tcp_skb_pcount(), and this makes sense
for packets in rtx queue. Not so much for packets
still in the write queue.

In the following patch, we don't want to deal with
tcp_skb_pcount() when moving payload from 2nd
skb to 1st skb in the write queue.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240418214600.1291486-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agotcp: remove dubious FIN exception from tcp_cwnd_test()
Eric Dumazet [Thu, 18 Apr 2024 21:45:58 +0000 (21:45 +0000)]
tcp: remove dubious FIN exception from tcp_cwnd_test()

tcp_cwnd_test() has a special handing for the last packet in
the write queue if it is smaller than one MSS and has the FIN flag.

This is in violation of TCP RFC, and seems quite dubious.

This packet can be sent only if the current CWND is bigger
than the number of packets in flight.

Making tcp_cwnd_test() result independent of the first skb
in the write queue is needed for the last patch of the series.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240418214600.1291486-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge branch 'mlx5e-per-queue-coalescing'
Jakub Kicinski [Mon, 22 Apr 2024 21:22:22 +0000 (14:22 -0700)]
Merge branch 'mlx5e-per-queue-coalescing'

Tariq Toukan says:

====================
mlx5e per-queue coalescing

This patchset adds ethtool per-queue coalescing support for the mlx5e
driver.

The series introduce some changes needed as preparations for the final
patch which adds the support and implements the callbacks.  Main
changes:
- DIM code movements into its own header file.
- Switch to dynamic allocation of the DIM struct in the RQs/SQs.
- Allow coalescing config change without channels reset when possible.
====================

Link: https://lore.kernel.org/r/20240419080445.417574-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet/mlx5e: Implement ethtool callbacks for supporting per-queue coalescing
Rahul Rameshbabu [Fri, 19 Apr 2024 08:04:45 +0000 (11:04 +0300)]
net/mlx5e: Implement ethtool callbacks for supporting per-queue coalescing

Use mlx5 on-the-fly coalescing configuration support to enable individual
channel configuration.

Co-developed-by: Nabil S. Alramli <dev@nalramli.com>
Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240419080445.417574-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet/mlx5e: Support updating coalescing configuration without resetting channels
Rahul Rameshbabu [Fri, 19 Apr 2024 08:04:44 +0000 (11:04 +0300)]
net/mlx5e: Support updating coalescing configuration without resetting channels

When CQE mode or DIM state is changed, gracefully reconfigure channels to
handle new configuration. Previously, would create new channels that would
reflect the changes rather than update the original channels.

Co-developed-by: Nabil S. Alramli <dev@nalramli.com>
Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240419080445.417574-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet/mlx5e: Dynamically allocate DIM structure for SQs/RQs
Rahul Rameshbabu [Fri, 19 Apr 2024 08:04:43 +0000 (11:04 +0300)]
net/mlx5e: Dynamically allocate DIM structure for SQs/RQs

Make it possible for the DIM structure to be torn down while an SQ or RQ is
still active. Changing the CQ period mode is an example where the previous
sampling done with the DIM structure would need to be invalidated.

Co-developed-by: Nabil S. Alramli <dev@nalramli.com>
Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240419080445.417574-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet/mlx5e: Use DIM constants for CQ period mode parameter
Rahul Rameshbabu [Fri, 19 Apr 2024 08:04:42 +0000 (11:04 +0300)]
net/mlx5e: Use DIM constants for CQ period mode parameter

Use core DIM CQ period mode enum values for the CQ parameter for the period
mode. Translate the value to the specific mlx5 device constant for the
selected period mode when creating a CQ. Avoid needing to translate mlx5
device constants to DIM constants for core DIM functionality.

Co-developed-by: Nabil S. Alramli <dev@nalramli.com>
Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240419080445.417574-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet/mlx5e: Move DIM function declarations to en/dim.h
Rahul Rameshbabu [Fri, 19 Apr 2024 08:04:41 +0000 (11:04 +0300)]
net/mlx5e: Move DIM function declarations to en/dim.h

Create a header specifically for DIM-related declarations. Move existing
DIM-specific functionality from en.h. Future DIM-related functionality will
be declared in en/dim.h in subsequent patches.

Co-developed-by: Nabil S. Alramli <dev@nalramli.com>
Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240419080445.417574-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge branch 'net-dsa-vsc73xx-convert-to-phylink-and-do-some-cleanup'
Jakub Kicinski [Mon, 22 Apr 2024 21:20:15 +0000 (14:20 -0700)]
Merge branch 'net-dsa-vsc73xx-convert-to-phylink-and-do-some-cleanup'

Pawel Dembicki says:

====================
net: dsa: vsc73xx: convert to PHYLINK and do some cleanup

This patch series is a result of splitting a larger patch series [0],
where some parts needed to be refactored.

The first patch switches from a poll loop to read_poll_timeout.

The second patch is a simple conversion to phylink because adjust_link
won't work anymore.

The third patch is preparation for future use. Using the
"phy_interface_mode_is_rgmii" macro allows for the proper recognition
of all RGMII modes.

Patches 4-5 involve some cleanup: The fourth patch introduces
a definition with the maximum number of ports to avoid using
magic numbers. The next one fills in documentation.

[0] https://patchwork.kernel.org/project/netdevbpf/list/?series=841034&state=%2A&archive=both
====================

Link: https://lore.kernel.org/r/20240417205048.3542839-1-paweldembicki@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: dsa: vsc73xx: add structure descriptions
Pawel Dembicki [Wed, 17 Apr 2024 20:50:48 +0000 (22:50 +0200)]
net: dsa: vsc73xx: add structure descriptions

This commit adds updates to the documentation describing the structures
used in vsc73xx. This will help prevent kdoc-related issues in the future.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Link: https://lore.kernel.org/r/20240417205048.3542839-6-paweldembicki@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: dsa: vsc73xx: Add define for max num of ports
Pawel Dembicki [Wed, 17 Apr 2024 20:50:47 +0000 (22:50 +0200)]
net: dsa: vsc73xx: Add define for max num of ports

This patch introduces a new define: VSC73XX_MAX_NUM_PORTS, which can be
used in the future instead of a hardcoded value.

Currently, the only hardcoded value is vsc->ds->num_ports. It is being
replaced with the new define.

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20240417205048.3542839-5-paweldembicki@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: dsa: vsc73xx: use macros for rgmii recognition
Pawel Dembicki [Wed, 17 Apr 2024 20:50:46 +0000 (22:50 +0200)]
net: dsa: vsc73xx: use macros for rgmii recognition

It's preparation for future use. At this moment, the RGMII port is used
only for a connection to the MAC interface, but in the future, someone
could connect a PHY to it. Using the "phy_interface_mode_is_rgmii" macro
allows for the proper recognition of all RGMII modes.

Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20240417205048.3542839-4-paweldembicki@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: dsa: vsc73xx: convert to PHYLINK
Pawel Dembicki [Wed, 17 Apr 2024 20:50:45 +0000 (22:50 +0200)]
net: dsa: vsc73xx: convert to PHYLINK

This patch replaces the adjust_link api with the phylink apis that provide
equivalent functionality.

The remaining functionality from the adjust_link is now covered in the
mac_link_* and mac_config from phylink_mac_ops structure.

Removes:
.adjust_link
Adds phylink_mac_ops structure:
.mac_config
.mac_link_up
.mac_link_down

Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Link: https://lore.kernel.org/r/20240417205048.3542839-3-paweldembicki@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: dsa: vsc73xx: use read_poll_timeout instead delay loop
Pawel Dembicki [Wed, 17 Apr 2024 20:50:44 +0000 (22:50 +0200)]
net: dsa: vsc73xx: use read_poll_timeout instead delay loop

Switch the delay loop during the Arbiter empty check from
vsc73xx_adjust_link() to use read_poll_timeout(). Functionally,
one msleep() call is eliminated at the end of the loop in the timeout
case.

As Russell King suggested:

"This [change] avoids the issue that on the last iteration, the code reads
the register, tests it, finds the condition that's being waiting for is
false, _then_ waits and end up printing the error message - that last
wait is rather useless, and as the arbiter state isn't checked after
waiting, it could be that we had success during the last wait."

Suggested-by: Russell King <linux@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Link: https://lore.kernel.org/r/20240417205048.3542839-2-paweldembicki@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agotcp: do not export tcp_twsk_purge()
Eric Dumazet [Fri, 19 Apr 2024 07:19:42 +0000 (07:19 +0000)]
tcp: do not export tcp_twsk_purge()

After commit 1eeb50435739 ("tcp/dccp: do not care about
families in inet_twsk_purge()") tcp_twsk_purge() is
no longer potentially called from a module.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agoocteontx2-pf: Add support for offload tc with skbedit mark action
Geetha sowjanya [Sat, 20 Apr 2024 09:35:05 +0000 (15:05 +0530)]
octeontx2-pf: Add support for offload tc with skbedit mark action

Support offloading of skbedit mark action.

For example, to mark with 0x0008, with dest ip 60.60.60.2 on eth2
interface:

 # tc qdisc add dev eth2 ingress
 # tc filter add dev eth2 ingress protocol ip flower \
      dst_ip 60.60.60.2 action skbedit mark 0x0008 skip_sw

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet: ethernet: ti: am65-cpsw: Fix xdp_rxq error for disabled port
Julien Panis [Thu, 18 Apr 2024 15:34:55 +0000 (17:34 +0200)]
net: ethernet: ti: am65-cpsw: Fix xdp_rxq error for disabled port

When an ethX port is disabled in the device tree, an error is returned
by xdp_rxq_info_reg() function while transitioning the CPSW device to
the up state. The message 'Missing net_device from driver' is output.

This patch fixes the issue by registering xdp_rxq info only if ethX
port is enabled (i.e. ndev pointer is not NULL).

Fixes: 8acacc40f733 ("net: ethernet: ti: am65-cpsw: Add minimal XDP support")
Link: https://lore.kernel.org/all/260d258f-87a1-4aac-8883-aab4746b32d8@ti.com/
Reported-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Closes: https://gist.github.com/Siddharth-Vadapalli-at-TI/5ed0e436606001c247a7da664f75edee
Signed-off-by: Julien Panis <jpanis@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agosysctl: treewide: constify ctl_table_header::ctl_table_arg
Thomas Weißschuh [Thu, 18 Apr 2024 09:40:08 +0000 (11:40 +0200)]
sysctl: treewide: constify ctl_table_header::ctl_table_arg

To be able to constify instances of struct ctl_tables it is necessary to
remove ways through which non-const versions are exposed from the
sysctl core.
One of these is the ctl_table_arg member of struct ctl_table_header.

Constify this reference as a prerequisite for the full constification of
struct ctl_table instances.
No functional change.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agoMerge branch 'testing-make-netfilter-selftests-functional-in-vng-environment'
Jakub Kicinski [Sat, 20 Apr 2024 03:10:52 +0000 (20:10 -0700)]
Merge branch 'testing-make-netfilter-selftests-functional-in-vng-environment'

Florian Westphal says:

====================
testing: make netfilter selftests functional in vng environment

This is the second batch of the netfilter selftest move.

Changes since v1:
- makefile and kernel config are updated to have all required features
- fix makefile with missing bits to make kselftest-install work
- test it via vng as per
   https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
   (Thanks Jakub!)
- squash a few fixes, e.g. nft_queue.sh v1 had a race w. NFNETLINK_QUEUE=m
- add a settings file with 8m timeout, for nft_concat_range.sh sake.
  That script can be sped up a bit, I think, but its not contained in
  this batch yet.
- toss the first two bogus rebase artifacts (Matthieu Baerts)

scripts are moved to lib.sh infra. This allows to use busywait helper
and ditch various 'sleep 2' all over the place.

Tested on Fedora 39:

vng --build  --config tools/testing/selftests/net/netfilter/config
make -C tools/testing/selftests/ TARGETS=net/netfilter
vng -v --run . --user root --cpus 2 -- \
        make -C tools/testing/selftests TARGETS=net/netfilter run_tests

... all tests pass except nft_audit.sh which SKIPs due to nft version mismatch
(Fedora is on nft 1.0.7 which lacks reset keyword support).

Missing/WIP bits:
- speed up nf_concat_range.sh test
- extend flowtable selftest
- shellcheck fixups for remaining scripts
====================

Link: https://lore.kernel.org/r/20240418152744.15105-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: netfilter: update makefiles and kernel config
Florian Westphal [Thu, 18 Apr 2024 15:27:40 +0000 (17:27 +0200)]
selftests: netfilter: update makefiles and kernel config

Jakub reports the Makefile missed a few updates to make kselftest-install
work for the netfilter tests and points out that config file lacks many
dependencies such as VETH support.

The settings file (timeout 8m) is added for nft_concat_range.sh script
which can take several minutes to complete.

Fixes: 3f189349e52a ("selftests: netfilter: move to net subdir")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/all/20240412175413.04e5e616@kernel.org/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-13-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: netfilter: nft_audit.sh: add more skip checks
Florian Westphal [Thu, 18 Apr 2024 15:27:39 +0000 (17:27 +0200)]
selftests: netfilter: nft_audit.sh: add more skip checks

This testcase doesn't work if auditd is running, audit_logread will not
receive any data in that case.

Add a nftables feature test for the reset keyword and skip this test
if that fails.

While at it, do a few minor shellcheck cleanups.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-12-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: netfilter: nft_meta.sh: small shellcheck cleanup
Florian Westphal [Thu, 18 Apr 2024 15:27:38 +0000 (17:27 +0200)]
selftests: netfilter: nft_meta.sh: small shellcheck cleanup

shellcheck complains about missing "", so add those.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-11-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: netfilter: nft_fib.sh: shellcheck cleanups
Florian Westphal [Thu, 18 Apr 2024 15:27:37 +0000 (17:27 +0200)]
selftests: netfilter: nft_fib.sh: shellcheck cleanups

no functional change intended.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-10-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: netfilter: conntrack_ipip_mtu.sh: shellcheck cleanups
Florian Westphal [Thu, 18 Apr 2024 15:27:36 +0000 (17:27 +0200)]
selftests: netfilter: conntrack_ipip_mtu.sh: shellcheck cleanups

No functional change intended.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-9-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: netfilter: nft_nat_zones.sh: shellcheck cleanups
Florian Westphal [Thu, 18 Apr 2024 15:27:35 +0000 (17:27 +0200)]
selftests: netfilter: nft_nat_zones.sh: shellcheck cleanups

While at it: No need for iperf here, use socat.
This also reduces the script runtime.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-8-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: netfilter: xt_string.sh: shellcheck cleanups
Florian Westphal [Thu, 18 Apr 2024 15:27:34 +0000 (17:27 +0200)]
selftests: netfilter: xt_string.sh: shellcheck cleanups

no functional change intended.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-7-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: netfilter: xt_string.sh: move to lib.sh infra
Florian Westphal [Thu, 18 Apr 2024 15:27:33 +0000 (17:27 +0200)]
selftests: netfilter: xt_string.sh: move to lib.sh infra

Intentional changes:
- Use socat instead of netcat
- Use a temporary file instead of pipe, else packets do not match
  "-m string" rules, multiple writes to the pipe cause multiple packets,
  but this needs only one to work.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-6-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: netfilter: nft_zones_many.sh: move to lib.sh infra
Florian Westphal [Thu, 18 Apr 2024 15:27:32 +0000 (17:27 +0200)]
selftests: netfilter: nft_zones_many.sh: move to lib.sh infra

Also do shellcheck cleanups here, no functional changes intended.
When running tests via vng tool, the packetpath insertion test fails:
dd: failed to open '/dev/stdout': Device or resource busy

Just omit 'of=' and this will work as intended.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-5-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: netfilter: nft_synproxy.sh: move to lib.sh infra
Florian Westphal [Thu, 18 Apr 2024 15:27:31 +0000 (17:27 +0200)]
selftests: netfilter: nft_synproxy.sh: move to lib.sh infra

use checktool helper where applicable.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-4-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: netfilter: nft_queue.sh: shellcheck cleanups
Florian Westphal [Thu, 18 Apr 2024 15:27:30 +0000 (17:27 +0200)]
selftests: netfilter: nft_queue.sh: shellcheck cleanups

No functional change intended.  Disable frequent shellcheck warnings wrt.
"unreachable" code, those helpers get called indirectly from busywait helper.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-3-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: netfilter: nft_queue.sh: move to lib.sh infra
Florian Westphal [Thu, 18 Apr 2024 15:27:29 +0000 (17:27 +0200)]
selftests: netfilter: nft_queue.sh: move to lib.sh infra

- switch to socat, like other tests
- use buswait helper to test once listener netns is ready
- do not generate multiple input test files, only generate
  one and use cleanup hook to remove it, like other temporary files.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240418152744.15105-2-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge branch 'net-neigh-rcu'
David S. Miller [Fri, 19 Apr 2024 11:39:20 +0000 (12:39 +0100)]
Merge branch 'net-neigh-rcu'

Eric Dumazet says:

====================
neighbour: convert neigh_dump_info() to RCU

Remove RTNL requirement for "ip neighbour show" command.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agoneighbour: no longer hold RTNL in neigh_dump_info()
Eric Dumazet [Thu, 18 Apr 2024 09:51:06 +0000 (09:51 +0000)]
neighbour: no longer hold RTNL in neigh_dump_info()

neigh_dump_table() is already relying on RCU protection.

pneigh_dump_table() is using its own protection (tbl->lock)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agoneighbour: fix neigh_dump_info() return value
Eric Dumazet [Thu, 18 Apr 2024 09:51:05 +0000 (09:51 +0000)]
neighbour: fix neigh_dump_info() return value

Change neigh_dump_table() and pneigh_dump_table()
to either return 0 or -EMSGSIZE if not enough
space was available in the skb.

Then neigh_dump_info() can do the same.

This allows NLMSG_DONE to be appended to the current
skb at the end of a dump, saving a couple of recvmsg()
system calls.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agoneighbour: add RCU protection to neigh_tables[]
Eric Dumazet [Thu, 18 Apr 2024 09:51:04 +0000 (09:51 +0000)]
neighbour: add RCU protection to neigh_tables[]

In order to remove RTNL protection from neightbl_dump_info()
and neigh_dump_info() later, we need to add
RCU protection to neigh_tables[].

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet: dsa: xrs700x: fix missing initialisation of ds->phylink_mac_ops
Russell King (Oracle) [Thu, 18 Apr 2024 10:51:21 +0000 (11:51 +0100)]
net: dsa: xrs700x: fix missing initialisation of ds->phylink_mac_ops

The kernel build bot identified the following mistake in the recently
merged 860a9bed2651 ("net: dsa: xrs700x: provide own phylink MAC
operations") patch:

drivers/net/dsa/xrs700x/xrs700x.c:714:37: warning: 'xrs700x_phylink_mac_ops' defined but not used [-Wunused-const-variable=]
     714 | static const struct phylink_mac_ops xrs700x_phylink_mac_ops = {
         |                                     ^~~~~~~~~~~~~~~~~~~~~~~

Fix the omitted assignment of ds->phylink_mac_ops.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agoMerge branch 'net-rps-lockless'
David S. Miller [Fri, 19 Apr 2024 10:38:04 +0000 (11:38 +0100)]
Merge branch 'net-rps-lockless'

Jason Xing says:

====================
locklessly protect left members in struct rps_dev_flow

From: Jason Xing <kernelxing@tencent.com>

Since Eric did a more complicated locklessly change to last_qtail
member[1] in struct rps_dev_flow, the left members are easier to change
as the same.

One thing important I would like to share by qooting Eric:
"rflow is located in rxqueue->rps_flow_table, it is thus private to current
thread. Only one cpu can service an RX queue at a time."
So we only pay attention to the reader in the rps_may_expire_flow() and
writer in the set_rps_cpu(). They are in the two different contexts.

[1]:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=3b4cf29bdab

v3
Link: https://lore.kernel.org/all/20240417062721.45652-1-kerneljasonxing@gmail.com/
1. adjust the protection in a right way (Eric)

v2
1. fix passing wrong type qtail.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet: rps: locklessly access rflow->cpu
Jason Xing [Thu, 18 Apr 2024 07:36:03 +0000 (15:36 +0800)]
net: rps: locklessly access rflow->cpu

This is the last member in struct rps_dev_flow which should be
protected locklessly. So finish it.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet: rps: protect filter locklessly
Jason Xing [Thu, 18 Apr 2024 07:36:02 +0000 (15:36 +0800)]
net: rps: protect filter locklessly

As we can see, rflow->filter can be written/read concurrently, so
lockless access is needed.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet: rps: protect last_qtail with rps_input_queue_tail_save() helper
Jason Xing [Thu, 18 Apr 2024 07:36:01 +0000 (15:36 +0800)]
net: rps: protect last_qtail with rps_input_queue_tail_save() helper

Removing one unnecessary reader protection and add another writer
protection to finish the locklessly proctection job.

Note: the removed READ_ONCE() is not needed because we only have to protect
the locklessly reader in the different context (rps_may_expire_flow()).

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agoMerge branch 'net_sched-dump-no-rtnl'
David S. Miller [Fri, 19 Apr 2024 10:34:08 +0000 (11:34 +0100)]
Merge branch 'net_sched-dump-no-rtnl'

Eric Dumazet says:

====================
net_sched: first series for RTNL-less qdisc dumps

Medium term goal is to implement "tc qdisc show" without needing
to acquire RTNL.

This first series makes the requested changes in 14 qdisc.

Notes :

 - RTNL is still held in "tc qdisc show", more changes are needed.

 - Qdisc returning many attributes might want/need to provide
   a consistent set of attributes. If that is the case, their
   dump() method could acquire the qdisc spinlock, to pair the
   spinlock acquision in their change() method.

V2: Addressed Simon feedback (Thanks a lot Simon)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet_sched: sch_skbprio: implement lockless skbprio_dump()
Eric Dumazet [Thu, 18 Apr 2024 07:32:48 +0000 (07:32 +0000)]
net_sched: sch_skbprio: implement lockless skbprio_dump()

Instead of relying on RTNL, skbprio_dump() can use READ_ONCE()
annotation, paired with WRITE_ONCE() one in skbprio_change().

Also add a READ_ONCE(sch->limit) in skbprio_enqueue().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet_sched: sch_pie: implement lockless pie_dump()
Eric Dumazet [Thu, 18 Apr 2024 07:32:47 +0000 (07:32 +0000)]
net_sched: sch_pie: implement lockless pie_dump()

Instead of relying on RTNL, pie_dump() can use READ_ONCE()
annotations, paired with WRITE_ONCE() ones in pie_change().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet_sched: sch_hhf: implement lockless hhf_dump()
Eric Dumazet [Thu, 18 Apr 2024 07:32:46 +0000 (07:32 +0000)]
net_sched: sch_hhf: implement lockless hhf_dump()

Instead of relying on RTNL, hhf_dump() can use READ_ONCE()
annotations, paired with WRITE_ONCE() ones in hhf_change().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet_sched: sch_hfsc: implement lockless accesses to q->defcls
Eric Dumazet [Thu, 18 Apr 2024 07:32:45 +0000 (07:32 +0000)]
net_sched: sch_hfsc: implement lockless accesses to q->defcls

Instead of relying on RTNL, hfsc_dump_qdisc() can use READ_ONCE()
annotation, paired with WRITE_ONCE() one in hfsc_change_qdisc().

Use READ_ONCE(q->defcls) in hfsc_classify() to
no longer acquire qdisc lock from hfsc_change_qdisc().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet_sched: sch_fq_pie: implement lockless fq_pie_dump()
Eric Dumazet [Thu, 18 Apr 2024 07:32:44 +0000 (07:32 +0000)]
net_sched: sch_fq_pie: implement lockless fq_pie_dump()

Instead of relying on RTNL, fq_pie_dump() can use READ_ONCE()
annotations, paired with WRITE_ONCE() ones in fq_pie_change().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet_sched: sch_fq_codel: implement lockless fq_codel_dump()
Eric Dumazet [Thu, 18 Apr 2024 07:32:43 +0000 (07:32 +0000)]
net_sched: sch_fq_codel: implement lockless fq_codel_dump()

Instead of relying on RTNL, fq_codel_dump() can use READ_ONCE()
annotations, paired with WRITE_ONCE() ones in fq_codel_change().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet_sched: sch_fifo: implement lockless __fifo_dump()
Eric Dumazet [Thu, 18 Apr 2024 07:32:42 +0000 (07:32 +0000)]
net_sched: sch_fifo: implement lockless __fifo_dump()

Instead of relying on RTNL, __fifo_dump() can use READ_ONCE()
annotations, paired with WRITE_ONCE() ones in __fifo_init().

Also add missing READ_ONCE(sh->limit) in bfifo_enqueue(),
pfifo_enqueue() and pfifo_tail_enqueue().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet_sched: sch_ets: implement lockless ets_dump()
Eric Dumazet [Thu, 18 Apr 2024 07:32:41 +0000 (07:32 +0000)]
net_sched: sch_ets: implement lockless ets_dump()

Instead of relying on RTNL, ets_dump() can use READ_ONCE()
annotations, paired with WRITE_ONCE() ones in ets_change().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet_sched: sch_tfs: implement lockless etf_dump()
Eric Dumazet [Thu, 18 Apr 2024 07:32:40 +0000 (07:32 +0000)]
net_sched: sch_tfs: implement lockless etf_dump()

Instead of relying on RTNL, codel_dump() can use READ_ONCE()
annotations.

There is no etf_change() yet, this patch imply aligns
this qdisc with others.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet_sched: sch_codel: implement lockless codel_dump()
Eric Dumazet [Thu, 18 Apr 2024 07:32:39 +0000 (07:32 +0000)]
net_sched: sch_codel: implement lockless codel_dump()

Instead of relying on RTNL, codel_dump() can use READ_ONCE()
annotations, paired with WRITE_ONCE() ones in codel_change().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet_sched: sch_choke: implement lockless choke_dump()
Eric Dumazet [Thu, 18 Apr 2024 07:32:38 +0000 (07:32 +0000)]
net_sched: sch_choke: implement lockless choke_dump()

Instead of relying on RTNL, choke_dump() can use READ_ONCE()
annotations, paired with WRITE_ONCE() ones in choke_change().

v2: added a WRITE_ONCE(p->Scell_log, Scell_log)
    per Simon feedback in V1
    Removed the READ_ONCE(q->limit) in choke_enqueue()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet_sched: sch_cbs: implement lockless cbs_dump()
Eric Dumazet [Thu, 18 Apr 2024 07:32:37 +0000 (07:32 +0000)]
net_sched: sch_cbs: implement lockless cbs_dump()

Instead of relying on RTNL, cbs_dump() can use READ_ONCE()
annotations, paired with WRITE_ONCE() ones in cbs_change().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet_sched: cake: implement lockless cake_dump()
Eric Dumazet [Thu, 18 Apr 2024 07:32:36 +0000 (07:32 +0000)]
net_sched: cake: implement lockless cake_dump()

Instead of relying on RTNL, cake_dump() can use READ_ONCE()
annotations, paired with WRITE_ONCE() ones in cake_change().

v2: addressed Simon feedback in V1: https://lore.kernel.org/netdev/20240417083549.GA3846178@kernel.org/

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Toke Høiland-Jørgensen <toke@toke.dk>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet_sched: sch_fq: implement lockless fq_dump()
Eric Dumazet [Thu, 18 Apr 2024 07:32:35 +0000 (07:32 +0000)]
net_sched: sch_fq: implement lockless fq_dump()

Instead of relying on RTNL, fq_dump() can use READ_ONCE()
annotations, paired with WRITE_ONCE() in fq_change()

v2: Addressed Simon feedback in V1: https://lore.kernel.org/netdev/20240416181915.GT2320920@kernel.org/

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agogve: Remove qpl_cfg struct since qpl_ids map with queues respectively
Ziwei Xiao [Wed, 17 Apr 2024 20:57:57 +0000 (20:57 +0000)]
gve: Remove qpl_cfg struct since qpl_ids map with queues respectively

The qpl_cfg struct was used to make sure that no two different queues
are using QPL with the same qpl_id. We can remove that qpl_cfg struct
since now the qpl_ids map with the queues respectively as follows:
For tx queues: qpl_id = tx_qid
For rx queues: qpl_id = max_tx_queues + rx_qid

And when XDP is used, it will need the user to reduce the tx queues to
be at most half of the max_tx_queues. Then it will use the same number
of tx queues starting from the end of existing tx queues for XDP. So the
XDP queues will not exceed the max_tx_queues range and will not overlap
with the rx queues, where the qpl_ids will not have overlapping too.

Considering of that, we remove the qpl_cfg struct to get the qpl_id
directly based on the queue id. Unless we are erroneously allocating a
rx/tx queue that has already been allocated, we would never allocate
the qpl with the same qpl_id twice. In that case, it should fail much
earlier than the QPL assignment.

Suggested-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Link: https://lore.kernel.org/r/20240417205757.778551-1-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge branch 'net: Add support for Power over Ethernet (PoE)'
Jakub Kicinski [Fri, 19 Apr 2024 01:27:40 +0000 (18:27 -0700)]
Merge branch 'net: Add support for Power over Ethernet (PoE)'

Kory Maincent says:

====================
net: Add support for Power over Ethernet (PoE)

This patch series aims at adding support for PoE (Power over Ethernet),
based on the already existing support for PoDL (Power over Data Line)
implementation. In addition, it adds support for two specific PoE
controller, the Microchip PD692x0 and the TI TPS23881.
====================

Link: https://lore.kernel.org/all/20240417-feature_poe-v9-0-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: pse-pd: Add TI TPS23881 PSE controller driver
Kory Maincent (Dent Project) [Wed, 17 Apr 2024 14:40:02 +0000 (16:40 +0200)]
net: pse-pd: Add TI TPS23881 PSE controller driver

Add a new driver for the TI TPS23881 I2C Power Sourcing Equipment
controller.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-14-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodt-bindings: net: pse-pd: Add bindings for TPS23881 PSE controller
Kory Maincent (Dent Project) [Wed, 17 Apr 2024 14:40:01 +0000 (16:40 +0200)]
dt-bindings: net: pse-pd: Add bindings for TPS23881 PSE controller

Add the TPS23881 I2C Power Sourcing Equipment controller device tree
bindings documentation.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-13-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: pse-pd: Add PD692x0 PSE controller driver
Kory Maincent (Dent Project) [Wed, 17 Apr 2024 14:40:00 +0000 (16:40 +0200)]
net: pse-pd: Add PD692x0 PSE controller driver

Add a new driver for the PD692x0 I2C Power Sourcing Equipment controller.
This driver only support i2c communication for now.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-12-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodt-bindings: net: pse-pd: Add bindings for PD692x0 PSE controller
Kory Maincent (Dent Project) [Wed, 17 Apr 2024 14:39:59 +0000 (16:39 +0200)]
dt-bindings: net: pse-pd: Add bindings for PD692x0 PSE controller

Add the PD692x0 I2C Power Sourcing Equipment controller device tree
bindings documentation.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-11-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: pse-pd: Use regulator framework within PSE framework
Kory Maincent (Dent Project) [Wed, 17 Apr 2024 14:39:58 +0000 (16:39 +0200)]
net: pse-pd: Use regulator framework within PSE framework

Integrate the regulator framework to the PSE framework for enhanced
access to features such as voltage, power measurement, and limits, which
are akin to regulators. Additionally, PSE features like port priorities
could potentially enhance the regulator framework. Note that this
integration introduces some implementation complexity, including wrapper
callbacks, but the potential benefits make it worthwhile.

Regulator are using enable counter with specific behavior.
Two calls to regulator_disable will trigger kernel warnings.
If the counter exceeds one, regulator_disable call won't disable the
PSE PI. These behavior isn't suitable for PSE control.
Added a boolean 'enabled' state to prevent multiple calls to
regulator_enable/disable. These calls will only be called from PSE
framework as it won't have any regulator children, therefore no mutex are
needed to safeguards this boolean.

regulator_get needs the consumer device pointer. Use PSE as regulator
provider and consumer device until we have RJ45 ports represented in
the Kernel.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-10-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: pse-pd: Add support for setup_pi_matrix callback
Kory Maincent (Dent Project) [Wed, 17 Apr 2024 14:39:57 +0000 (16:39 +0200)]
net: pse-pd: Add support for setup_pi_matrix callback

Implement setup_pi_matrix callback to configure the PSE PI matrix. This
functionality is invoked before registering the PSE and following the core
parsing of the pse_pis devicetree subnode.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-9-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodt-bindings: net: pse-pd: Add another way of describing several PSE PIs
Kory Maincent (Dent Project) [Wed, 17 Apr 2024 14:39:56 +0000 (16:39 +0200)]
dt-bindings: net: pse-pd: Add another way of describing several PSE PIs

PSE PI setup may encompass multiple PSE controllers or auxiliary circuits
that collectively manage power delivery to one Ethernet port.
Such configurations might support a range of PoE standards and require
the capability to dynamically configure power delivery based on the
operational mode (e.g., PoE2 versus PoE4) or specific requirements of
connected devices. In these instances, a dedicated PSE PI node becomes
essential for accurately documenting the system architecture. This node
would serve to detail the interactions between different PSE controllers,
the support for various PoE modes, and any additional logic required to
coordinate power delivery across the network infrastructure.

The old usage of "#pse-cells" is unsuficient as it carries only the PSE PI
index information.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-8-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: pse-pd: Add support for PSE PIs
Kory Maincent (Dent Project) [Wed, 17 Apr 2024 14:39:55 +0000 (16:39 +0200)]
net: pse-pd: Add support for PSE PIs

The Power Sourcing Equipment Power Interface (PSE PI) plays a pivotal role
in the architecture of Power over Ethernet (PoE) systems. It is essentially
a blueprint that outlines how one or multiple power sources are connected
to the eight-pin modular jack, commonly known as the Ethernet RJ45 port.
This connection scheme is crucial for enabling the delivery of power
alongside data over Ethernet cables.

This patch adds support for getting the PSE controller node through PSE PI
device subnode.

This supports adds a way to get the PSE PI id from the pse_pi devicetree
subnode of a PSE controller node simply by reading the reg property.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-7-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMAINTAINERS: Add myself to pse networking maintainer
Kory Maincent (Dent Project) [Wed, 17 Apr 2024 14:39:54 +0000 (16:39 +0200)]
MAINTAINERS: Add myself to pse networking maintainer

As I add support for PoE in PSE networking subsystem it seems legitimate
to be added to the maintainers.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-6-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonetlink: specs: Expand the pse netlink command with PoE interface
Kory Maincent (Dent Project) [Wed, 17 Apr 2024 14:39:53 +0000 (16:39 +0200)]
netlink: specs: Expand the pse netlink command with PoE interface

Add the PoE pse attributes prefix to be able to use PoE interface.

Example usage:
./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do pse-get \
             --json '{"header":{"dev-name":"eth0"}}'
{'header': {'dev-index': 4, 'dev-name': 'eth0'},
 'c33-pse-admin-state': 3,
 'c33-pse-pw-d-status': 4}

./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do pse-set \
             --json '{"header":{"dev-name":"eth0"},
     "c33-pse-admin-control":3}'

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-5-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonetlink: specs: Modify pse attribute prefix
Kory Maincent (Dent Project) [Wed, 17 Apr 2024 14:39:52 +0000 (16:39 +0200)]
netlink: specs: Modify pse attribute prefix

Remove podl from the attribute prefix to prepare the support of PoE pse
netlink spec.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-4-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: ethtool: pse-pd: Expand pse commands with the PSE PoE interface
Kory Maincent (Dent Project) [Wed, 17 Apr 2024 14:39:51 +0000 (16:39 +0200)]
net: ethtool: pse-pd: Expand pse commands with the PSE PoE interface

Add PSE PoE interface support in the ethtool pse command.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-3-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: pse-pd: Introduce PSE types enumeration
Kory Maincent (Dent Project) [Wed, 17 Apr 2024 14:39:50 +0000 (16:39 +0200)]
net: pse-pd: Introduce PSE types enumeration

Introduce an enumeration to define PSE types (C33 or PoDL),
utilizing a bitfield for potential future support of both types.
Include 'pse_get_types' helper for external access to PSE type info.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-2-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoethtool: Expand Ethernet Power Equipment with c33 (PoE) alongside PoDL
Kory Maincent (Dent Project) [Wed, 17 Apr 2024 14:39:49 +0000 (16:39 +0200)]
ethtool: Expand Ethernet Power Equipment with c33 (PoE) alongside PoDL

In the current PSE interface for Ethernet Power Equipment, support is
limited to PoDL. This patch extends the interface to accommodate the
objects specified in IEEE 802.3-2022 145.2 for Power sourcing
Equipment (PSE).

The following objects are now supported and considered mandatory:
- IEEE 802.3-2022 30.9.1.1.5 aPSEPowerDetectionStatus
- IEEE 802.3-2022 30.9.1.1.2 aPSEAdminState
- IEEE 802.3-2022 30.9.1.2.1 aPSEAdminControl

To avoid confusion between "PoDL PSE" and "PoE PSE", which have similar
names but distinct values, we have followed the suggestion of Oleksij
Rempel and Andrew Lunn to maintain separate naming schemes for each,
using c33 (clause 33) prefix for "PoE PSE".
You can find more details in the discussion threads here:
https://lore.kernel.org/netdev/20230912110637.GI780075@pengutronix.de/
https://lore.kernel.org/netdev/2539b109-72ad-470a-9dae-9f53de4f64ec@lunn.ch/

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-1-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
Jakub Kicinski [Fri, 19 Apr 2024 00:10:20 +0000 (17:10 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-04-17 (ice)

This series contains updates to ice driver only.

Marcin adds Tx malicious driver detection (MDD) events to be included as
part of mdd-auto-reset-vf.

Dariusz removes unnecessary implementation of ndo_get_phys_port_name.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: Remove ndo_get_phys_port_name
  ice: Add automatic VF reset on Tx MDD events
====================

Link: https://lore.kernel.org/r/20240417165634.2081793-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: dsa: sja1105: flower: validate control flags
Asbjørn Sloth Tønnesen [Wed, 17 Apr 2024 14:44:12 +0000 (14:44 +0000)]
net: dsa: sja1105: flower: validate control flags

This driver currently doesn't support any control flags.

Use flow_rule_match_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.

In case any control flags are masked, flow_rule_match_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.

Only compile-tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Link: https://lore.kernel.org/r/20240417144413.104257-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: dsa: felix: flower: validate control flags
Asbjørn Sloth Tønnesen [Wed, 17 Apr 2024 14:44:06 +0000 (14:44 +0000)]
net: dsa: felix: flower: validate control flags

This driver currently doesn't support any control flags.

Use flow_rule_match_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.

In case any control flags are masked, flow_rule_match_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.

Only compile-tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Link: https://lore.kernel.org/r/20240417144407.104241-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: mscc: ocelot: flower: validate control flags
Asbjørn Sloth Tønnesen [Wed, 17 Apr 2024 14:43:58 +0000 (14:43 +0000)]
net: mscc: ocelot: flower: validate control flags

This driver currently doesn't support any control flags.

Use flow_rule_match_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.

In case any control flags are masked, flow_rule_match_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.

Only compile-tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Link: https://lore.kernel.org/r/20240417144359.104225-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agosfc: use flow_rule_is_supp_control_flags()
Asbjørn Sloth Tønnesen [Wed, 17 Apr 2024 14:07:10 +0000 (14:07 +0000)]
sfc: use flow_rule_is_supp_control_flags()

Change the check for unsupported control flags, to use the new helper
flow_rule_is_supp_control_flags().

Since the helper was based on sfc, then nothing really changes.

Compile-tested, and compiled objects are identical.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20240417140712.100905-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agomlxsw: spectrum_flower: validate control flags
Asbjørn Sloth Tønnesen [Wed, 17 Apr 2024 13:51:20 +0000 (13:51 +0000)]
mlxsw: spectrum_flower: validate control flags

This driver currently doesn't support any control flags.

Use flow_rule_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.

In case any control flags are masked, flow_rule_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.

Only compile-tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20240417135131.99921-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: net: set the exit code correctly in Python tests
Jakub Kicinski [Wed, 17 Apr 2024 23:11:40 +0000 (16:11 -0700)]
selftests: net: set the exit code correctly in Python tests

Test cases need to exit with non-zero status if they failed,
we currently don't do that:

  # KTAP version 1
  # 1..3
  # # At /root/ksft-net-drv/drivers/net/./ping.py line 18:
  # # Check failed 1 != 2
  # not ok 1 ping.test_v4
  # ok 2 ping.test_v6
  # ok 3 ping.test_tcp
  # # Totals: pass:2 fail:1 xfail:0 xpass:0 skip:0 error:0
  ok 1 selftests: drivers/net: ping.py
  ^^^^

It's a bit tempting to make the exit part of ksft_run(),
but that only works well for very trivial setups. We can
revisit this later, if people forget to call ksft_exit().

Link: https://lore.kernel.org/r/20240417231146.2435572-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: net: fix counting totals when some checks fail
Jakub Kicinski [Wed, 17 Apr 2024 23:11:39 +0000 (16:11 -0700)]
selftests: net: fix counting totals when some checks fail

Totals currently only pay attention to exceptions, if check fails
(say ksft_eq()) the test case will be counted as pass:

  # At /ksft/drivers/net/./ping.py line 18:
  # Check failed 1 != 2
  not ok 1 ping.test_v4
  ok 2 ping.test_v6
  ok 3 ping.test_tcp
  # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
            ^^^^^^^^^^^^^

Pay attention to the result.

Fixes: b86761ff6374 ("selftests: net: add scaffolding for Netlink tests in Python")
Link: https://lore.kernel.org/r/20240417231146.2435572-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 18 Apr 2024 20:10:20 +0000 (13:10 -0700)]
Merge git://git./linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

Conflicts:

include/trace/events/rpcgss.h
  386f4a737964 ("trace: events: cleanup deprecated strncpy uses")
  a4833e3abae1 ("SUNRPC: Fix rpcgss_context trace event acceptor field")

Adjacent changes:

drivers/net/ethernet/intel/ice/ice_tc_lib.c
  2cca35f5dd78 ("ice: Fix checking for unsupported keys on non-tunnel device")
  784feaa65dfd ("ice: Add support for PFCP hardware offload in switchdev")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge tag 'net-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 18 Apr 2024 18:40:54 +0000 (11:40 -0700)]
Merge tag 'net-6.9-rc5' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "A little calmer than usual, probably just the timing of sub-tree PRs.

  Including fixes from netfilter.

  Current release - regressions:

   - inet: bring NLM_DONE out to a separate recv() again, fix user space
     which assumes multiple recv()s will happen and gets blocked forever

   - drv: mlx5:
       - restore mistakenly dropped parts in register devlink flow
       - use channel mdev reference instead of global mdev instance for
         coalescing
       - acquire RTNL lock before RQs/SQs activation/deactivation

  Previous releases - regressions:

   - net: change maximum number of UDP segments to 128, fix virtio
     compatibility with Windows peers

   - usb: ax88179_178a: avoid writing the mac address before first
     reading

  Previous releases - always broken:

   - sched: fix mirred deadlock on device recursion

   - netfilter:
       - br_netfilter: skip conntrack input hook for promisc packets
       - fixes removal of duplicate elements in the pipapo set backend
       - various fixes for abort paths and error handling

   - af_unix: don't peek OOB data without MSG_OOB

   - drv: flower: fix fragment flags handling in multiple drivers

   - drv: ravb: fix jumbo frames and packet stats accounting

  Misc:

   - kselftest_harness: fix Clang warning about zero-length format

   - tun: limit printing rate when illegal packet received by tun dev"

* tag 'net-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (46 commits)
  net: ethernet: ti: am65-cpsw-nuss: cleanup DMA Channels before using them
  net: usb: ax88179_178a: avoid writing the mac address before first reading
  net: ravb: Fix RX byte accounting for jumbo packets
  net: ravb: Fix GbEth jumbo packet RX checksum handling
  net: ravb: Allow RX loop to move past DMA mapping errors
  net: ravb: Count packets instead of descriptors in R-Car RX path
  net: ethernet: mtk_eth_soc: fix WED + wifi reset
  net:usb:qmi_wwan: support Rolling modules
  selftests: kselftest_harness: fix Clang warning about zero-length format
  net/sched: Fix mirred deadlock on device recursion
  netfilter: nf_tables: fix memleak in map from abort path
  netfilter: nf_tables: restore set elements when delete set fails
  netfilter: nf_tables: missing iterator type in lookup walk
  s390/ism: Properly fix receive message buffer allocation
  net: dsa: mt7530: fix port mirroring for MT7988 SoC switch
  net: dsa: mt7530: fix mirroring frames received on local port
  tun: limit printing rate when illegal packet received by tun dev
  ice: Fix checking for unsupported keys on non-tunnel device
  ice: tc: allow zero flags in parsing tc flower
  ice: tc: check src_vsi in case of traffic from VF
  ...

18 months agoMerge tag 'gpio-fixes-for-v6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 18 Apr 2024 17:18:03 +0000 (10:18 -0700)]
Merge tag 'gpio-fixes-for-v6.9-rc5' of git://git./linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

 - use -ENOTSUPP consistently in Intel GPIO drivers

 - don't include dt-bindings headers in gpio-swnode code

 - add missing of device table to gpio-lpc32xx and fix autoloading

* tag 'gpio-fixes-for-v6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpiolib: swnode: Remove wrong header inclusion
  gpio: lpc32xx: fix module autoloading
  gpio: crystalcove: Use -ENOTSUPP consistently
  gpio: wcove: Use -ENOTSUPP consistently

18 months agonet: ethernet: ti: am65-cpsw-nuss: cleanup DMA Channels before using them
Siddharth Vadapalli [Wed, 17 Apr 2024 09:54:25 +0000 (15:24 +0530)]
net: ethernet: ti: am65-cpsw-nuss: cleanup DMA Channels before using them

The TX and RX DMA Channels used by the driver to exchange data with CPSW
are not guaranteed to be in a clean state during driver initialization.
The Bootloader could have used the same DMA Channels without cleaning them
up in the event of failure. Thus, reset and disable the DMA Channels to
ensure that they are in a clean state before using them.

Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Reported-by: Schuyler Patton <spatton@ti.com>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/20240417095425.2253876-1-s-vadapalli@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: usb: ax88179_178a: avoid writing the mac address before first reading
Jose Ignacio Tornos Martinez [Wed, 17 Apr 2024 08:55:13 +0000 (10:55 +0200)]
net: usb: ax88179_178a: avoid writing the mac address before first reading

After the commit d2689b6a86b9 ("net: usb: ax88179_178a: avoid two
consecutive device resets"), reset operation, in which the default mac
address from the device is read, is not executed from bind operation and
the random address, that is pregenerated just in case, is direclty written
the first time in the device, so the default one from the device is not
even read. This writing is not dangerous because is volatile and the
default mac address is not missed.

In order to avoid this and keep the simplification to have only one
reset and reduce the delays, restore the reset from bind operation and
remove the reset that is commanded from open operation. The behavior is
the same but everything is ready for usbnet_probe.

Tested with ASIX AX88179 USB Gigabit Ethernet devices.
Restore the old behavior for the rest of possible devices because I don't
have the hardware to test.

cc: stable@vger.kernel.org # 6.6+
Fixes: d2689b6a86b9 ("net: usb: ax88179_178a: avoid two consecutive device resets")
Reported-by: Jarkko Palviainen <jarkko.palviainen@gmail.com>
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Link: https://lore.kernel.org/r/20240417085524.219532-1-jtornosm@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge tag 'random-6.9-rc5-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 18 Apr 2024 16:49:08 +0000 (09:49 -0700)]
Merge tag 'random-6.9-rc5-for-linus' of git://git./linux/kernel/git/crng/random

Pull random number generator fixes from Jason Donenfeld:

 - The input subsystem contributes entropy in some places where a
   spinlock is held, but the entropy accounting code only handled
   callers being in an interrupt or non-atomic process context, but not
   atomic process context. We fix this by removing an optimization and
   just calling queue_work() unconditionally.

 - Greg accidently sent up a patch not intended for his tree and that
   had been nack'd, so that's now reverted.

* tag 'random-6.9-rc5-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  Revert "vmgenid: emit uevent when VMGENID updates"
  random: handle creditable entropy from atomic process context

18 months agoMerge tag 'platform-drivers-x86-v6.9-3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 18 Apr 2024 14:15:33 +0000 (07:15 -0700)]
Merge tag 'platform-drivers-x86-v6.9-3' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:

 - amd/pmf: Add SPS notifications quirk (+ quirk support)

 - amd/pmf: Lower Smart PC check message severity

 - x86/ISST: New HW support

 - x86/intel-uncore-freq: Bump minor version to avoid "unsupported" message

 - amd/pmc: New BIOS version still needs Spurious IRQ1 quirk

* tag 'platform-drivers-x86-v6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86/amd/pmc: Extend Framework 13 quirk to more BIOSes
  platform/x86/intel-uncore-freq: Increase minor number support
  platform/x86: ISST: Add Granite Rapids-D to HPM CPU list
  platform/x86/amd: pmf: Add quirk for ROG Zephyrus G14
  platform/x86/amd: pmf: Add infrastructure for quirking supported funcs
  platform/x86/amd: pmf: Decrease error message to debug

18 months agoRevert "vmgenid: emit uevent when VMGENID updates"
Jason A. Donenfeld [Thu, 18 Apr 2024 11:45:17 +0000 (13:45 +0200)]
Revert "vmgenid: emit uevent when VMGENID updates"

This reverts commit ad6bcdad2b6724e113f191a12f859a9e8456b26d. I had
nak'd it, and Greg said on the thread that it links that he wasn't going
to take it either, especially since it's not his code or his tree, but
then, seemingly accidentally, it got pushed up some months later, in
what looks like a mistake, with no further discussion in the linked
thread. So revert it, since it's clearly not intended.

Fixes: ad6bcdad2b67 ("vmgenid: emit uevent when VMGENID updates")
Cc: stable@vger.kernel.org
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20230531095119.11202-2-bchalios@amazon.es
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
18 months agovirtio_net: Support RX hash XDP hint
Liang Chen [Wed, 17 Apr 2024 07:18:22 +0000 (15:18 +0800)]
virtio_net: Support RX hash XDP hint

The RSS hash report is a feature that's part of the virtio specification.
Currently, virtio backends like qemu, vdpa (mlx5), and potentially vhost
(still a work in progress as per [1]) support this feature. While the
capability to obtain the RSS hash has been enabled in the normal path,
it's currently missing in the XDP path. Therefore, we are introducing
XDP hints through kfuncs to allow XDP programs to access the RSS hash.

1.
https://lore.kernel.org/all/20231015141644.260646-1-akihiko.odaki@daynix.com/#r

Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20240417071822.27831-1-liangchen.linux@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agoMerge tag 'nf-24-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Paolo Abeni [Thu, 18 Apr 2024 11:12:36 +0000 (13:12 +0200)]
Merge tag 'nf-24-04-18' of git://git./linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

Patch #1 amends a missing spot where the set iterator type is unset.
 This is fixing a issue in the previous pull request.

Patch #2 fixes the delete set command abort path by restoring state
         of the elements. Reverse logic for the activate (abort) case
 otherwise element state is not restored, this requires to move
 the check for active/inactive elements to the set iterator
 callback. From the deactivate path, toggle the next generation
 bit and from the activate (abort) path, clear the next generation
 bitmask.

Patch #3 skips elements already restored by delete set command from the
 abort path in case there is a previous delete element command in
 the batch. Check for the next generation bit just like it is done
 via set iteration to restore maps.

netfilter pull request 24-04-18

* tag 'nf-24-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: fix memleak in map from abort path
  netfilter: nf_tables: restore set elements when delete set fails
  netfilter: nf_tables: missing iterator type in lookup walk
====================

Link: https://lore.kernel.org/r/20240418010948.3332346-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agoMerge branch 'net-ipa-header-hygiene'
Paolo Abeni [Thu, 18 Apr 2024 11:01:07 +0000 (13:01 +0200)]
Merge branch 'net-ipa-header-hygiene'

Alex Elder says:

====================
net: ipa: header hygiene

The end result of this series is that the list of files included in
every IPA source file will be maintained in sorted order.  This
imposes some consistency that was previously not possible.

If an IPA header file requires a symbol or type declared in another
header, that other header must be included.  E.g., if bool or u32
type is used in a function declaration in an IPA header file, the
IPA header must include <linux/types.h>.

If a type used is just a struct or union *pointer* or enum type (and
no members within these types are needed), then these types only need
to be *declared* within the header that uses it.

This is sufficient, but in addition, this series removes includes of
files that aren't necessary, as well as unneeded type declarations.
====================

Link: https://lore.kernel.org/r/20240416231018.389520-1-elder@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: ipa: sort all includes
Alex Elder [Tue, 16 Apr 2024 23:10:18 +0000 (18:10 -0500)]
net: ipa: sort all includes

Establish the rule that header files are always included in sorted
(POSIX local) order.  Standard and private headers are separated by
a blank line.

Similarly, sort all forward-declarations for structures.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: ipa: more include file cleanup
Alex Elder [Tue, 16 Apr 2024 23:10:17 +0000 (18:10 -0500)]
net: ipa: more include file cleanup

All of the config data files and all of the register definition
files (plus a few others) use GSI_EE_AP, which is defined in
"ipa_version.h".  Include that header where it's needed.

All of the IPA register definition files include "../ipa.h", though
none of them need anything defined there.  Similarly, all of the GSI
register definition files include "../gsi.h", but don't need anything
defined there.  Remove these unnneded includes.

All of the configuration data files include "../gsi.h", though none
of them need anything defined there, so remove these includes.

Remove other includes of local header files that are not required.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: ipa: eliminate unneeded struct declarations
Alex Elder [Tue, 16 Apr 2024 23:10:16 +0000 (18:10 -0500)]
net: ipa: eliminate unneeded struct declarations

As definitions in headers have been moved around, some of the
struct and enum declarations found in header files have become
no longer necessary and can be removed.  Remove these unneeded
declarations.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: ipa: add some needed struct declarations
Alex Elder [Tue, 16 Apr 2024 23:10:15 +0000 (18:10 -0500)]
net: ipa: add some needed struct declarations

Declare some structure types in a few header files where functions
declared therein use them:
  - Functions are declared in "gsi_private.h" that use gsi, gsi_ring, and
    gsi_trans structure pointers.
  - A gsi_trans struct pointer is passed to two functions
    declared in "ipa_endpoint.h"
  - In "ipa_interrupt.h", a platform_device pointer is passed in the
    declaration for ipa_interrupt_init().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>