Lukasz Majewski [Tue, 23 Apr 2024 12:49:04 +0000 (14:49 +0200)]
net: hsr: Provide RedBox support (HSR-SAN)
Introduce RedBox support (HSR-SAN to be more precise) for HSR networks.
Following traffic reduction optimizations have been implemented:
- Do not send HSR supervisory frames to Port C (interlink)
- Do not forward to HSR ring frames addressed to Port C
- Do not forward to Port C frames from HSR ring
- Do not send duplicate HSR frame to HSR ring when destination is Port C
The corresponding patch to modify iptable2 sources has already been sent:
https://lore.kernel.org/netdev/
20240308145729.490863-1-lukma@denx.de/T/
Testing procedure (veth and netns):
-----------------------------------
One shall run:
linux-vanila/tools/testing/selftests/net/hsr/hsr_redbox.sh
(Detailed description of the setup one can find in the test
script file).
Testing procedure (real hardware):
----------------------------------
The EVB-KSZ9477 has been used for testing on net-next branch
(SHA1:
5fc68320c1fb3c7d456ddcae0b4757326a043e6f).
Ports 4/5 were used for SW managed HSR (hsr1) as first hsr0 for ports 1/2
(with HW offloading for ksz9477) was created. Port 3 has been used as
interlink port (single USB-ETH dongle).
Configuration - RedBox (EVB-KSZ9477):
if link set lan1 down;ip link set lan2 down
ip link add name hsr0 type hsr slave1 lan1 slave2 lan2 supervision 45 version 1
ip link add name hsr1 type hsr slave1 lan4 slave2 lan5 interlink lan3 supervision 45 version 1
ip link set lan4 up;ip link set lan5 up
ip link set lan3 up
ip addr add 192.168.0.11/24 dev hsr1
ip link set hsr1 up
Configuration - DAN-H (EVB-KSZ9477):
ip link set lan1 down;ip link set lan2 down
ip link add name hsr0 type hsr slave1 lan1 slave2 lan2 supervision 45 version 1
ip link add name hsr1 type hsr slave1 lan4 slave2 lan5 supervision 45 version 1
ip link set lan4 up;ip link set lan5 up
ip addr add 192.168.0.12/24 dev hsr1
ip link set hsr1 up
This approach uses only SW based HSR devices (hsr1).
-------------- ----------------- ------------
DAN-H Port5 | <------> | Port5 | |
Port4 | <------> | Port4 Port3 | <---> | PC
| | (RedBox) | | (USB-ETH)
EVB-KSZ9477 | | EVB-KSZ9477 | |
-------------- ----------------- ------------
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Davide Caratti [Thu, 18 Apr 2024 13:50:11 +0000 (15:50 +0200)]
net/sched: fix false lockdep warning on qdisc root lock
Xiumei and Christoph reported the following lockdep splat, complaining of
the qdisc root lock being taken twice:
============================================
WARNING: possible recursive locking detected
6.7.0-rc3+ #598 Not tainted
--------------------------------------------
swapper/2/0 is trying to acquire lock:
ffff888177190110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70
but task is already holding lock:
ffff88811995a110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&sch->q.lock);
lock(&sch->q.lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
5 locks held by swapper/2/0:
#0:
ffff888135a09d98 ((&in_dev->mr_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x11a/0x510
#1:
ffffffffaaee5260 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x2c0/0x1ed0
#2:
ffffffffaaee5200 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x209/0x2e70
#3:
ffff88811995a110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70
#4:
ffffffffaaee5200 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x209/0x2e70
stack backtrace:
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.7.0-rc3+ #598
Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7353+
9de0a3cc 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0x4a/0x80
__lock_acquire+0xfdd/0x3150
lock_acquire+0x1ca/0x540
_raw_spin_lock+0x34/0x80
__dev_queue_xmit+0x1560/0x2e70
tcf_mirred_act+0x82e/0x1260 [act_mirred]
tcf_action_exec+0x161/0x480
tcf_classify+0x689/0x1170
prio_enqueue+0x316/0x660 [sch_prio]
dev_qdisc_enqueue+0x46/0x220
__dev_queue_xmit+0x1615/0x2e70
ip_finish_output2+0x1218/0x1ed0
__ip_finish_output+0x8b3/0x1350
ip_output+0x163/0x4e0
igmp_ifc_timer_expire+0x44b/0x930
call_timer_fn+0x1a2/0x510
run_timer_softirq+0x54d/0x11a0
__do_softirq+0x1b3/0x88f
irq_exit_rcu+0x18f/0x1e0
sysvec_apic_timer_interrupt+0x6f/0x90
</IRQ>
This happens when TC does a mirred egress redirect from the root qdisc of
device A to the root qdisc of device B. As long as these two locks aren't
protecting the same qdisc, they can be acquired in chain: add a per-qdisc
lockdep key to silence false warnings.
This dynamic key should safely replace the static key we have in sch_htb:
it was added to allow enqueueing to the device "direct qdisc" while still
holding the qdisc root lock.
v2: don't use static keys anymore in HTB direct qdiscs (thanks Eric Dumazet)
CC: Maxim Mikityanskiy <maxim@isovalent.com>
CC: Xiumei Mu <xmu@redhat.com>
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/451
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/7dc06d6158f72053cf877a82e2a7a5bd23692faa.1713448007.git.dcaratti@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Fri, 26 Apr 2024 03:00:54 +0000 (20:00 -0700)]
Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
net: intel: start The Great Code Dedup + Page Pool for iavf
Alexander Lobakin says:
Here's a two-shot: introduce {,Intel} Ethernet common library (libeth and
libie) and switch iavf to Page Pool. Details are in the commit messages;
here's a summary:
Not a secret there's a ton of code duplication between two and more Intel
ethernet modules. Before introducing new changes, which would need to be
copied over again, start decoupling the already existing duplicate
functionality into a new module, which will be shared between several
Intel Ethernet drivers. The first name that came to my mind was
"libie" -- "Intel Ethernet common library". Also this sounds like
"lovelie" (-> one word, no "lib I E" pls) and can be expanded as
"lib Internet Explorer" :P
The "generic", pure-software part is placed separately, so that it can be
easily reused in any driver by any vendor without linking to the Intel
pre-200G guts. In a few words, it's something any modern driver does the
same way, but nobody moved it level up (yet).
The series is only the beginning. From now on, adding every new feature
or doing any good driver refactoring will remove much more lines than add
for quite some time. There's a basic roadmap with some deduplications
planned already, not speaking of that touching every line now asks:
"can I share this?". The final destination is very ambitious: have only
one unified driver for at least i40e, ice, iavf, and idpf with a struct
ops for each generation. That's never gonna happen, right? But you still
can at least try.
PP conversion for iavf lands within the same series as these two are tied
closely. libie will support Page Pool model only, so that a driver can't
use much of the lib until it's converted. iavf is only the example, the
rest will eventually be converted soon on a per-driver basis. That is
when it gets really interesting. Stay tech.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
MAINTAINERS: add entry for libeth and libie
iavf: switch to Page Pool
iavf: pack iavf_ring more efficiently
libeth: add Rx buffer management
page_pool: add DMA-sync-for-CPU inline helper
page_pool: constify some read-only function arguments
slab: introduce kvmalloc_array_node() and kvcalloc_node()
iavf: drop page splitting and recycling
iavf: kill "legacy-rx" for good
net: intel: introduce {, Intel} Ethernet common library
====================
Link: https://lore.kernel.org/r/20240424203559.3420468-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 26 Apr 2024 02:36:39 +0000 (19:36 -0700)]
Merge branch 'net-lan966x-flower-validate-control-flags'
Asbjørn Sloth Tønnesen says:
====================
net: lan966x: flower: validate control flags
This series adds flower control flags validation to the
lan966x driver, and changes it from assuming that it handles
all control flags, to instead reject rules if they have
masked any unknown/unsupported control flags.
v1: https://lore.kernel.org/netdev/
20240423102720.228728-1-ast@fiberby.net/
====================
Link: https://lore.kernel.org/r/20240424125347.461995-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Asbjørn Sloth Tønnesen [Wed, 24 Apr 2024 12:53:40 +0000 (12:53 +0000)]
net: lan966x: flower: check for unsupported control flags
Use flow_rule_is_supp_control_flags() to reject filters with
unsupported control flags.
In case any unsupported control flags are masked,
flow_rule_is_supp_control_flags() sets a NL extended
error message, and we return -EOPNOTSUPP.
Only compile-tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20240424125347.461995-4-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Asbjørn Sloth Tønnesen [Wed, 24 Apr 2024 12:53:39 +0000 (12:53 +0000)]
net: lan966x: flower: rename goto in lan966x_tc_flower_handler_control_usage()
Rename goto label, as the error message is specific to the fragment flags.
Only compile-tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20240424125347.461995-3-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Asbjørn Sloth Tønnesen [Wed, 24 Apr 2024 12:53:38 +0000 (12:53 +0000)]
net: lan966x: flower: add extack to lan966x_tc_flower_handler_control_usage()
Define extack locally, to reduce line lengths and aid future users.
Only compile-tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20240424125347.461995-2-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 26 Apr 2024 02:35:10 +0000 (19:35 -0700)]
Merge branch 'net-sparx5-flower-validate-control-flags'
Asbjørn Sloth Tønnesen says:
====================
net: sparx5: flower: validate control flags
This series adds flower control flags validation to the
sparx5 driver, and changes it from assuming that it handles
all control flags, to instead reject rules if they have
masked any unknown/unsupported control flags.
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Tested-by: Daniel Machon <daniel.machon@microchip.com>
v1: https://lore.kernel.org/netdev/
20240423102728.228765-1-ast@fiberby.net/
====================
Link: https://lore.kernel.org/r/20240424121632.459022-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Asbjørn Sloth Tønnesen [Wed, 24 Apr 2024 12:16:25 +0000 (12:16 +0000)]
net: sparx5: flower: check for unsupported control flags
Use flow_rule_is_supp_control_flags() to reject filters with
unsupported control flags.
In case any unsupported control flags are masked,
flow_rule_is_supp_control_flags() sets a NL extended
error message, and we return -EOPNOTSUPP.
Only compile-tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Tested-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://lore.kernel.org/r/20240424121632.459022-5-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Asbjørn Sloth Tønnesen [Wed, 24 Apr 2024 12:16:24 +0000 (12:16 +0000)]
net: sparx5: flower: remove goto in sparx5_tc_flower_handler_control_usage()
Remove goto, as it's only used once, and the error message is
specific to that context.
Only compile tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Tested-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://lore.kernel.org/r/20240424121632.459022-4-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Asbjørn Sloth Tønnesen [Wed, 24 Apr 2024 12:16:23 +0000 (12:16 +0000)]
net: sparx5: flower: add extack to sparx5_tc_flower_handler_control_usage()
Define extack locally, to reduce line lengths and aid future users.
Only compile tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Tested-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://lore.kernel.org/r/20240424121632.459022-3-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Asbjørn Sloth Tønnesen [Wed, 24 Apr 2024 12:16:22 +0000 (12:16 +0000)]
net: sparx5: flower: only do lookup if fragment flags are set
The fragment lookup should only be performed, when
at least one of the fragment flags are set.
This change was deliberately not included in commit
68aba00483c7 ("net: sparx5: flower: fix fragment flags handling")
as it's only needed for future proffing the code, since
"mask" is currently only set in conjunction with the
fragment flags.
(The 3rd flag FLOW_DIS_ENCAPSULATION is only used with "key")
Only compile tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Tested-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://lore.kernel.org/r/20240424121632.459022-2-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Breno Leitao [Wed, 24 Apr 2024 16:11:07 +0000 (09:11 -0700)]
net: wwan: t7xx: Un-embed dummy device
Embedding net_device into structures prohibits the usage of flexible
arrays in the net_device structure. For more details, see the discussion
at [1].
Un-embed the net_device from the private struct by converting it
into a pointer. Then use the leverage the new alloc_netdev_dummy()
helper to allocate and initialize dummy devices.
[1] https://lore.kernel.org/all/
20240229225910.
79e224cf@kernel.org/
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240424161108.3397057-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 26 Apr 2024 02:13:28 +0000 (19:13 -0700)]
Merge branch 'net-microchip-correct-spelling-in-comments'
Simon Horman says:
====================
net: microchip: Correct spelling in comments
Correct spelling in comments in Microchip drivers.
Flagged by codespell.
v1: https://lore.kernel.org/r/
20240419-lan743x-confirm-v1-0-
2a087617a3e5@kernel.org
====================
Link: https://lore.kernel.org/r/20240424-lan743x-confirm-v2-0-f0480542e39f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Wed, 24 Apr 2024 15:13:26 +0000 (16:13 +0100)]
net: sparx5: Correct spelling in comments
Correct spelling in comments, as flagged by codespell.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://lore.kernel.org/r/20240424-lan743x-confirm-v2-4-f0480542e39f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Wed, 24 Apr 2024 15:13:25 +0000 (16:13 +0100)]
net: encx24j600: Correct spelling in comments
Correct spelling in comments, as flagged by codespell.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://lore.kernel.org/r/20240424-lan743x-confirm-v2-3-f0480542e39f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Wed, 24 Apr 2024 15:13:24 +0000 (16:13 +0100)]
net: lan966x: Correct spelling in comments
Correct spelling in comments, as flagged by codespell.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20240424-lan743x-confirm-v2-2-f0480542e39f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Wed, 24 Apr 2024 15:13:23 +0000 (16:13 +0100)]
net: lan743x: Correct spelling in comments
Correct spelling in comments, as flagged by codespell.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://lore.kernel.org/r/20240424-lan743x-confirm-v2-1-f0480542e39f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hayes Wang [Wed, 24 Apr 2024 08:45:32 +0000 (16:45 +0800)]
r8152: replace dev_info with dev_dbg for loading firmware
Someone complains the message appears continuously. This occurs
because the device is woken from UPS mode, and the driver re-loads
the firmware.
When the device enters runtime suspend and cable is unplugged, the
device would enter UPS mode. If the runtime resume occurs, and the
device is woken from UPS mode, the driver has to re-load the firmware
and causes the message. If someone wakes the device continuously, the
message would be shown continuously, too. Use dev_dbg to avoid it.
Note that, the function could be called before register_netdev(), so I
don't use netif_info() or netif_dbg().
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20240424084532.159649-1-hayeswang@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ma Ke [Wed, 24 Apr 2024 06:56:34 +0000 (14:56 +0800)]
net: usb: ax88179_178a: Add check for usbnet_get_endpoints()
To avoid the failure of usbnet_get_endpoints(), we should check the
return value of the usbnet_get_endpoints().
Signed-off-by: Ma Ke <make_ruc2021@163.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://lore.kernel.org/r/20240424065634.1870027-1-make_ruc2021@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Golle [Tue, 23 Apr 2024 09:00:25 +0000 (11:00 +0200)]
net: sfp: add quirk for ATS SFP-GE-T 1000Base-TX module
Add quirk for ATS SFP-GE-T 1000Base-TX module.
This copper module comes with broken TX_FAULT indicator which must be
ignored for it to work.
Co-authored-by: Josef Schlehofer <pepe.schlehofer@gmail.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
[ rebased on top of net-next ]
Signed-off-by: Marek Behún <kabel@kernel.org>
Link: https://lore.kernel.org/r/20240423090025.29231-1-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marek Behún [Tue, 23 Apr 2024 08:50:39 +0000 (10:50 +0200)]
net: sfp: enhance quirk for Fibrestore 2.5G copper SFP module
Enhance the quirk for Fibrestore 2.5G copper SFP module. The original
commit
e27aca3760c0 ("net: sfp: add quirk for FS's 2.5G copper SFP")
introducing the quirk says that the PHY is inaccessible, but that is
not true.
The module uses Rollball protocol to talk to the PHY, and needs a 4
second wait before probing it, same as FS 10G module.
The PHY inside the module is Realtek RTL8221B-VB-CG PHY. The realtek
driver recently gained support to set it up via clause 45 accesses.
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240423085039.26957-2-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marek Behún [Tue, 23 Apr 2024 08:50:38 +0000 (10:50 +0200)]
net: sfp: update comment for FS SFP-10G-T quirk
Update the comment for the Fibrestore SFP-10G-T module: since commit
e9301af385e7 ("net: sfp: fix PHY discovery for FS SFP-10G-T module")
we also do a 4 second wait before probing the PHY.
Fixes: e9301af385e7 ("net: sfp: fix PHY discovery for FS SFP-10G-T module")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240423085039.26957-1-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 23 Apr 2024 20:54:08 +0000 (20:54 +0000)]
net: add two more call_rcu_hurry()
I had failures with pmtu.sh selftests lately,
with netns dismantles firing ref_tracking alerts [1].
After much debugging, I found that some queued
rcu callbacks were delayed by minutes, because
of CONFIG_RCU_LAZY=y option.
Joel Fernandes had a similar issue in the past,
fixed with commit
483c26ff63f4 ("net: Use call_rcu_hurry()
for dst_release()")
In this commit, I make sure nexthop_free_rcu()
and free_fib_info_rcu() are not delayed too much
because they both can release device references.
tools/testing/selftests/net/pmtu.sh no longer fails.
Traces were:
[ 968.179860] ref_tracker: veth_A-R1@
00000000d0ff3fe2 has 3/5 users at
dst_alloc+0x76/0x160
ip6_dst_alloc+0x25/0x80
ip6_pol_route+0x2a8/0x450
ip6_pol_route_output+0x1f/0x30
fib6_rule_lookup+0x163/0x270
ip6_route_output_flags+0xda/0x190
ip6_dst_lookup_tail.constprop.0+0x1d0/0x260
ip6_dst_lookup_flow+0x47/0xa0
udp_tunnel6_dst_lookup+0x158/0x210
vxlan_xmit_one+0x4c2/0x1550 [vxlan]
vxlan_xmit+0x52d/0x14f0 [vxlan]
dev_hard_start_xmit+0x7b/0x1e0
__dev_queue_xmit+0x20b/0xe40
ip6_finish_output2+0x2ea/0x6e0
ip6_finish_output+0x143/0x320
ip6_output+0x74/0x140
[ 968.179860] ref_tracker: veth_A-R1@
00000000d0ff3fe2 has 1/5 users at
netdev_get_by_index+0xc0/0xe0
fib6_nh_init+0x1a9/0xa90
rtm_new_nexthop+0x6fa/0x1580
rtnetlink_rcv_msg+0x155/0x3e0
netlink_rcv_skb+0x61/0x110
rtnetlink_rcv+0x19/0x20
netlink_unicast+0x23f/0x380
netlink_sendmsg+0x1fc/0x430
____sys_sendmsg+0x2ef/0x320
___sys_sendmsg+0x86/0xd0
__sys_sendmsg+0x67/0xc0
__x64_sys_sendmsg+0x21/0x30
x64_sys_call+0x252/0x2030
do_syscall_64+0x6c/0x190
entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 968.179860] ref_tracker: veth_A-R1@
00000000d0ff3fe2 has 1/5 users at
ipv6_add_dev+0x136/0x530
addrconf_notify+0x19d/0x770
notifier_call_chain+0x65/0xd0
raw_notifier_call_chain+0x1a/0x20
call_netdevice_notifiers_info+0x54/0x90
register_netdevice+0x61e/0x790
veth_newlink+0x230/0x440
__rtnl_newlink+0x7d2/0xaa0
rtnl_newlink+0x4c/0x70
rtnetlink_rcv_msg+0x155/0x3e0
netlink_rcv_skb+0x61/0x110
rtnetlink_rcv+0x19/0x20
netlink_unicast+0x23f/0x380
netlink_sendmsg+0x1fc/0x430
____sys_sendmsg+0x2ef/0x320
___sys_sendmsg+0x86/0xd0
....
[ 1079.316024] ? show_regs+0x68/0x80
[ 1079.316087] ? __warn+0x8c/0x140
[ 1079.316103] ? ref_tracker_free+0x1a0/0x270
[ 1079.316117] ? report_bug+0x196/0x1c0
[ 1079.316135] ? handle_bug+0x42/0x80
[ 1079.316149] ? exc_invalid_op+0x1c/0x70
[ 1079.316162] ? asm_exc_invalid_op+0x1f/0x30
[ 1079.316193] ? ref_tracker_free+0x1a0/0x270
[ 1079.316208] ? _raw_spin_unlock+0x1a/0x40
[ 1079.316222] ? free_unref_page+0x126/0x1a0
[ 1079.316239] ? destroy_large_folio+0x69/0x90
[ 1079.316251] ? __folio_put+0x99/0xd0
[ 1079.316276] dst_dev_put+0x69/0xd0
[ 1079.316308] fib6_nh_release_dsts.part.0+0x3d/0x80
[ 1079.316327] fib6_nh_release+0x45/0x70
[ 1079.316340] nexthop_free_rcu+0x131/0x170
[ 1079.316356] rcu_do_batch+0x1ee/0x820
[ 1079.316370] ? rcu_do_batch+0x179/0x820
[ 1079.316388] rcu_core+0x1aa/0x4d0
[ 1079.316405] rcu_core_si+0x12/0x20
[ 1079.316417] __do_softirq+0x13a/0x3dc
[ 1079.316435] __irq_exit_rcu+0xa3/0x110
[ 1079.316449] irq_exit_rcu+0x12/0x30
[ 1079.316462] sysvec_apic_timer_interrupt+0x5b/0xe0
[ 1079.316474] asm_sysvec_apic_timer_interrupt+0x1f/0x30
[ 1079.316569] RIP: 0033:0x7f06b65c63f0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240423205408.39632-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 25 Apr 2024 19:40:48 +0000 (12:40 -0700)]
Merge git://git./linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/ti/icssg/icssg_prueth.c
net/mac80211/chan.c
89884459a0b9 ("wifi: mac80211: fix idle calculation with multi-link")
87f5500285fb ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()")
https://lore.kernel.org/all/
20240422105623.
7b1fbda2@canb.auug.org.au/
net/unix/garbage.c
1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().")
4090fa373f0e ("af_unix: Replace garbage collection algorithm.")
drivers/net/ethernet/ti/icssg/icssg_prueth.c
drivers/net/ethernet/ti/icssg/icssg_common.c
4dcd0e83ea1d ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()")
e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file")
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 23 Apr 2024 12:56:20 +0000 (12:56 +0000)]
tcp: avoid premature drops in tcp_add_backlog()
While testing TCP performance with latest trees,
I saw suspect SOCKET_BACKLOG drops.
tcp_add_backlog() computes its limit with :
limit = (u32)READ_ONCE(sk->sk_rcvbuf) +
(u32)(READ_ONCE(sk->sk_sndbuf) >> 1);
limit += 64 * 1024;
This does not take into account that sk->sk_backlog.len
is reset only at the very end of __release_sock().
Both sk->sk_backlog.len and sk->sk_rmem_alloc could reach
sk_rcvbuf in normal conditions.
We should double sk->sk_rcvbuf contribution in the formula
to absorb bubbles in the backlog, which happen more often
for very fast flows.
This change maintains decent protection against abuses.
Fixes: c377411f2494 ("net: sk_add_backlog() take rmem_alloc into account")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240423125620.3309458-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 25 Apr 2024 18:49:35 +0000 (11:49 -0700)]
Merge tag 'wireless-next-2024-04-24' of git://git./linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.10
The second "new features" pull request for v6.10 with changes both in
stack and in drivers. This time the pull request is rather small and
nothing special standing out except maybe that we have several
kernel-doc fixes. Great to see that we are getting warning free
wireless code (until new warnings are added).
Major changes:
rtl8xxxu:
* enable Management Frame Protection (MFP) support
rtw88:
* disable unsupported interface type of mesh point for all chips, and only
support station mode for SDIO chips.
* tag 'wireless-next-2024-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (63 commits)
wifi: mac80211: handle link ID during management Tx
wifi: mac80211: handle sdata->u.ap.active flag with MLO
wifi: cfg80211: add return docs for regulatory functions
wifi: cfg80211: make some regulatory functions void
wifi: mac80211: add return docs for sta_info_flush()
wifi: mac80211: keep mac80211 consistent on link activation failure
wifi: mac80211: simplify ieee80211_assign_link_chanctx()
wifi: mac80211: reserve chanctx during find
wifi: cfg80211: fix cfg80211 function kernel-doc
wifi: mac80211_hwsim: Use wider regulatory for custom for 6GHz tests
wifi: iwlwifi: mvm: Don't allow EMLSR when the RSSI is low
wifi: iwlwifi: mvm: disable EMLSR when we suspend with wowlan
wifi: iwlwifi: mvm: get periodic statistics in EMLSR
wifi: iwlwifi: mvm: don't recompute EMLSR mode in can_activate_links
wifi: iwlwifi: mvm: implement EMLSR prevention mechanism.
wifi: iwlwifi: mvm: exit EMLSR upon missed beacon
wifi: iwlwifi: mvm: init vif works only once
wifi: iwlwifi: mvm: Add helper functions to update EMLSR status
wifi: iwlwifi: mvm: Implement new link selection algorithm
wifi: iwlwifi: mvm: move EMLSR/links code
...
====================
Link: https://lore.kernel.org/r/20240424100122.217AEC113CE@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 25 Apr 2024 18:46:41 +0000 (11:46 -0700)]
Merge branch 'net-dsa-b53-remove-adjust_link'
Florian Fainelli says:
====================
net: dsa: b53: Remove adjust_link
b53 is now the only remaining driver that uses both PHYLIB's adjust_link
and PHYLINK's mac_ops callbacks, convert entirely to PHYLINK.
====================
Link: https://lore.kernel.org/r/20240423183339.1368511-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Fainelli [Tue, 23 Apr 2024 18:33:39 +0000 (11:33 -0700)]
net: dsa: b53: provide own phylink MAC operations
Convert b53 to provide its own phylink MAC operations, thus avoiding the
shim layer in DSA's port.c
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20240423183339.1368511-9-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Fainelli [Tue, 23 Apr 2024 18:33:38 +0000 (11:33 -0700)]
net: dsa: b53: Remove b53_adjust_link()
Only use the PHYLINK implementation from there on now that an equivalent
configuration is applied to all of the switch ports.
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20240423183339.1368511-8-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Fainelli [Tue, 23 Apr 2024 18:33:37 +0000 (11:33 -0700)]
net: dsa: b53: Call b53_eee_init() from b53_mac_link_up()
And make sure this is done for the MLO_AN_PHY case, where it actually
makes sense, contrary to b53_adjust_link() which only did it for
fixed-PHY configurations where it does not make sense.
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20240423183339.1368511-7-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Fainelli [Tue, 23 Apr 2024 18:33:36 +0000 (11:33 -0700)]
net: dsa: b53: Configure RGMII for 531x5 and MII for 5325
Call b53_adjust_531x5_rgmii() and b53_adjust_5325_mii() from
b53_phylink_mac_config() when we have a fixed PHY in preparation for removing
b53_adjust_link(). Also move b53_adjust_63xx_rgmii() to
b53_phylink_mac_config() where it logically belongs.
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20240423183339.1368511-6-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Fainelli [Tue, 23 Apr 2024 18:33:35 +0000 (11:33 -0700)]
net: dsa: b53: Force flow control for BCM5301X CPU port(s)
Just like what b53_adjust_link() does, force flow control for the
BCM5301X CPU port(s) by forcing rx_pause and tx_pause in
b53_phylink_mac_link_up(). Preparatory step for getting rid of
b53_adjust_link().
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20240423183339.1368511-5-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Fainelli [Tue, 23 Apr 2024 18:33:34 +0000 (11:33 -0700)]
net: dsa: b53: Introduce b53_adjust_5325_mii()
Takes care of doing the 5325 switch series specific MII programming and
is called from b53_adjust_link() to allow the future removal of
b53_adjust_link().
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20240423183339.1368511-4-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Fainelli [Tue, 23 Apr 2024 18:33:33 +0000 (11:33 -0700)]
net: dsa: b53: Introduce b53_adjust_531x5_rgmii()
Takes care of doing the 531x5 switch series specific RGMII programming
and is called from b53_adjust_link() to allow the future removal of
b53_adjust_link().
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20240423183339.1368511-3-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Fainelli [Tue, 23 Apr 2024 18:33:32 +0000 (11:33 -0700)]
net: dsa: b53: Stop exporting b53_phylink_* routines
They are not used outside of the b53_common.c file, no need to be
exported.
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20240423183339.1368511-2-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 25 Apr 2024 18:19:38 +0000 (11:19 -0700)]
Merge tag 'net-6.9-rc6' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wireless and bluetooth.
Nothing major, regression fixes are mostly in drivers, two more of
those are flowing towards us thru various trees. I wish some of the
changes went into -rc5, we'll try to keep an eye on frequency of PRs
from sub-trees.
Also disproportional number of fixes for bugs added in v6.4, strange
coincidence.
Current release - regressions:
- igc: fix LED-related deadlock on driver unbind
- wifi: mac80211: small fixes to recent clean up of the connection
process
- Revert "wifi: iwlwifi: bump FW API to 90 for BZ/SC devices", kernel
doesn't have all the code to deal with that version, yet
- Bluetooth:
- set power_ctrl_enabled on NULL returned by gpiod_get_optional()
- qca: fix invalid device address check, again
- eth: ravb: fix registered interrupt names
Current release - new code bugs:
- wifi: mac80211: check EHT/TTLM action frame length
Previous releases - regressions:
- fix sk_memory_allocated_{add|sub} for architectures where
__this_cpu_{add|sub}* are not IRQ-safe
- dsa: mv88e6xx: fix link setup for
88E6250
Previous releases - always broken:
- ip: validate dev returned from __in_dev_get_rcu(), prevent possible
null-derefs in a few places
- switch number of for_each_rcu() loops using call_rcu() on the
iterator to for_each_safe()
- macsec: fix isolation of broadcast traffic in presence of offload
- vxlan: drop packets from invalid source address
- eth: mlxsw: trap and ACL programming fixes
- eth: bnxt: PCIe error recovery fixes, fix counting dropped packets
- Bluetooth:
- lots of fixes for the command submission rework from v6.4
- qca: fix NULL-deref on non-serdev suspend
Misc:
- tools: ynl: don't ignore errors in NLMSG_DONE messages"
* tag 'net-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().
net: b44: set pause params only when interface is up
tls: fix lockless read of strp->msg_ready in ->poll
dpll: fix dpll_pin_on_pin_register() for multiple parent pins
net: ravb: Fix registered interrupt names
octeontx2-af: fix the double free in rvu_npc_freemem()
net: ethernet: ti: am65-cpts: Fix PTPv1 message type on TX packets
ice: fix LAG and VF lock dependency in ice_reset_vf()
iavf: Fix TC config comparison with existing adapter TC config
i40e: Report MFS in decimal base instead of hex
i40e: Do not use WQ_MEM_RECLAIM flag for workqueue
net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()
net/mlx5e: Advertise mlx5 ethernet driver updates sk_buff md_dst for MACsec
macsec: Detect if Rx skb is macsec-related for offloading devices that update md_dst
ethernet: Add helper for assigning packet type when dest address does not match device address
macsec: Enable devices to advertise whether they update sk_buff md_dst during offloads
net: phy: dp83869: Fix MII mode failure
netfilter: nf_tables: honor table dormant flag from netdev release event path
eth: bnxt: fix counting packets discarded due to OOM and netpoll
igc: Fix LED-related deadlock on driver unbind
...
Linus Torvalds [Thu, 25 Apr 2024 16:31:06 +0000 (09:31 -0700)]
Merge tag 'nfsd-6.9-5' of git://git./linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Revert some backchannel fixes that went into v6.9-rc
* tag 'nfsd-6.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
Revert "NFSD: Convert the callback workqueue to use delayed_work"
Revert "NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"
Linus Torvalds [Thu, 25 Apr 2024 16:23:38 +0000 (09:23 -0700)]
Merge tag 'for-linus-
2024042501' of git://git./linux/kernel/git/hid/hid
Pull HID fixes from Benjamin Tissoires:
- A couple of i2c-hid fixes (Kenny Levinsen & Nam Cao)
- A config issue with mcp-2221 when CONFIG_IIO is not enabled
(Abdelrahman Morsy)
- A dev_err fix in intel-ish-hid (Zhang Lixu)
- A couple of mouse fixes for both nintendo and Logitech-dj (Nuno
Pereira and Yaraslau Furman)
- I'm changing my main kernel email address as it's way simpler for me
than the Red Hat one (Benjamin Tissoires)
* tag 'for-linus-
2024042501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: mcp-2221: cancel delayed_work only when CONFIG_IIO is enabled
HID: logitech-dj: allow mice to use all types of reports
HID: i2c-hid: Revert to await reset ACK before reading report descriptor
HID: nintendo: Fix N64 controller being identified as mouse
MAINTAINERS: update Benjamin's email address
HID: intel-ish-hid: ipc: Fix dev_err usage with uninitialized dev->devc
HID: i2c-hid: remove I2C_HID_READ_PENDING flag to prevent lock-up
Jakub Kicinski [Thu, 25 Apr 2024 15:46:53 +0000 (08:46 -0700)]
Merge tag 'nf-24-04-25' of git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains two Netfilter/IPVS fixes for net:
Patch #1 fixes SCTP checksumming for IPVS with gso packets,
from Ismael Luceno.
Patch #2 honor dormant flag from netdev event path to fix a possible
double hook unregistration.
* tag 'nf-24-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: honor table dormant flag from netdev release event path
ipvs: Fix checksumming on GSO of SCTP packets
====================
Link: https://lore.kernel.org/r/20240425090149.1359547-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Wed, 24 Apr 2024 17:04:43 +0000 (10:04 -0700)]
af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().
syzbot reported a lockdep splat regarding unix_gc_lock and
unix_state_lock().
One is called from recvmsg() for a connected socket, and another
is called from GC for TCP_LISTEN socket.
So, the splat is false-positive.
Let's add a dedicated lock class for the latter to suppress the splat.
Note that this change is not necessary for net-next.git as the issue
is only applied to the old GC impl.
[0]:
WARNING: possible circular locking dependency detected
6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Not tainted
-----------------------------------------------------
kworker/u8:1/11 is trying to acquire lock:
ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: __unix_gc+0x40e/0xf70 net/unix/garbage.c:302
but task is already holding lock:
ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (unix_gc_lock){+.+.}-{2:2}:
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:351 [inline]
unix_notinflight+0x13d/0x390 net/unix/garbage.c:140
unix_detach_fds net/unix/af_unix.c:1819 [inline]
unix_destruct_scm+0x221/0x350 net/unix/af_unix.c:1876
skb_release_head_state+0x100/0x250 net/core/skbuff.c:1188
skb_release_all net/core/skbuff.c:1200 [inline]
__kfree_skb net/core/skbuff.c:1216 [inline]
kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1252
kfree_skb include/linux/skbuff.h:1262 [inline]
manage_oob net/unix/af_unix.c:2672 [inline]
unix_stream_read_generic+0x1125/0x2700 net/unix/af_unix.c:2749
unix_stream_splice_read+0x239/0x320 net/unix/af_unix.c:2981
do_splice_read fs/splice.c:985 [inline]
splice_file_to_pipe+0x299/0x500 fs/splice.c:1295
do_splice+0xf2d/0x1880 fs/splice.c:1379
__do_splice fs/splice.c:1436 [inline]
__do_sys_splice fs/splice.c:1652 [inline]
__se_sys_splice+0x331/0x4a0 fs/splice.c:1634
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (&u->lock){+.+.}-{2:2}:
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:351 [inline]
__unix_gc+0x40e/0xf70 net/unix/garbage.c:302
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335
worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(unix_gc_lock);
lock(&u->lock);
lock(unix_gc_lock);
lock(&u->lock);
*** DEADLOCK ***
3 locks held by kworker/u8:1/11:
#0:
ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline]
#0:
ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x17c0 kernel/workqueue.c:3335
#1:
ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline]
#1:
ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x17c0 kernel/workqueue.c:3335
#2:
ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
#2:
ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261
stack backtrace:
CPU: 0 PID: 11 Comm: kworker/u8:1 Not tainted
6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Workqueue: events_unbound __unix_gc
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:351 [inline]
__unix_gc+0x40e/0xf70 net/unix/garbage.c:302
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335
worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Fixes: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()")
Reported-and-tested-by: syzbot+fa379358c28cc87cc307@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fa379358c28cc87cc307
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240424170443.9832-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Peter Münster [Wed, 24 Apr 2024 13:51:52 +0000 (15:51 +0200)]
net: b44: set pause params only when interface is up
b44_free_rings() accesses b44::rx_buffers (and ::tx_buffers)
unconditionally, but b44::rx_buffers is only valid when the
device is up (they get allocated in b44_open(), and deallocated
again in b44_close()), any other time these are just a NULL pointers.
So if you try to change the pause params while the network interface
is disabled/administratively down, everything explodes (which likely
netifd tries to do).
Link: https://github.com/openwrt/openwrt/issues/13789
Fixes: 1da177e4c3f4 (Linux-2.6.12-rc2)
Cc: stable@vger.kernel.org
Reported-by: Peter Münster <pm@a16n.net>
Suggested-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: Vaclav Svoboda <svoboda@neng.cz>
Tested-by: Peter Münster <pm@a16n.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Peter Münster <pm@a16n.net>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/87y192oolj.fsf@a16n.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Wed, 24 Apr 2024 10:25:47 +0000 (12:25 +0200)]
tls: fix lockless read of strp->msg_ready in ->poll
tls_sk_poll is called without locking the socket, and needs to read
strp->msg_ready (via tls_strp_msg_ready). Convert msg_ready to a bool
and use READ_ONCE/WRITE_ONCE where needed. The remaining reads are
only performed when the socket is locked.
Fixes: 121dca784fc0 ("tls: suppress wakeups unless we have a full record")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/0b7ee062319037cf86af6b317b3d72f7bfcd2e97.1713797701.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arkadiusz Kubalewski [Wed, 24 Apr 2024 10:16:36 +0000 (12:16 +0200)]
dpll: fix dpll_pin_on_pin_register() for multiple parent pins
In scenario where pin is registered with multiple parent pins via
dpll_pin_on_pin_register(..), all belonging to the same dpll device.
A second call to dpll_pin_on_pin_unregister(..) would cause a call trace,
as it tries to use already released registration resources (due to fix
introduced in
b446631f355e). In this scenario pin was registered twice,
so resources are not yet expected to be release until each registered
pin/pin pair is unregistered.
Currently, the following crash/call trace is produced when ice driver is
removed on the system with installed E810T NIC which includes dpll device:
WARNING: CPU: 51 PID: 9155 at drivers/dpll/dpll_core.c:809 dpll_pin_ops+0x20/0x30
RIP: 0010:dpll_pin_ops+0x20/0x30
Call Trace:
? __warn+0x7f/0x130
? dpll_pin_ops+0x20/0x30
dpll_msg_add_pin_freq+0x37/0x1d0
dpll_cmd_pin_get_one+0x1c0/0x400
? __nlmsg_put+0x63/0x80
dpll_pin_event_send+0x93/0x140
dpll_pin_on_pin_unregister+0x3f/0x100
ice_dpll_deinit_pins+0xa1/0x230 [ice]
ice_remove+0xf1/0x210 [ice]
Fix by adding a parent pointer as a cookie when creating a registration,
also when searching for it. For the regular pins pass NULL, this allows to
create separated registration for each parent the pin is registered with.
Fixes: b446631f355e ("dpll: fix dpll_xa_ref_*_del() for multiple registrations")
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240424101636.1491424-1-arkadiusz.kubalewski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geert Uytterhoeven [Wed, 24 Apr 2024 07:45:21 +0000 (09:45 +0200)]
net: ravb: Fix registered interrupt names
As interrupts are now requested from ravb_probe(), before calling
register_netdev(), ndev->name still contains the template "eth%d",
leading to funny names in /proc/interrupts. E.g. on R-Car E3:
89: 0 0 GICv2 93 Level eth%d:ch22:multi
90: 0 3 GICv2 95 Level eth%d:ch24:emac
91: 0 23484 GICv2 71 Level eth%d:ch0:rx_be
92: 0 0 GICv2 72 Level eth%d:ch1:rx_nc
93: 0 13735 GICv2 89 Level eth%d:ch18:tx_be
94: 0 0 GICv2 90 Level eth%d:ch19:tx_nc
Worse, on platforms with multiple RAVB instances (e.g. R-Car V4H), all
interrupts have similar names.
Fix this by using the device name instead, like is done in several other
drivers:
89: 0 0 GICv2 93 Level
e6800000.ethernet:ch22:multi
90: 0 1 GICv2 95 Level
e6800000.ethernet:ch24:emac
91: 0 28578 GICv2 71 Level
e6800000.ethernet:ch0:rx_be
92: 0 0 GICv2 72 Level
e6800000.ethernet:ch1:rx_nc
93: 0 14044 GICv2 89 Level
e6800000.ethernet:ch18:tx_be
94: 0 0 GICv2 90 Level
e6800000.ethernet:ch19:tx_nc
Rename the local variable dev_name, as it shadows the dev_name()
function, and pre-initialize it, to simplify the code.
Fixes: 32f012b8c01ca9fd ("net: ravb: Move getting/requesting IRQs in the probe() method")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Tested-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> # on RZ/G3S
Link: https://lore.kernel.org/r/cde67b68adf115b3cf0b44c32334ae00b2fbb321.1713944647.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Su Hui [Wed, 24 Apr 2024 02:27:25 +0000 (10:27 +0800)]
octeontx2-af: fix the double free in rvu_npc_freemem()
Clang static checker(scan-build) warning:
drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c:line 2184, column 2
Attempt to free released memory.
npc_mcam_rsrcs_deinit() has released 'mcam->counters.bmap'. Deleted this
redundant kfree() to fix this double free problem.
Fixes: dd7842878633 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs")
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://lore.kernel.org/r/20240424022724.144587-1-suhui@nfschina.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason Reeder [Wed, 24 Apr 2024 07:16:26 +0000 (12:46 +0530)]
net: ethernet: ti: am65-cpts: Fix PTPv1 message type on TX packets
The CPTS, by design, captures the messageType (Sync, Delay_Req, etc.)
field from the second nibble of the PTP header which is defined in the
PTPv2 (1588-2008) specification. In the PTPv1 (1588-2002) specification
the first two bytes of the PTP header are defined as the versionType
which is always 0x0001. This means that any PTPv1 packets that are
tagged for TX timestamping by the CPTS will have their messageType set
to 0x0 which corresponds to a Sync message type. This causes issues
when a PTPv1 stack is expecting a Delay_Req (messageType: 0x1)
timestamp that never appears.
Fix this by checking if the ptp_class of the timestamped TX packet is
PTP_CLASS_V1 and then matching the PTP sequence ID to the stored
sequence ID in the skb->cb data structure. If the sequence IDs match
and the packet is of type PTPv1 then there is a chance that the
messageType has been incorrectly stored by the CPTS so overwrite the
messageType stored by the CPTS with the messageType from the skb->cb
data structure. This allows the PTPv1 stack to receive TX timestamps
for Delay_Req packets which are necessary to lock onto a PTP Leader.
Signed-off-by: Jason Reeder <jreeder@ti.com>
Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
Tested-by: Ed Trexel <ed.trexel@hp.com>
Fixes: f6bd59526ca5 ("net: ethernet: ti: introduce am654 common platform time sync driver")
Link: https://lore.kernel.org/r/20240424071626.32558-1-r-gunasekaran@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 25 Apr 2024 15:04:28 +0000 (08:04 -0700)]
Merge branch 'intel-wired-lan-driver-updates-2024-04-23-i40e-iavf-ice'
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-04-23 (i40e, iavf, ice)
This series contains updates to i40e, iavf, and ice drivers.
Sindhu removes WQ_MEM_RECLAIM flag from workqueue for i40e.
Erwan Velu adjusts message to avoid confusion on base being reported on
i40e.
Sudheer corrects insufficient check for TC equality on iavf.
Jake corrects ordering of locks to avoid possible deadlock on ice.
====================
Link: https://lore.kernel.org/r/20240423182723.740401-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 25 Apr 2024 14:59:04 +0000 (07:59 -0700)]
Merge branch 'fix-isolation-of-broadcast-traffic-and-unmatched-unicast-traffic-with-macsec-offload'
Rahul Rameshbabu says:
====================
Fix isolation of broadcast traffic and unmatched unicast traffic with MACsec offload
Some device drivers support devices that enable them to annotate whether a
Rx skb refers to a packet that was processed by the MACsec offloading
functionality of the device. Logic in the Rx handling for MACsec offload
does not utilize this information to preemptively avoid forwarding to the
macsec netdev currently. Because of this, things like multicast messages or
unicast messages with an unmatched destination address such as ARP requests
are forwarded to the macsec netdev whether the message received was MACsec
encrypted or not. The goal of this patch series is to improve the Rx
handling for MACsec offload for devices capable of annotating skbs received
that were decrypted by the NIC offload for MACsec.
Here is a summary of the issue that occurs with the existing logic today.
* The current design of the MACsec offload handling path tries to use
"best guess" mechanisms for determining whether a packet associated
with the currently handled skb in the datapath was processed via HW
offload
* The best guess mechanism uses the following heuristic logic (in order of
precedence)
- Check if header destination MAC address matches MACsec netdev MAC
address -> forward to MACsec port
- Check if packet is multicast traffic -> forward to MACsec port
- MACsec security channel was able to be looked up from skb offload
context (mlx5 only) -> forward to MACsec port
* Problem: plaintext traffic can potentially solicit a MACsec encrypted
response from the offload device
- Core aspect of MACsec is that it identifies unauthorized LAN connections
and excludes them from communication
+ This behavior can be seen when not enabling offload for MACsec
- The offload behavior violates this principle in MACsec
I believe this behavior is a security bug since applications utilizing
MACsec could be exploited using this behavior, and the correct way to
resolve this is by having the hardware correctly indicate whether MACsec
offload occurred for the packet or not. In the patches in this series, I
leave a warning for when the problematic path occurs because I cannot
figure out a secure way to fix the security issue that applies to the core
MACsec offload handling in the Rx path without breaking MACsec offload for
other vendors.
Shown at the bottom is an example use case where plaintext traffic sent to
a physical port of a NIC configured for MACsec offload is unable to be
handled correctly by the software stack when the NIC provides awareness to
the kernel about whether the received packet is MACsec traffic or not. In
this specific example, plaintext ARP requests are being responded with
MACsec encrypted ARP replies (which leads to routing information being
unable to be built for the requester).
Side 1
ip link del macsec0
ip address flush mlx5_1
ip address add 1.1.1.1/24 dev mlx5_1
ip link set dev mlx5_1 up
ip link add link mlx5_1 macsec0 type macsec sci 1 encrypt on
ip link set dev macsec0 address 00:11:22:33:44:66
ip macsec offload macsec0 mac
ip macsec add macsec0 tx sa 0 pn 1 on key 00
dffafc8d7b9a43d5b9a3dfbbf6a30c16
ip macsec add macsec0 rx sci 2 on
ip macsec add macsec0 rx sci 2 sa 0 pn 1 on key 00
ead3664f508eb06c40ac7104cdae4ce5
ip address flush macsec0
ip address add 2.2.2.1/24 dev macsec0
ip link set dev macsec0 up
# macsec0 enters promiscuous mode.
# This enables all traffic received on macsec_vlan to be processed by
# the macsec offload rx datapath. This however means that traffic
# meant to be received by mlx5_1 will be incorrectly steered to
# macsec0 as well.
ip link add link macsec0 name macsec_vlan type vlan id 1
ip link set dev macsec_vlan address 00:11:22:33:44:88
ip address flush macsec_vlan
ip address add 3.3.3.1/24 dev macsec_vlan
ip link set dev macsec_vlan up
Side 2
ip link del macsec0
ip address flush mlx5_1
ip address add 1.1.1.2/24 dev mlx5_1
ip link set dev mlx5_1 up
ip link add link mlx5_1 macsec0 type macsec sci 2 encrypt on
ip link set dev macsec0 address 00:11:22:33:44:77
ip macsec offload macsec0 mac
ip macsec add macsec0 tx sa 0 pn 1 on key 00
ead3664f508eb06c40ac7104cdae4ce5
ip macsec add macsec0 rx sci 1 on
ip macsec add macsec0 rx sci 1 sa 0 pn 1 on key 00
dffafc8d7b9a43d5b9a3dfbbf6a30c16
ip address flush macsec0
ip address add 2.2.2.2/24 dev macsec0
ip link set dev macsec0 up
# macsec0 enters promiscuous mode.
# This enables all traffic received on macsec_vlan to be processed by
# the macsec offload rx datapath. This however means that traffic
# meant to be received by mlx5_1 will be incorrectly steered to
# macsec0 as well.
ip link add link macsec0 name macsec_vlan type vlan id 1
ip link set dev macsec_vlan address 00:11:22:33:44:99
ip address flush macsec_vlan
ip address add 3.3.3.2/24 dev macsec_vlan
ip link set dev macsec_vlan up
Side 1
ping -I mlx5_1 1.1.1.2
PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 mlx5_1: 56(84) bytes of data.
From 1.1.1.1 icmp_seq=1 Destination Host Unreachable
ping: sendmsg: No route to host
From 1.1.1.1 icmp_seq=2 Destination Host Unreachable
From 1.1.1.1 icmp_seq=3 Destination Host Unreachable
Changes:
v2->v3:
* Made dev paramater const for eth_skb_pkt_type helper as suggested by Sabrina
Dubroca <sd@queasysnail.net>
v1->v2:
* Fixed series subject to detail the issue being fixed
* Removed strange characters from cover letter
* Added comment in example that illustrates the impact involving
promiscuous mode
* Added patch for generalizing packet type detection
* Added Fixes: tags and targeting net
* Removed pointless warning in the heuristic Rx path for macsec offload
* Applied small refactor in Rx path offload to minimize scope of rx_sc
local variable
Link: https://github.com/Binary-Eater/macsec-rx-offload/blob/trunk/MACsec_violation_in_core_stack_offload_rx_handling.pdf
Link: https://lore.kernel.org/netdev/20240419213033.400467-5-rrameshbabu@nvidia.com/
Link: https://lore.kernel.org/netdev/20240419011740.333714-1-rrameshbabu@nvidia.com/
Link: https://lore.kernel.org/netdev/87r0l25y1c.fsf@nvidia.com/
Link: https://lore.kernel.org/netdev/20231116182900.46052-1-rrameshbabu@nvidia.com/
====================
Link: https://lore.kernel.org/r/20240423181319.115860-1-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jacob Keller [Tue, 23 Apr 2024 18:27:20 +0000 (11:27 -0700)]
ice: fix LAG and VF lock dependency in ice_reset_vf()
9f74a3dfcf83 ("ice: Fix VF Reset paths when interface in a failed over
aggregate"), the ice driver has acquired the LAG mutex in ice_reset_vf().
The commit placed this lock acquisition just prior to the acquisition of
the VF configuration lock.
If ice_reset_vf() acquires the configuration lock via the ICE_VF_RESET_LOCK
flag, this could deadlock with ice_vc_cfg_qs_msg() because it always
acquires the locks in the order of the VF configuration lock and then the
LAG mutex.
Lockdep reports this violation almost immediately on creating and then
removing 2 VF:
======================================================
WARNING: possible circular locking dependency detected
6.8.0-rc6 #54 Tainted: G W O
------------------------------------------------------
kworker/60:3/6771 is trying to acquire lock:
ff40d43e099380a0 (&vf->cfg_lock){+.+.}-{3:3}, at: ice_reset_vf+0x22f/0x4d0 [ice]
but task is already holding lock:
ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&pf->lag_mutex){+.+.}-{3:3}:
__lock_acquire+0x4f8/0xb40
lock_acquire+0xd4/0x2d0
__mutex_lock+0x9b/0xbf0
ice_vc_cfg_qs_msg+0x45/0x690 [ice]
ice_vc_process_vf_msg+0x4f5/0x870 [ice]
__ice_clean_ctrlq+0x2b5/0x600 [ice]
ice_service_task+0x2c9/0x480 [ice]
process_one_work+0x1e9/0x4d0
worker_thread+0x1e1/0x3d0
kthread+0x104/0x140
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x1b/0x30
-> #0 (&vf->cfg_lock){+.+.}-{3:3}:
check_prev_add+0xe2/0xc50
validate_chain+0x558/0x800
__lock_acquire+0x4f8/0xb40
lock_acquire+0xd4/0x2d0
__mutex_lock+0x9b/0xbf0
ice_reset_vf+0x22f/0x4d0 [ice]
ice_process_vflr_event+0x98/0xd0 [ice]
ice_service_task+0x1cc/0x480 [ice]
process_one_work+0x1e9/0x4d0
worker_thread+0x1e1/0x3d0
kthread+0x104/0x140
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x1b/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&pf->lag_mutex);
lock(&vf->cfg_lock);
lock(&pf->lag_mutex);
lock(&vf->cfg_lock);
*** DEADLOCK ***
4 locks held by kworker/60:3/6771:
#0:
ff40d43e05428b38 ((wq_completion)ice){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0
#1:
ff50d06e05197e58 ((work_completion)(&pf->serv_task)){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0
#2:
ff40d43ea1960e50 (&pf->vfs.table_lock){+.+.}-{3:3}, at: ice_process_vflr_event+0x48/0xd0 [ice]
#3:
ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice]
stack backtrace:
CPU: 60 PID: 6771 Comm: kworker/60:3 Tainted: G W O 6.8.0-rc6 #54
Hardware name:
Workqueue: ice ice_service_task [ice]
Call Trace:
<TASK>
dump_stack_lvl+0x4a/0x80
check_noncircular+0x12d/0x150
check_prev_add+0xe2/0xc50
? save_trace+0x59/0x230
? add_chain_cache+0x109/0x450
validate_chain+0x558/0x800
__lock_acquire+0x4f8/0xb40
? lockdep_hardirqs_on+0x7d/0x100
lock_acquire+0xd4/0x2d0
? ice_reset_vf+0x22f/0x4d0 [ice]
? lock_is_held_type+0xc7/0x120
__mutex_lock+0x9b/0xbf0
? ice_reset_vf+0x22f/0x4d0 [ice]
? ice_reset_vf+0x22f/0x4d0 [ice]
? rcu_is_watching+0x11/0x50
? ice_reset_vf+0x22f/0x4d0 [ice]
ice_reset_vf+0x22f/0x4d0 [ice]
? process_one_work+0x176/0x4d0
ice_process_vflr_event+0x98/0xd0 [ice]
ice_service_task+0x1cc/0x480 [ice]
process_one_work+0x1e9/0x4d0
worker_thread+0x1e1/0x3d0
? __pfx_worker_thread+0x10/0x10
kthread+0x104/0x140
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
To avoid deadlock, we must acquire the LAG mutex only after acquiring the
VF configuration lock. Fix the ice_reset_vf() to acquire the LAG mutex only
after we either acquire or check that the VF configuration lock is held.
Fixes: 9f74a3dfcf83 ("ice: Fix VF Reset paths when interface in a failed over aggregate")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240423182723.740401-5-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sudheer Mogilappagari [Tue, 23 Apr 2024 18:27:19 +0000 (11:27 -0700)]
iavf: Fix TC config comparison with existing adapter TC config
Same number of TCs doesn't imply that underlying TC configs are
same. The config could be different due to difference in number
of queues in each TC. Add utility function to determine if TC
configs are same.
Fixes: d5b33d024496 ("i40evf: add ndo_setup_tc callback to i40evf")
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Mineri Bhange <minerix.bhange@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240423182723.740401-4-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Erwan Velu [Tue, 23 Apr 2024 18:27:18 +0000 (11:27 -0700)]
i40e: Report MFS in decimal base instead of hex
If the MFS is set below the default (0x2600), a warning message is
reported like the following :
MFS for port 1 has been set below the default: 600
This message is a bit confusing as the number shown here (600) is in
fact an hexa number: 0x600 = 1536
Without any explicit "0x" prefix, this message is read like the MFS is
set to 600 bytes.
MFS, as per MTUs, are usually expressed in decimal base.
This commit reports both current and default MFS values in decimal
so it's less confusing for end-users.
A typical warning message looks like the following :
MFS for port 1 (1536) has been set below the default (9728)
Signed-off-by: Erwan Velu <e.velu@criteo.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fixes: 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set")
Link: https://lore.kernel.org/r/20240423182723.740401-3-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sindhu Devale [Tue, 23 Apr 2024 18:27:17 +0000 (11:27 -0700)]
i40e: Do not use WQ_MEM_RECLAIM flag for workqueue
Issue reported by customer during SRIOV testing, call trace:
When both i40e and the i40iw driver are loaded, a warning
in check_flush_dependency is being triggered. This seems
to be because of the i40e driver workqueue is allocated with
the WQ_MEM_RECLAIM flag, and the i40iw one is not.
Similar error was encountered on ice too and it was fixed by
removing the flag. Do the same for i40e too.
[Feb 9 09:08] ------------[ cut here ]------------
[ +0.000004] workqueue: WQ_MEM_RECLAIM i40e:i40e_service_task [i40e] is
flushing !WQ_MEM_RECLAIM infiniband:0x0
[ +0.000060] WARNING: CPU: 0 PID: 937 at kernel/workqueue.c:2966
check_flush_dependency+0x10b/0x120
[ +0.000007] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq
snd_timer snd_seq_device snd soundcore nls_utf8 cifs cifs_arc4
nls_ucs2_utils rdma_cm iw_cm ib_cm cifs_md4 dns_resolver netfs qrtr
rfkill sunrpc vfat fat intel_rapl_msr intel_rapl_common irdma
intel_uncore_frequency intel_uncore_frequency_common ice ipmi_ssif
isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal
intel_powerclamp gnss coretemp ib_uverbs rapl intel_cstate ib_core
iTCO_wdt iTCO_vendor_support acpi_ipmi mei_me ipmi_si intel_uncore
ioatdma i2c_i801 joydev pcspkr mei ipmi_devintf lpc_ich
intel_pch_thermal i2c_smbus ipmi_msghandler acpi_power_meter acpi_pad
xfs libcrc32c ast sd_mod drm_shmem_helper t10_pi drm_kms_helper sg ixgbe
drm i40e ahci crct10dif_pclmul libahci crc32_pclmul igb crc32c_intel
libata ghash_clmulni_intel i2c_algo_bit mdio dca wmi dm_mirror
dm_region_hash dm_log dm_mod fuse
[ +0.000050] CPU: 0 PID: 937 Comm: kworker/0:3 Kdump: loaded Not
tainted 6.8.0-rc2-Feb-net_dev
-Qiueue-00279-gbd43c5687e05 #1
[ +0.000003] Hardware name: Intel Corporation S2600BPB/S2600BPB, BIOS
SE5C620.86B.02.01.0013.
121520200651 12/15/2020
[ +0.000001] Workqueue: i40e i40e_service_task [i40e]
[ +0.000024] RIP: 0010:check_flush_dependency+0x10b/0x120
[ +0.000003] Code: ff 49 8b 54 24 18 48 8d 8b b0 00 00 00 49 89 e8 48
81 c6 b0 00 00 00 48 c7 c7 b0 97 fa 9f c6 05 8a cc 1f 02 01 e8 35 b3 fd
ff <0f> 0b e9 10 ff ff ff 80 3d 78 cc 1f 02 00 75 94 e9 46 ff ff ff 90
[ +0.000002] RSP: 0018:
ffffbd294976bcf8 EFLAGS:
00010282
[ +0.000002] RAX:
0000000000000000 RBX:
ffff94d4c483c000 RCX:
0000000000000027
[ +0.000001] RDX:
ffff94d47f620bc8 RSI:
0000000000000001 RDI:
ffff94d47f620bc0
[ +0.000001] RBP:
0000000000000000 R08:
0000000000000000 R09:
00000000ffff7fff
[ +0.000001] R10:
ffffbd294976bb98 R11:
ffffffffa0be65e8 R12:
ffff94c5451ea180
[ +0.000001] R13:
ffff94c5ab5e8000 R14:
ffff94c5c20b6e05 R15:
ffff94c5f1330ab0
[ +0.000001] FS:
0000000000000000(0000) GS:
ffff94d47f600000(0000)
knlGS:
0000000000000000
[ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ +0.000001] CR2:
00007f9e6f1fca70 CR3:
0000000038e20004 CR4:
00000000007706f0
[ +0.000000] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ +0.000001] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ +0.000001] PKRU:
55555554
[ +0.000001] Call Trace:
[ +0.000001] <TASK>
[ +0.000002] ? __warn+0x80/0x130
[ +0.000003] ? check_flush_dependency+0x10b/0x120
[ +0.000002] ? report_bug+0x195/0x1a0
[ +0.000005] ? handle_bug+0x3c/0x70
[ +0.000003] ? exc_invalid_op+0x14/0x70
[ +0.000002] ? asm_exc_invalid_op+0x16/0x20
[ +0.000006] ? check_flush_dependency+0x10b/0x120
[ +0.000002] ? check_flush_dependency+0x10b/0x120
[ +0.000002] __flush_workqueue+0x126/0x3f0
[ +0.000015] ib_cache_cleanup_one+0x1c/0xe0 [ib_core]
[ +0.000056] __ib_unregister_device+0x6a/0xb0 [ib_core]
[ +0.000023] ib_unregister_device_and_put+0x34/0x50 [ib_core]
[ +0.000020] i40iw_close+0x4b/0x90 [irdma]
[ +0.000022] i40e_notify_client_of_netdev_close+0x54/0xc0 [i40e]
[ +0.000035] i40e_service_task+0x126/0x190 [i40e]
[ +0.000024] process_one_work+0x174/0x340
[ +0.000003] worker_thread+0x27e/0x390
[ +0.000001] ? __pfx_worker_thread+0x10/0x10
[ +0.000002] kthread+0xdf/0x110
[ +0.000002] ? __pfx_kthread+0x10/0x10
[ +0.000002] ret_from_fork+0x2d/0x50
[ +0.000003] ? __pfx_kthread+0x10/0x10
[ +0.000001] ret_from_fork_asm+0x1b/0x30
[ +0.000004] </TASK>
[ +0.000001] ---[ end trace
0000000000000000 ]---
Fixes: 4d5957cbdecd ("i40e: remove WQ_UNBOUND and the task limit of our workqueue")
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Robert Ganzynkowicz <robert.ganzynkowicz@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240423182723.740401-2-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Carpenter [Tue, 23 Apr 2024 16:15:22 +0000 (19:15 +0300)]
net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()
The rx_chn->irq[] array is unsigned int but it should be signed for the
error handling to work. Also if k3_udma_glue_rx_get_irq() returns zero
then we should return -ENXIO instead of success.
Fixes: 128d5874c082 ("net: ti: icssg-prueth: Add ICSSG ethernet driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://lore.kernel.org/r/05282415-e7f4-42f3-99f8-32fde8f30936@moroto.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rahul Rameshbabu [Tue, 23 Apr 2024 18:13:05 +0000 (11:13 -0700)]
net/mlx5e: Advertise mlx5 ethernet driver updates sk_buff md_dst for MACsec
mlx5 Rx flow steering and CQE handling enable the driver to be able to
update an skb's md_dst attribute as MACsec when MACsec traffic arrives when
a device is configured for offloading. Advertise this to the core stack to
take advantage of this capability.
Cc: stable@vger.kernel.org
Fixes: b7c9400cbc48 ("net/mlx5e: Implement MACsec Rx data path using MACsec skb_metadata_dst")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/20240423181319.115860-5-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rahul Rameshbabu [Tue, 23 Apr 2024 18:13:04 +0000 (11:13 -0700)]
macsec: Detect if Rx skb is macsec-related for offloading devices that update md_dst
Can now correctly identify where the packets should be delivered by using
md_dst or its absence on devices that provide it.
This detection is not possible without device drivers that update md_dst. A
fallback pattern should be used for supporting such device drivers. This
fallback mode causes multicast messages to be cloned to both the non-macsec
and macsec ports, independent of whether the multicast message received was
encrypted over MACsec or not. Other non-macsec traffic may also fail to be
handled correctly for devices in promiscuous mode.
Link: https://lore.kernel.org/netdev/ZULRxX9eIbFiVi7v@hog/
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: stable@vger.kernel.org
Fixes: 860ead89b851 ("net/macsec: Add MACsec skb_metadata_dst Rx Data path support")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/20240423181319.115860-4-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rahul Rameshbabu [Tue, 23 Apr 2024 18:13:03 +0000 (11:13 -0700)]
ethernet: Add helper for assigning packet type when dest address does not match device address
Enable reuse of logic in eth_type_trans for determining packet type.
Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Cc: stable@vger.kernel.org
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/20240423181319.115860-3-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rahul Rameshbabu [Tue, 23 Apr 2024 18:13:02 +0000 (11:13 -0700)]
macsec: Enable devices to advertise whether they update sk_buff md_dst during offloads
Cannot know whether a Rx skb missing md_dst is intended for MACsec or not
without knowing whether the device is able to update this field during an
offload. Assume that an offload to a MACsec device cannot support updating
md_dst by default. Capable devices can advertise that they do indicate that
an skb is related to a MACsec offloaded packet using the md_dst.
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: stable@vger.kernel.org
Fixes: 860ead89b851 ("net/macsec: Add MACsec skb_metadata_dst Rx Data path support")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/20240423181319.115860-2-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal [Wed, 24 Apr 2024 09:58:20 +0000 (11:58 +0200)]
tools: testing: selftests: prefer TEST_PROGS for conntrack_dump_flush
Currently conntrack_dump_flush test program always runs when passing
TEST_PROGS argument:
% make -C tools/testing/selftests TARGETS=net/netfilter \
TEST_PROGS=conntrack_ipip_mtu.sh run_tests
make: Entering [..]
TAP version 13
1..2 [..]
selftests: net/netfilter: conntrack_dump_flush [..]
Move away from TEST_CUSTOM_PROGS to avoid this. After this,
above command will only run the program specified in TEST_PROGS.
Link: https://lore.kernel.org/netdev/20240423191609.70c14c42@kernel.org/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240424095824.5555-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
linke li [Tue, 23 Apr 2024 10:53:26 +0000 (18:53 +0800)]
net: bridge: remove redundant check of f->dst
In br_fill_forward_path(), f->dst is checked not to be NULL, then
immediately read using READ_ONCE and checked again. The first check is
useless, so this patch aims to remove the redundant check of f->dst.
Signed-off-by: linke li <lilinke99@qq.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 25 Apr 2024 11:18:37 +0000 (12:18 +0100)]
Merge tag 'wireless-2024-04-23' of git://git./linux/kernel/git/wireless/wireless
Johannes berg says:
====================
Fixes for the current cycle:
* ath11k: convert to correct RCU iteration of IPv6 addresses
* iwlwifi: link ID, FW API version, scanning and PASN fixes
* cfg80211: NULL-deref and tracing fixes
* mac80211: connection mode, mesh fast-TX, multi-link and
various other small fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
MD Danish Anwar [Tue, 23 Apr 2024 08:48:28 +0000 (14:18 +0530)]
net: phy: dp83869: Fix MII mode failure
The DP83869 driver sets the MII bit (needed for PHY to work in MII mode)
only if the op-mode is either DP83869_100M_MEDIA_CONVERT or
DP83869_RGMII_100_BASE.
Some drivers i.e. ICSSG support MII mode with op-mode as
DP83869_RGMII_COPPER_ETHERNET for which the MII bit is not set in dp83869
driver. As a result MII mode on ICSSG doesn't work and below log is seen.
TI DP83869
300b2400.mdio:0f: selected op-mode is not valid with MII mode
icssg-prueth icssg1-eth: couldn't connect to phy ethernet-phy@0
icssg-prueth icssg1-eth: can't phy connect port MII0
Fix this by setting MII bit for DP83869_RGMII_COPPER_ETHERNET op-mode as
well.
Fixes: 94e86ef1b801 ("net: phy: dp83869: support mii mode when rgmii strap cfg is used")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthias Schiffer [Tue, 23 Apr 2024 07:47:49 +0000 (09:47 +0200)]
net: dsa: mv88e6xxx: Avoid EEPROM timeout without EEPROM on
88E6250-family switches
88E6250-family switches have the quirk that the EEPROM Running flag can
get stuck at 1 when no EEPROM is connected, causing
mv88e6xxx_g2_eeprom_wait() to time out. We still want to wait for the
EEPROM however, to avoid interrupting a transfer and leaving the EEPROM
in an invalid state.
The condition to wait for recommended by the hardware spec is the EEInt
flag, however this flag is cleared on read, so before the hardware reset,
is may have been cleared already even though the EEPROM has been read
successfully.
For this reason, we revive the mv88e6xxx_g1_wait_eeprom_done() function
that was removed in commit
6ccf50d4d474
("net: dsa: mv88e6xxx: Avoid EEPROM timeout when EEPROM is absent") in a
slightly refactored form, and introduce a new
mv88e6xxx_g1_wait_eeprom_done_prereset() that additionally handles this
case by triggering another EEPROM reload that can be waited on.
On other switch models without this quirk, mv88e6xxx_g2_eeprom_wait() is
kept, as it avoids the additional reload.
Fixes: 6ccf50d4d474 ("net: dsa: mv88e6xxx: Avoid EEPROM timeout when EEPROM is absent")
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthias Schiffer [Tue, 23 Apr 2024 07:47:48 +0000 (09:47 +0200)]
net: dsa: mv88e6xxx: Add support for model-specific pre- and post-reset handlers
Instead of calling mv88e6xxx_g2_eeprom_wait() directly from
mv88e6xxx_hardware_reset(), add configurable pre- and post-reset hard
reset handlers. Initially, the handlers are set to
mv88e6xxx_g2_eeprom_wait() for all families that have get/set_eeprom()
to match the existing behavior. No functional change intended (except
for additional error messages on failure).
Fixes: 6ccf50d4d474 ("net: dsa: mv88e6xxx: Avoid EEPROM timeout when EEPROM is absent")
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Satish Kharat [Tue, 23 Apr 2024 03:53:05 +0000 (20:53 -0700)]
enic: Replace hardcoded values for vnic descriptor by defines
Replace the hardcoded values used in the calculations for
vnic descriptors and rings with defines. Minor code cleanup.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Wed, 24 Apr 2024 18:45:01 +0000 (20:45 +0200)]
netfilter: nf_tables: honor table dormant flag from netdev release event path
Check for table dormant flag otherwise netdev release event path tries
to unregister an already unregistered hook.
[524854.857999] ------------[ cut here ]------------
[524854.858010] WARNING: CPU: 0 PID:
3386599 at net/netfilter/core.c:501 __nf_unregister_net_hook+0x21a/0x260
[...]
[524854.858848] CPU: 0 PID:
3386599 Comm: kworker/u32:2 Not tainted 6.9.0-rc3+ #365
[524854.858869] Workqueue: netns cleanup_net
[524854.858886] RIP: 0010:__nf_unregister_net_hook+0x21a/0x260
[524854.858903] Code: 24 e8 aa 73 83 ff 48 63 43 1c 83 f8 01 0f 85 3d ff ff ff e8 98 d1 f0 ff 48 8b 3c 24 e8 8f 73 83 ff 48 63 43 1c e9 26 ff ff ff <0f> 0b 48 83 c4 18 48 c7 c7 00 68 e9 82 5b 5d 41 5c 41 5d 41 5e 41
[524854.858914] RSP: 0018:
ffff8881e36d79e0 EFLAGS:
00010246
[524854.858926] RAX:
0000000000000000 RBX:
ffff8881339ae790 RCX:
ffffffff81ba524a
[524854.858936] RDX:
dffffc0000000000 RSI:
0000000000000008 RDI:
ffff8881c8a16438
[524854.858945] RBP:
ffff8881c8a16438 R08:
0000000000000001 R09:
ffffed103c6daf34
[524854.858954] R10:
ffff8881e36d79a7 R11:
0000000000000000 R12:
0000000000000005
[524854.858962] R13:
ffff8881c8a16000 R14:
0000000000000000 R15:
ffff8881351b5a00
[524854.858971] FS:
0000000000000000(0000) GS:
ffff888390800000(0000) knlGS:
0000000000000000
[524854.858982] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[524854.858991] CR2:
00007fc9be0f16f4 CR3:
00000001437cc004 CR4:
00000000001706f0
[524854.859000] Call Trace:
[524854.859006] <TASK>
[524854.859013] ? __warn+0x9f/0x1a0
[524854.859027] ? __nf_unregister_net_hook+0x21a/0x260
[524854.859044] ? report_bug+0x1b1/0x1e0
[524854.859060] ? handle_bug+0x3c/0x70
[524854.859071] ? exc_invalid_op+0x17/0x40
[524854.859083] ? asm_exc_invalid_op+0x1a/0x20
[524854.859100] ? __nf_unregister_net_hook+0x6a/0x260
[524854.859116] ? __nf_unregister_net_hook+0x21a/0x260
[524854.859135] nf_tables_netdev_event+0x337/0x390 [nf_tables]
[524854.859304] ? __pfx_nf_tables_netdev_event+0x10/0x10 [nf_tables]
[524854.859461] ? packet_notifier+0xb3/0x360
[524854.859476] ? _raw_spin_unlock_irqrestore+0x11/0x40
[524854.859489] ? dcbnl_netdevice_event+0x35/0x140
[524854.859507] ? __pfx_nf_tables_netdev_event+0x10/0x10 [nf_tables]
[524854.859661] notifier_call_chain+0x7d/0x140
[524854.859677] unregister_netdevice_many_notify+0x5e1/0xae0
Fixes: d54725cd11a5 ("netfilter: nf_tables: support for multiple devices per netdev hook")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David S. Miller [Thu, 25 Apr 2024 07:52:12 +0000 (08:52 +0100)]
Merge branch 'tcp-trace-next'
Philo Lu says:
====================
tcp: update TCPCB_EVER_RETRANS after trace_tcp_retransmit_skb()
Move TCPCB_EVER_RETRANS updating after the trace_tcp_retransmit_skb()
in __tcp_retransmit_skb(), and then we are aware of whether the skb has
ever been retransmitted in this tracepoint. This can be used, e.g., to get
retransmission efficiency by counting skbs w/ and w/o TCPCB_EVER_RETRANS
(through bpf tracing programs).
For this purpose, TCPCB_EVER_RETRANS is also needed to be exposed to bpf.
Previously, the flags are defined as macros in struct tcp_skb_cb. I moved them
out into a new enum, and then they can be accessed with vmlinux.h.
We have discussed to achieve this with BPF_SOCK_OPS in [0], and using
tracepoint is thought to be a better solution.
[0]
https://lore.kernel.org/all/
20240417124622.35333-1-lulie@linux.alibaba.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Philo Lu [Sun, 21 Apr 2024 04:20:09 +0000 (12:20 +0800)]
tcp: update sacked after tracepoint in __tcp_retransmit_skb
Marking TCP_SKB_CB(skb)->sacked with TCPCB_EVER_RETRANS after the
traceopint (trace_tcp_retransmit_skb), then we can get the
retransmission efficiency by counting skbs w/ and w/o TCPCB_EVER_RETRANS
mark in this tracepoint.
We have discussed to achieve this with BPF_SOCK_OPS in [0], and using
tracepoint is thought to be a better solution.
[0]
https://lore.kernel.org/all/
20240417124622.35333-1-lulie@linux.alibaba.com/
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philo Lu [Sun, 21 Apr 2024 04:20:08 +0000 (12:20 +0800)]
tcp: move tcp_skb_cb->sacked flags to enum
Move the flag definitions for tcp_skb_cb->sacked into a new enum named
tcp_skb_cb_sacked_flags, then we can get access to them in bpf via
vmlinux.h, e.g., in tracepoints.
This patch does not change any existing functionality.
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 25 Apr 2024 03:29:49 +0000 (20:29 -0700)]
Merge tag 'for-net-2024-04-24' of git://git./linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- qca: set power_ctrl_enabled on NULL returned by gpiod_get_optional()
- hci_sync: Using hci_cmd_sync_submit when removing Adv Monitor
- qca: fix invalid device address check
- hci_sync: Use advertised PHYs on hci_le_ext_create_conn_sync
- Fix type of len in {l2cap,sco}_sock_getsockopt_old()
- btusb: mediatek: Fix double free of skb in coredump
- btusb: Add Realtek RTL8852BE support ID 0x0bda:0x4853
- btusb: Fix triggering coredump implementation for QCA
* tag 'for-net-2024-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: qca: set power_ctrl_enabled on NULL returned by gpiod_get_optional()
Bluetooth: hci_sync: Using hci_cmd_sync_submit when removing Adv Monitor
Bluetooth: qca: fix NULL-deref on non-serdev setup
Bluetooth: qca: fix NULL-deref on non-serdev suspend
Bluetooth: btusb: mediatek: Fix double free of skb in coredump
Bluetooth: MGMT: Fix failing to MGMT_OP_ADD_UUID/MGMT_OP_REMOVE_UUID
Bluetooth: qca: fix invalid device address check
Bluetooth: hci_event: Fix sending HCI_OP_READ_ENC_KEY_SIZE
Bluetooth: btusb: Fix triggering coredump implementation for QCA
Bluetooth: btusb: Add Realtek RTL8852BE support ID 0x0bda:0x4853
Bluetooth: hci_sync: Use advertised PHYs on hci_le_ext_create_conn_sync
Bluetooth: Fix type of len in {l2cap,sco}_sock_getsockopt_old()
====================
Link: https://lore.kernel.org/r/20240424204102.2319483-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Apr 2024 00:21:48 +0000 (17:21 -0700)]
eth: bnxt: fix counting packets discarded due to OOM and netpoll
I added OOM and netpoll discard counters, naively assuming that
the cpr pointer is pointing to a common completion ring.
Turns out that is usually *a* completion ring but not *the*
completion ring which bnapi->cp_ring points to. bnapi->cp_ring
is where the stats are read from, so we end up reporting 0
thru ethtool -S and qstat even though the drop events have happened.
Make 100% sure we're recording statistics in the correct structure.
Fixes: 907fd4a294db ("bnxt: count discards due to memory allocation errors")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240424002148.3937059-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 25 Apr 2024 03:15:47 +0000 (20:15 -0700)]
Merge branch 'selftests-net-extract-bpf-building-logic-from-the-makefile'
Jakub Kicinski says:
====================
selftests: net: extract BPF building logic from the Makefile
This has been sitting in my tree for a while. I will soon add YNL/libynl
support for networking selftests. This prompted a small cleanup of
the selftest makefile for net/. We don't want to be piling logic
for each library in there. YNL will get its own .mk file which can
be included. Do the same for the BPF building section, already.
No funcional changes here, just a code move and small rename.
====================
Link: https://lore.kernel.org/r/20240423183542.3807234-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 23 Apr 2024 18:35:42 +0000 (11:35 -0700)]
selftests: net: extract BPF building logic from the Makefile
The BPF sample building code looks a little bit spaghetti-ish
so move it out to its own Makefile snippet. Similar in the spirit
to how we include lib.mk. libynl will soon get a similar snippet.
There is a small change hiding in the move, the relative
paths (../../.., ../.. etc) are replaced with variables
from lib.mk such as top_srcdir and selfdir.
Link: https://lore.kernel.org/r/20240423183542.3807234-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 23 Apr 2024 18:35:41 +0000 (11:35 -0700)]
selftests: net: name bpf objects consistently and simplify Makefile
The BPF sources moved with bpf_offload.py have a suffix of .bpf.c
which seems to be useful convention. Rename the 2 other BPF sources
we had. Use wildcard in the Makefile, since we can match all those
files easily now.
Link: https://lore.kernel.org/r/20240423183542.3807234-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kory Maincent (Dent Project) [Tue, 23 Apr 2024 09:21:12 +0000 (11:21 +0200)]
net: pse-pd: Kconfig: Add missing Regulator API dependency
The PSE (Power Sourcing Equipment) API now relies on the Regulator API.
However, the Regulator dependency was missing from Kconfig. This patch
adds the necessary dependency, resolving the issue of the missing
dependency and ensuring proper functionality of the PSE API.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202404201020.mqX2IOu7-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202404200036.D8ap1Mf5-lkp@intel.com/
Fixes: d83e13761d5b ("net: pse-pd: Use regulator framework within PSE framework")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240423-fix_poe-v3-3-e50f32f5fa59@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kory Maincent (Dent Project) [Tue, 23 Apr 2024 09:21:11 +0000 (11:21 +0200)]
net: pse-pd: pse_core: Fix pse regulator type
Clarify PSE regulator as voltage regulator, not current.
The PSE (Power Sourcing Equipment) regulator is defined as a voltage
regulator, maintaining fixed voltage while accommodating varying current.
Fixes: d83e13761d5b ("net: pse-pd: Use regulator framework within PSE framework")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240423-fix_poe-v3-2-e50f32f5fa59@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kory Maincent (Dent Project) [Tue, 23 Apr 2024 09:21:10 +0000 (11:21 +0200)]
net: pse-pd: pse_core: Add missing kdoc return description
Add missing kernel documentation return description.
This allows to remove all warning from kernel-doc test script.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240423-fix_poe-v3-1-e50f32f5fa59@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lukas Wunner [Mon, 22 Apr 2024 20:45:02 +0000 (13:45 -0700)]
igc: Fix LED-related deadlock on driver unbind
Roman reports a deadlock on unplug of a Thunderbolt docking station
containing an Intel I225 Ethernet adapter.
The root cause is that led_classdev's for LEDs on the adapter are
registered such that they're device-managed by the netdev. That
results in recursive acquisition of the rtnl_lock() mutex on unplug:
When the driver calls unregister_netdev(), it acquires rtnl_lock(),
then frees the device-managed resources. Upon unregistering the LEDs,
netdev_trig_deactivate() invokes unregister_netdevice_notifier(),
which tries to acquire rtnl_lock() again.
Avoid by using non-device-managed LED registration.
Stack trace for posterity:
schedule+0x6e/0xf0
schedule_preempt_disabled+0x15/0x20
__mutex_lock+0x2a0/0x750
unregister_netdevice_notifier+0x40/0x150
netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev]
led_trigger_set+0x102/0x330
led_classdev_unregister+0x4b/0x110
release_nodes+0x3d/0xb0
devres_release_all+0x8b/0xc0
device_del+0x34f/0x3c0
unregister_netdevice_many_notify+0x80b/0xaf0
unregister_netdev+0x7c/0xd0
igc_remove+0xd8/0x1e0 [igc]
pci_device_remove+0x3f/0xb0
Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226")
Reported-by: Roman Lozko <lozko.roma@gmail.com>
Closes: https://lore.kernel.org/r/CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com/
Reported-by: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
Closes: https://lore.kernel.org/r/ZhRD3cOtz5i-61PB@mail-itl/
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de> # Intel i225
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240422204503.225448-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 25 Apr 2024 03:05:31 +0000 (20:05 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ice: Support 5 layer Tx scheduler topology
Mateusz Polchlopek says:
For performance reasons there is a need to have support for selectable
Tx scheduler topology. Currently firmware supports only the default
9-layer and 5-layer topology. This patch series enables switch from
default to 5-layer topology, if user decides to opt-in.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: Document tx_scheduling_layers parameter
ice: Add tx_scheduling_layers devlink param
ice: Enable switching default Tx scheduler topology
ice: Adjust the VSI/Aggregator layers
ice: Support 5 layer topology
devlink: extend devlink_param *set pointer
====================
Link: https://lore.kernel.org/r/20240422203913.225151-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Asbjørn Sloth Tønnesen [Mon, 22 Apr 2024 15:27:34 +0000 (15:27 +0000)]
octeontx2-pf: flower: check for unsupported control flags
Use flow_rule_is_supp_control_flags() to reject filters with
unsupported control flags.
In case any unsupported control flags are masked,
flow_rule_is_supp_control_flags() sets a NL extended
error message, and we return -EOPNOTSUPP.
Remove FLOW_DIS_FIRST_FRAG specific error message,
and treat it as any other unsupported control flag.
Only compile-tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Sunil Goutham <sgoutham@marvell.com>
Link: https://lore.kernel.org/r/20240422152735.175693-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Asbjørn Sloth Tønnesen [Mon, 22 Apr 2024 15:27:16 +0000 (15:27 +0000)]
net: hns3: flower: validate control flags
This driver currently doesn't support any control flags.
Use flow_rule_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.
In case any control flags are masked, flow_rule_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.
Also propagate extack to hclge_get_cls_key_ip(), and convert it to
return error code.
Only compile-tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Tested-by: Jijie Shao <shaojijie@huawei.com>
Link: https://lore.kernel.org/r/20240422152717.175659-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Asbjørn Sloth Tønnesen [Mon, 22 Apr 2024 15:26:55 +0000 (15:26 +0000)]
net: ethernet: ti: cpsw: flower: validate control flags
This driver currently doesn't support any control flags.
Use flow_rule_match_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.
In case any control flags are masked, flow_rule_match_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.
Only compile-tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240422152656.175627-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Asbjørn Sloth Tønnesen [Mon, 22 Apr 2024 15:26:42 +0000 (15:26 +0000)]
net: ethernet: ti: am65-cpsw: flower: validate control flags
This driver currently doesn't support any control flags.
Use flow_rule_match_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.
In case any control flags are masked, flow_rule_match_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.
Only compile-tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240422152643.175592-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Asbjørn Sloth Tønnesen [Mon, 22 Apr 2024 15:26:23 +0000 (15:26 +0000)]
bnxt_en: flower: validate control flags
This driver currently doesn't support any control flags.
Use flow_rule_match_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.
In case any control flags are masked, flow_rule_match_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.
Only compile-tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Link: https://lore.kernel.org/r/20240422152626.175569-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Chintan Vankar [Mon, 22 Apr 2024 12:45:15 +0000 (18:15 +0530)]
net: ethernet: ti: am65-cpsw-nuss: Enable SGMII mode for J784S4 CPSW9G
TI's J784S4 SoC supports SGMII mode with CPSW9G instance of the CPSW
Ethernet Switch. Thus, enable it by adding SGMII mode to the
extra_modes member of the "j784s4_cpswxg_pdata" SoC data.
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Chintan Vankar <c-vankar@ti.com>
Link: https://lore.kernel.org/r/20240422124515.887511-1-c-vankar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Duanqiang Wen [Mon, 22 Apr 2024 08:41:09 +0000 (16:41 +0800)]
Revert "net: txgbe: fix clk_name exceed MAX_DEV_ID limits"
This reverts commit
e30cef001da259e8df354b813015d0e5acc08740.
commit
99f4570cfba1 ("clkdev: Update clkdev id usage to allow
for longer names") can fix clk_name exceed MAX_DEV_ID limits,
so this commit is meaningless.
Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240422084109.3201-2-duanqiangwen@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Duanqiang Wen [Mon, 22 Apr 2024 08:41:08 +0000 (16:41 +0800)]
Revert "net: txgbe: fix i2c dev name cannot match clkdev"
This reverts commit
c644920ce9220d83e070f575a4df711741c07f07.
when register i2c dev, txgbe shorten "i2c_designware" to "i2c_dw",
will cause this i2c dev can't match platfom driver i2c_designware_platform.
Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240422084109.3201-1-duanqiangwen@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 25 Apr 2024 02:33:04 +0000 (19:33 -0700)]
Merge branch 'mlxsw-various-acl-fixes'
Petr Machata says:
====================
mlxsw: Various ACL fixes
Ido Schimmel writes:
Fix various problems in the ACL (i.e., flower offload) code. See the
commit messages for more details.
====================
Link: https://lore.kernel.org/r/cover.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 22 Apr 2024 15:26:02 +0000 (17:26 +0200)]
mlxsw: spectrum_acl_tcam: Fix memory leak when canceling rehash work
The rehash delayed work is rescheduled with a delay if the number of
credits at end of the work is not negative as supposedly it means that
the migration ended. Otherwise, it is rescheduled immediately.
After "mlxsw: spectrum_acl_tcam: Fix possible use-after-free during
rehash" the above is no longer accurate as a non-negative number of
credits is no longer indicative of the migration being done. It can also
happen if the work encountered an error in which case the migration will
resume the next time the work is scheduled.
The significance of the above is that it is possible for the work to be
pending and associated with hints that were allocated when the migration
started. This leads to the hints being leaked [1] when the work is
canceled while pending as part of ACL region dismantle.
Fix by freeing the hints if hints are associated with a work that was
canceled while pending.
Blame the original commit since the reliance on not having a pending
work associated with hints is fragile.
[1]
unreferenced object 0xffff88810e7c3000 (size 256):
comm "kworker/0:16", pid 176, jiffies
4295460353
hex dump (first 32 bytes):
00 30 95 11 81 88 ff ff 61 00 00 00 00 00 00 80 .0......a.......
00 00 61 00 40 00 00 00 00 00 00 00 04 00 00 00 ..a.@...........
backtrace (crc
2544ddb9):
[<
00000000cf8cfab3>] kmalloc_trace+0x23f/0x2a0
[<
000000004d9a1ad9>] objagg_hints_get+0x42/0x390
[<
000000000b143cf3>] mlxsw_sp_acl_erp_rehash_hints_get+0xca/0x400
[<
0000000059bdb60a>] mlxsw_sp_acl_tcam_vregion_rehash_work+0x868/0x1160
[<
00000000e81fd734>] process_one_work+0x59c/0xf20
[<
00000000ceee9e81>] worker_thread+0x799/0x12c0
[<
00000000bda6fe39>] kthread+0x246/0x300
[<
0000000070056d23>] ret_from_fork+0x34/0x70
[<
00000000dea2b93e>] ret_from_fork_asm+0x1a/0x30
Fixes: c9c9af91f1d9 ("mlxsw: spectrum_acl: Allow to interrupt/continue rehash work")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/0cc12ebb07c4d4c41a1265ee2c28b392ff997a86.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 22 Apr 2024 15:26:01 +0000 (17:26 +0200)]
mlxsw: spectrum_acl_tcam: Fix incorrect list API usage
Both the function that migrates all the chunks within a region and the
function that migrates all the entries within a chunk call
list_first_entry() on the respective lists without checking that the
lists are not empty. This is incorrect usage of the API, which leads to
the following warning [1].
Fix by returning if the lists are empty as there is nothing to migrate
in this case.
[1]
WARNING: CPU: 0 PID: 6437 at drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:1266 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x1f1/0>
Modules linked in:
CPU: 0 PID: 6437 Comm: kworker/0:37 Not tainted
6.9.0-rc3-custom-00883-g94a65f079ef6 #39
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:mlxsw_sp_acl_tcam_vchunk_migrate_all+0x1f1/0x2c0
[...]
Call Trace:
<TASK>
mlxsw_sp_acl_tcam_vregion_rehash_work+0x6c/0x4a0
process_one_work+0x151/0x370
worker_thread+0x2cb/0x3e0
kthread+0xd0/0x100
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
Fixes: 6f9579d4e302 ("mlxsw: spectrum_acl: Remember where to continue rehash migration")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/4628e9a22d1d84818e28310abbbc498e7bc31bc9.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 22 Apr 2024 15:26:00 +0000 (17:26 +0200)]
mlxsw: spectrum_acl_tcam: Fix warning during rehash
As previously explained, the rehash delayed work migrates filters from
one region to another. This is done by iterating over all chunks (all
the filters with the same priority) in the region and in each chunk
iterating over all the filters.
When the work runs out of credits it stores the current chunk and entry
as markers in the per-work context so that it would know where to resume
the migration from the next time the work is scheduled.
Upon error, the chunk marker is reset to NULL, but without resetting the
entry markers despite being relative to it. This can result in migration
being resumed from an entry that does not belong to the chunk being
migrated. In turn, this will eventually lead to a chunk being iterated
over as if it is an entry. Because of how the two structures happen to
be defined, this does not lead to KASAN splats, but to warnings such as
[1].
Fix by creating a helper that resets all the markers and call it from
all the places the currently only reset the chunk marker. For good
measures also call it when starting a completely new rehash. Add a
warning to avoid future cases.
[1]
WARNING: CPU: 7 PID: 1076 at drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c:407 mlxsw_afk_encode+0x242/0x2f0
Modules linked in:
CPU: 7 PID: 1076 Comm: kworker/7:24 Tainted: G W
6.9.0-rc3-custom-00880-g29e61d91b77b #29
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:mlxsw_afk_encode+0x242/0x2f0
[...]
Call Trace:
<TASK>
mlxsw_sp_acl_atcam_entry_add+0xd9/0x3c0
mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
mlxsw_sp_acl_tcam_vchunk_migrate_all+0x109/0x290
mlxsw_sp_acl_tcam_vregion_rehash_work+0x6c/0x470
process_one_work+0x151/0x370
worker_thread+0x2cb/0x3e0
kthread+0xd0/0x100
ret_from_fork+0x34/0x50
</TASK>
Fixes: 6f9579d4e302 ("mlxsw: spectrum_acl: Remember where to continue rehash migration")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/cc17eed86b41dd829d39b07906fec074a9ce580e.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 22 Apr 2024 15:25:59 +0000 (17:25 +0200)]
mlxsw: spectrum_acl_tcam: Fix memory leak during rehash
The rehash delayed work migrates filters from one region to another.
This is done by iterating over all chunks (all the filters with the same
priority) in the region and in each chunk iterating over all the
filters.
If the migration fails, the code tries to migrate the filters back to
the old region. However, the rollback itself can also fail in which case
another migration will be erroneously performed. Besides the fact that
this ping pong is not a very good idea, it also creates a problem.
Each virtual chunk references two chunks: The currently used one
('vchunk->chunk') and a backup ('vchunk->chunk2'). During migration the
first holds the chunk we want to migrate filters to and the second holds
the chunk we are migrating filters from.
The code currently assumes - but does not verify - that the backup chunk
does not exist (NULL) if the currently used chunk does not reference the
target region. This assumption breaks when we are trying to rollback a
rollback, resulting in the backup chunk being overwritten and leaked
[1].
Fix by not rolling back a failed rollback and add a warning to avoid
future cases.
[1]
WARNING: CPU: 5 PID: 1063 at lib/parman.c:291 parman_destroy+0x17/0x20
Modules linked in:
CPU: 5 PID: 1063 Comm: kworker/5:11 Tainted: G W
6.9.0-rc2-custom-00784-gc6a05c468a0b #14
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:parman_destroy+0x17/0x20
[...]
Call Trace:
<TASK>
mlxsw_sp_acl_atcam_region_fini+0x19/0x60
mlxsw_sp_acl_tcam_region_destroy+0x49/0xf0
mlxsw_sp_acl_tcam_vregion_rehash_work+0x1f1/0x470
process_one_work+0x151/0x370
worker_thread+0x2cb/0x3e0
kthread+0xd0/0x100
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
Fixes: 843500518509 ("mlxsw: spectrum_acl: Do rollback as another call to mlxsw_sp_acl_tcam_vchunk_migrate_all()")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/d5edd4f4503934186ae5cfe268503b16345b4e0f.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 22 Apr 2024 15:25:58 +0000 (17:25 +0200)]
mlxsw: spectrum_acl_tcam: Rate limit error message
In the rare cases when the device resources are exhausted it is likely
that the rehash delayed work will fail. An error message will be printed
whenever this happens which can be overwhelming considering the fact
that the work is per-region and that there can be hundreds of regions.
Fix by rate limiting the error message.
Fixes: e5e7962ee5c2 ("mlxsw: spectrum_acl: Implement region migration according to hints")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/c510763b2ebd25e7990d80183feff91cde593145.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 22 Apr 2024 15:25:57 +0000 (17:25 +0200)]
mlxsw: spectrum_acl_tcam: Fix possible use-after-free during rehash
The rehash delayed work migrates filters from one region to another
according to the number of available credits.
The migrated from region is destroyed at the end of the work if the
number of credits is non-negative as the assumption is that this is
indicative of migration being complete. This assumption is incorrect as
a non-negative number of credits can also be the result of a failed
migration.
The destruction of a region that still has filters referencing it can
result in a use-after-free [1].
Fix by not destroying the region if migration failed.
[1]
BUG: KASAN: slab-use-after-free in mlxsw_sp_acl_ctcam_region_entry_remove+0x21d/0x230
Read of size 8 at addr
ffff8881735319e8 by task kworker/0:31/3858
CPU: 0 PID: 3858 Comm: kworker/0:31 Tainted: G W
6.9.0-rc2-custom-00782-gf2275c2157d8 #5
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
Call Trace:
<TASK>
dump_stack_lvl+0xc6/0x120
print_report+0xce/0x670
kasan_report+0xd7/0x110
mlxsw_sp_acl_ctcam_region_entry_remove+0x21d/0x230
mlxsw_sp_acl_ctcam_entry_del+0x2e/0x70
mlxsw_sp_acl_atcam_entry_del+0x81/0x210
mlxsw_sp_acl_tcam_vchunk_migrate_all+0x3cd/0xb50
mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
</TASK>
Allocated by task 174:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x8f/0xa0
__kmalloc+0x19c/0x360
mlxsw_sp_acl_tcam_region_create+0xdf/0x9c0
mlxsw_sp_acl_tcam_vregion_rehash_work+0x954/0x1300
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
Freed by task 7:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
poison_slab_object+0x102/0x170
__kasan_slab_free+0x14/0x30
kfree+0xc1/0x290
mlxsw_sp_acl_tcam_region_destroy+0x272/0x310
mlxsw_sp_acl_tcam_vregion_rehash_work+0x731/0x1300
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
Fixes: c9c9af91f1d9 ("mlxsw: spectrum_acl: Allow to interrupt/continue rehash work")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/3e412b5659ec2310c5c615760dfe5eac18dd7ebd.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 22 Apr 2024 15:25:56 +0000 (17:25 +0200)]
mlxsw: spectrum_acl_tcam: Fix possible use-after-free during activity update
The rule activity update delayed work periodically traverses the list of
configured rules and queries their activity from the device.
As part of this task it accesses the entry pointed by 'ventry->entry',
but this entry can be changed concurrently by the rehash delayed work,
leading to a use-after-free [1].
Fix by closing the race and perform the activity query under the
'vregion->lock' mutex.
[1]
BUG: KASAN: slab-use-after-free in mlxsw_sp_acl_tcam_flower_rule_activity_get+0x121/0x140
Read of size 8 at addr
ffff8881054ed808 by task kworker/0:18/181
CPU: 0 PID: 181 Comm: kworker/0:18 Not tainted
6.9.0-rc2-custom-00781-gd5ab772d32f7 #2
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_rule_activity_update_work
Call Trace:
<TASK>
dump_stack_lvl+0xc6/0x120
print_report+0xce/0x670
kasan_report+0xd7/0x110
mlxsw_sp_acl_tcam_flower_rule_activity_get+0x121/0x140
mlxsw_sp_acl_rule_activity_update_work+0x219/0x400
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
</TASK>
Allocated by task 1039:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x8f/0xa0
__kmalloc+0x19c/0x360
mlxsw_sp_acl_tcam_entry_create+0x7b/0x1f0
mlxsw_sp_acl_tcam_vchunk_migrate_all+0x30d/0xb50
mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
Freed by task 1039:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
poison_slab_object+0x102/0x170
__kasan_slab_free+0x14/0x30
kfree+0xc1/0x290
mlxsw_sp_acl_tcam_vchunk_migrate_all+0x3d7/0xb50
mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
Fixes: 2bffc5322fd8 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/1fcce0a60b231ebeb2515d91022284ba7b4ffe7a.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 22 Apr 2024 15:25:55 +0000 (17:25 +0200)]
mlxsw: spectrum_acl_tcam: Fix race during rehash delayed work
The purpose of the rehash delayed work is to reduce the number of masks
(eRPs) used by an ACL region as the eRP bank is a global and limited
resource.
This is done in three steps:
1. Creating a new set of masks and a new ACL region which will use the
new masks and to which the existing filters will be migrated to. The
new region is assigned to 'vregion->region' and the region from which
the filters are migrated from is assigned to 'vregion->region2'.
2. Migrating all the filters from the old region to the new region.
3. Destroying the old region and setting 'vregion->region2' to NULL.
Only the second steps is performed under the 'vregion->lock' mutex
although its comments says that among other things it "Protects
consistency of region, region2 pointers".
This is problematic as the first step can race with filter insertion
from user space that uses 'vregion->region', but under the mutex.
Fix by holding the mutex across the entirety of the delayed work and not
only during the second step.
Fixes: 2bffc5322fd8 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/1ec1d54edf2bad0a369e6b4fa030aba64e1f124b.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 22 Apr 2024 15:25:54 +0000 (17:25 +0200)]
mlxsw: spectrum_acl_tcam: Fix race in region ID allocation
Region identifiers can be allocated both when user space tries to insert
a new tc filter and when filters are migrated from one region to another
as part of the rehash delayed work.
There is no lock protecting the bitmap from which these identifiers are
allocated from, which is racy and leads to bad parameter errors from the
device's firmware.
Fix by converting the bitmap to IDA which handles its own locking. For
consistency, do the same for the group identifiers that are part of the
same structure.
Fixes: 2bffc5322fd8 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()")
Reported-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/ce494b7940cadfe84f3e18da7785b51ef5f776e3.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal [Mon, 22 Apr 2024 10:33:53 +0000 (12:33 +0200)]
selftests: netfilter: fix conntrack_dump_flush retval on unsupported kernel
With CONFIG_NETFILTER=n test passes instead of skip. Before:
./run_kselftest.sh -t net/netfilter:conntrack_dump_flush
[..]
# Starting 3 tests from 1 test cases.
# RUN conntrack_dump_flush.test_dump_by_zone ...
mnl_socket_open: Protocol not supported
[..]
ok 3 conntrack_dump_flush.test_flush_by_zone_default
# PASSED: 3 / 3 tests passed.
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
After:
mnl_socket_open: Protocol not supported
[..]
ok 3 conntrack_dump_flush.test_flush_by_zone_default # SKIP cannot open netlink_netfilter socket
# PASSED: 3 / 3 tests passed.
# Totals: pass:0 fail:0 xfail:0 xpass:0 skip:3 error:0
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240422103358.3511-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal [Mon, 22 Apr 2024 10:25:42 +0000 (12:25 +0200)]
selftests: netfilter: nft_zones_many.sh: set ct sysctl after ruleset load
nf_conntrack_udp_timeout sysctl only exist once conntrack module is loaded,
if this test runs standalone on a modular kernel sysctl setting fails,
this can result in test failure as udp conntrack entries expire too fast.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240422102546.2494-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hyunwoo Kim [Mon, 22 Apr 2024 09:37:17 +0000 (05:37 -0400)]
net: openvswitch: Fix Use-After-Free in ovs_ct_exit
Since kfree_rcu, which is called in the hlist_for_each_entry_rcu traversal
of ovs_ct_limit_exit, is not part of the RCU read critical section, it
is possible that the RCU grace period will pass during the traversal and
the key will be free.
To prevent this, it should be changed to hlist_for_each_entry_safe.
Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/ZiYvzQN/Ry5oeFQW@v4bel-B760M-AORUS-ELITE-AX
Signed-off-by: Jakub Kicinski <kuba@kernel.org>