linux.git
19 months agonet: remove SLAB_MEM_SPREAD flag usage
Chengming Zhou [Wed, 28 Feb 2024 03:06:58 +0000 (03:06 +0000)]
net: remove SLAB_MEM_SPREAD flag usage

The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was
removed as of v6.8-rc1, so it became a dead flag since the commit
16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the
series[1] went on to mark it obsolete to avoid confusion for users.
Here we can just remove all its users, which has no functional change.

[1] https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz/

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240228030658.3512782-1-chengming.zhou@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agoMerge branch 'tools-ynl-stop-using-libmnl'
Jakub Kicinski [Wed, 28 Feb 2024 23:25:47 +0000 (15:25 -0800)]
Merge branch 'tools-ynl-stop-using-libmnl'

Jakub Kicinski says:

====================
tools: ynl: stop using libmnl

There is no strong reason to stop using libmnl in ynl but there
are a few small ones which add up.

First (as I remembered immediately after hitting send on v1),
C++ compilers do not like the libmnl for_each macros.
I haven't tried it myself, but having all the code directly
in YNL makes it easier for folks porting to C++ to modify them
and/or make YNL more C++ friendly.

Second, we do much more advanced netlink level parsing in ynl
than libmnl so it's hard to say that libmnl abstracts much from us.
The fact that this series, removing the libmnl dependency, only
adds <300 LoC shows that code savings aren't huge.
OTOH when new types are added (e.g. auto-int) we need to add
compatibility to deal with older version of libmnl (in fact,
even tho patches have been sent months ago, auto-ints are still
not supported in libmnl.git).

Thrid, the dependency makes ynl less self contained, and harder
to vendor in. Whether vendoring libraries into projects is a good
idea is a separate discussion, nonetheless, people want to do it.

Fourth, there are small annoyances with the libmnl APIs which
are hard to fix in backward-compatible ways. See the last patch
for example.

All in all, libmnl is a great library, but with all the code
generation and structured parsing, ynl is better served by going
its own way.

v1: https://lore.kernel.org/all/20240222235614.180876-1-kuba@kernel.org/
====================

Link: https://lore.kernel.org/r/20240227223032.1835527-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agotools: ynl: use MSG_DONTWAIT for getting notifications
Jakub Kicinski [Tue, 27 Feb 2024 22:30:32 +0000 (14:30 -0800)]
tools: ynl: use MSG_DONTWAIT for getting notifications

To stick to libmnl wrappers in the past we had to use poll()
to check if there are any outstanding notifications on the socket.
This is no longer necessary, we can use MSG_DONTWAIT.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-16-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agotools: ynl: remove the libmnl dependency
Jakub Kicinski [Tue, 27 Feb 2024 22:30:31 +0000 (14:30 -0800)]
tools: ynl: remove the libmnl dependency

We don't use libmnl any more.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-15-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agotools: ynl: stop using mnl socket helpers
Jakub Kicinski [Tue, 27 Feb 2024 22:30:30 +0000 (14:30 -0800)]
tools: ynl: stop using mnl socket helpers

Most libmnl socket helpers can be replaced by direct calls to
the underlying libc API. We need portid, the netlink manpage
suggests we bind() address of zero.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-14-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agotools: ynl: switch away from MNL_CB_*
Jakub Kicinski [Tue, 27 Feb 2024 22:30:29 +0000 (14:30 -0800)]
tools: ynl: switch away from MNL_CB_*

Create a local version of the MNL_CB_* parser control values.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-13-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agotools: ynl: switch away from mnl_cb_t
Jakub Kicinski [Tue, 27 Feb 2024 22:30:28 +0000 (14:30 -0800)]
tools: ynl: switch away from mnl_cb_t

All YNL parsing callbacks take struct ynl_parse_arg as the argument.
Make that official by using a local callback type instead of mnl_cb_t.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-12-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agotools: ynl: stop using mnl_cb_run2()
Jakub Kicinski [Tue, 27 Feb 2024 22:30:27 +0000 (14:30 -0800)]
tools: ynl: stop using mnl_cb_run2()

There's only one set of callbacks in YNL, for netlink control
messages, and most of them are trivial. So implement the message
walking directly without depending on mnl_cb_run2().

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agotools: ynl: use ynl_sock_read_msgs() for ACK handling
Jakub Kicinski [Tue, 27 Feb 2024 22:30:26 +0000 (14:30 -0800)]
tools: ynl: use ynl_sock_read_msgs() for ACK handling

ynl_recv_ack() is simple and it's the only user of mnl_cb_run().
Now that ynl_sock_read_msgs() exists it's actually less code
to use ynl_sock_read_msgs() instead of being special.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agotools: ynl: wrap recv() + mnl_cb_run2() into a single helper
Jakub Kicinski [Tue, 27 Feb 2024 22:30:25 +0000 (14:30 -0800)]
tools: ynl: wrap recv() + mnl_cb_run2() into a single helper

All callers to mnl_cb_run2() call mnl_socket_recvfrom() right before.
Wrap the two in a helper, take typed arguments (struct ynl_parse_arg),
instead of hoping that all callers remember that parser error handling
requires yarg.

In case of ynl_sock_read_family() we will no longer check for kernel
returning no data, but that would be a kernel bug, not worth complicating
the code to catch this. Calling mnl_cb_run2() on an empty buffer
is legal and results in STOP (1).

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agotools: ynl-gen: remove unused parse code
Jakub Kicinski [Tue, 27 Feb 2024 22:30:24 +0000 (14:30 -0800)]
tools: ynl-gen: remove unused parse code

Commit f2ba1e5e2208 ("tools: ynl-gen: stop generating common notification handlers")
removed the last caller of the parse_cb_run() helper.
We no longer need to export ynl_cb_array.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agotools: ynl: make yarg the first member of struct ynl_dump_state
Jakub Kicinski [Tue, 27 Feb 2024 22:30:23 +0000 (14:30 -0800)]
tools: ynl: make yarg the first member of struct ynl_dump_state

All YNL parsing code expects a pointer to struct ynl_parse_arg AKA yarg.
For dump was pass in struct ynl_dump_state, which works fine, because
struct ynl_dump_state and struct ynl_parse_arg have identical layout
for the members that matter.. but it's a bit hacky.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agotools: ynl: create local ARRAY_SIZE() helper
Jakub Kicinski [Tue, 27 Feb 2024 22:30:22 +0000 (14:30 -0800)]
tools: ynl: create local ARRAY_SIZE() helper

libc doesn't have an ARRAY_SIZE() create one locally.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agotools: ynl: create local nlmsg access helpers
Jakub Kicinski [Tue, 27 Feb 2024 22:30:21 +0000 (14:30 -0800)]
tools: ynl: create local nlmsg access helpers

Create helpers for accessing payloads of struct nlmsg.
Use them instead of the libmnl ones.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agotools: ynl: create local for_each helpers
Jakub Kicinski [Tue, 27 Feb 2024 22:30:20 +0000 (14:30 -0800)]
tools: ynl: create local for_each helpers

Create ynl_attr_for_each*() iteration helpers.
Use them instead of the mnl ones.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agotools: ynl: create local attribute helpers
Jakub Kicinski [Tue, 27 Feb 2024 22:30:19 +0000 (14:30 -0800)]
tools: ynl: create local attribute helpers

Don't use mnl attr helpers, we're trying to remove the libmnl
dependency. Create both signed and unsigned helpers, libmnl
had unsigned helpers, so code generator no longer needs
the mnl_type() hack.

The new helpers are written from first principles, but are
hopefully not too buggy.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agotools: ynl: give up on libmnl for auto-ints
Jakub Kicinski [Tue, 27 Feb 2024 22:30:18 +0000 (14:30 -0800)]
tools: ynl: give up on libmnl for auto-ints

The temporary auto-int helpers are not really correct.
We can't treat signed and unsigned ints the same when
determining whether we need full 8B. I realized this
before sending the patch to add support in libmnl.
Unfortunately, that patch has not been merged,
so time to fix our local helpers. Use the mnl* name
for now, subsequent patches will address that.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agotools: ynl: protect from old OvS headers
Jakub Kicinski [Mon, 26 Feb 2024 22:58:06 +0000 (14:58 -0800)]
tools: ynl: protect from old OvS headers

Since commit 7c59c9c8f202 ("tools: ynl: generate code for ovs families")
we need relatively recent OvS headers to get YNL to compile.
Add the direct include workaround to fix compilation on less
up-to-date OSes like CentOS 9.

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240226225806.1301152-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agoMerge branch 'eee-linkmode-bitmaps'
David S. Miller [Wed, 28 Feb 2024 12:18:05 +0000 (12:18 +0000)]
Merge branch 'eee-linkmode-bitmaps'

Andrew Lunn says:

====================
drivers: net: Convert EEE handling to use linkmode bitmaps

EEE has until recently been limited to lower speeds due to the use of
the legacy u32 for link speeds. This restriction has been lifted, with
the use of linkmode bitmaps, added in the following patches:

1f069de63602 ethtool: add linkmode bitmap support to struct ethtool_keee
1d756ff13da6 ethtool: add suffix _u32 to legacy bitmap members of struct ethtool_keee
285cc15cc555 ethtool: adjust struct ethtool_keee to kernel needs
0b3100bc8fa7 ethtool: switch back from ethtool_keee to ethtool_eee for ioctl
d80a52335374 ethtool: replace struct ethtool_eee with a new struct ethtool_keee on kernel side

This patchset converts the remaining MAC drivers still using the old
_u32 to link modes.

A couple of Intel drivers do odd things with EEE, setting the autoneg
bit. It is unclear why, no other driver does, ethtool does not display
it, and EEE is always negotiated. One patch in this series deletes
this code.

With all users of the legacy _u32 changed to link modes, the _u32
values are removed from keee, and support for them in the ethtool core
is removed.

---
Changes in v5:
- Restore zeroing eee_data.advertised in ax8817_178a
- Fix lp_advertised -> supported in ixgdb
- Link to v4: https://lore.kernel.org/r/20240218-keee-u32-cleanup-v4-0-71f13b7c3e60@lunn.ch

Changes in v4:
- Add missing conversion in igb
- Add missing conversion in r8152
- Add patch to remove now unused _u32 members
- Link to v3: https://lore.kernel.org/r/20240217-keee-u32-cleanup-v3-0-fcf6b62a0c7f@lunn.ch

Changes in v3:
- Add list of commits adding linkmodes to EEE to cover letter
- Fix grammar error in cover letter.
- Add Reviewed-by from Jacob Keller
- Link to v2: https://lore.kernel.org/r/20240214-keee-u32-cleanup-v2-0-4ac534b83d66@lunn.ch

Changes in v2:
- igb: Fix type 100BaseT to 1000BaseT.
- Link to v1: https://lore.kernel.org/r/20240204-keee-u32-cleanup-v1-0-fb6e08329d9a@lunn.ch
====================

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: ethtool: eee: Remove legacy _u32 from keee
Andrew Lunn [Tue, 27 Feb 2024 01:29:15 +0000 (19:29 -0600)]
net: ethtool: eee: Remove legacy _u32 from keee

All MAC drivers have been converted to use the link mode members of
keee. So remove the _u32 values, and the code in the ethtool core to
convert the legacy _u32 values to link modes.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: intel: igc: Use linkmode helpers for EEE
Andrew Lunn [Tue, 27 Feb 2024 01:29:14 +0000 (19:29 -0600)]
net: intel: igc: Use linkmode helpers for EEE

Make use of the existing linkmode helpers for converting PHY EEE
register values into links modes, now that ethtool_keee uses link
modes, rather than u32 values.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: intel: igb: Use linkmode helpers for EEE
Andrew Lunn [Tue, 27 Feb 2024 01:29:13 +0000 (19:29 -0600)]
net: intel: igb: Use linkmode helpers for EEE

Make use of the existing linkmode helpers for converting PHY EEE
register values into links modes, now that ethtool_keee uses link
modes, rather than u32 values.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: intel: e1000e: Use linkmode helpers for EEE
Andrew Lunn [Tue, 27 Feb 2024 01:29:12 +0000 (19:29 -0600)]
net: intel: e1000e: Use linkmode helpers for EEE

Make use of the existing linkmode helpers for converting PHY EEE
register values into links modes, now that ethtool_keee uses link
modes, rather than u32 values.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: intel: i40e/igc: Remove setting Autoneg in EEE capabilities
Andrew Lunn [Tue, 27 Feb 2024 01:29:11 +0000 (19:29 -0600)]
net: intel: i40e/igc: Remove setting Autoneg in EEE capabilities

Energy Efficient Ethernet should always be negotiated with the link
peer. Don't include SUPPORTED_Autoneg in the results of get_eee() for
supported, advertised or lp_advertised, since it is
assumed. Additionally, ethtool(1) ignores the set bit, and no other
driver sets this.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: ethernet: ixgbe: Convert EEE to use linkmodes
Andrew Lunn [Tue, 27 Feb 2024 01:29:10 +0000 (19:29 -0600)]
net: ethernet: ixgbe: Convert EEE to use linkmodes

Convert the tables to make use of ETHTOOL link mode bits, rather than
the old u32 SUPPORTED speeds. Make use of the linkmode helps to set
bits and compare linkmodes. As a result, the _u32 members of keee are
no longer used, a step towards removing them.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: qlogic: qede: Use linkmode helpers for EEE
Andrew Lunn [Tue, 27 Feb 2024 01:29:09 +0000 (19:29 -0600)]
net: qlogic: qede: Use linkmode helpers for EEE

Make use of the existing linkmode helpers for bit manipulation of EEE
advertise, support and link partner support. The aim is to drop the
restricted _u32 variants in the near future.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: usb: ax88179_178a: Use linkmode helpers for EEE
Andrew Lunn [Tue, 27 Feb 2024 01:29:08 +0000 (19:29 -0600)]
net: usb: ax88179_178a: Use linkmode helpers for EEE

Make use of the existing linkmode helpers for converting PHY EEE
register values into links modes, now that ethtool_keee uses link
modes, rather than u32 values.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: usb: r8152: Use linkmode helpers for EEE
Andrew Lunn [Tue, 27 Feb 2024 01:29:07 +0000 (19:29 -0600)]
net: usb: r8152: Use linkmode helpers for EEE

Make use of the existing linkmode helpers for converting PHY EEE
register values into links modes, now that ethtool_keee uses link
modes, rather than u32 values.

Rework determining if EEE is active to make is similar as to how
phylib decides, and make use of a phylib helper to validate if EEE is
valid in for the current link mode. This then requires that PHYLIB is
selected.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoipv6: raw: remove useless input parameter in rawv6_get/seticmpfilter
Zhengchao Shao [Tue, 27 Feb 2024 00:57:45 +0000 (08:57 +0800)]
ipv6: raw: remove useless input parameter in rawv6_get/seticmpfilter

The input parameter 'level' in rawv6_get/seticmpfilter is not used.
Therefore, remove it.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: stmmac: dwmac-qcom-ethqos: Update link clock rate only for RGMII
Sarosh Hasan [Mon, 26 Feb 2024 09:42:26 +0000 (15:12 +0530)]
net: stmmac: dwmac-qcom-ethqos: Update link clock rate only for RGMII

Updating link clock rate for different speeds is only needed when
using RGMII, as that mode requires changing clock speed when the link
speed changes. Let's restrict updating the link clock speed in
ethqos_update_link_clk() to just RGMII. Other modes such as SGMII
only need to enable the link clock (which is already done in probe).

Signed-off-by: Sarosh Hasan <quic_sarohasa@quicinc.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8775p-ride
Reviewed-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoMerge branch 'ioam6-mcast-events'
David S. Miller [Wed, 28 Feb 2024 11:19:42 +0000 (11:19 +0000)]
Merge branch 'ioam6-mcast-events'

Justin Iurman says:

====================
ioam6: netlink multicast event

v5:
 - remove the "must be the destination" check before sending an ioam6
   event
v4:
 - rebase on top of net merge
v3:
 - patchset was mistakenly superseded due to same cover title used for
   iproute2-next equivalent patch -> resend (renamed)
v2:
 - fix warnings

Add generic netlink multicast event support to ioam6 as another solution
to share IOAM data with user space. The other one being via IPv6 raw
sockets combined with ancillary data (or packet socket, if the listener
does not need the processing of the IOAM Option-Type, since the hook is
before in that case). This patchset focuses on the IOAM Pre-allocated
Trace (the only Option-Type currently supported in the kernel), and so
on IOAM "trace" events. See an example of a consumer here [1].

  [1] https://github.com/Advanced-Observability/ioam-agent-python/blob/netlink_event/ioam-agent.py
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: exthdrs: ioam6: send trace event
Justin Iurman [Mon, 26 Feb 2024 13:14:12 +0000 (14:14 +0100)]
net: exthdrs: ioam6: send trace event

If we're processing an IOAM Pre-allocated Trace Option-Type (the only
one supported currently), then send the trace as an ioam6 event to the
netlink multicast group. This way, user space apps will be able to
collect IOAM data.

Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: ioam6: multicast event
Justin Iurman [Mon, 26 Feb 2024 13:14:11 +0000 (14:14 +0100)]
net: ioam6: multicast event

Add a multicast group to the ioam6 generic netlink family and provide
ioam6_event() to send an ioam6 event to the multicast group.

Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agouapi: ioam6: API for netlink multicast events
Justin Iurman [Mon, 26 Feb 2024 13:14:10 +0000 (14:14 +0100)]
uapi: ioam6: API for netlink multicast events

Add new api to support ioam6 events for generic netlink multicast. A
first "trace" event is added to the list of ioam6 events, which will
represent an IOAM Pre-allocated Trace Option-Type. It provides another
solution to share IOAM data with user space.

Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agodt-bindings: net: ethernet-controller: drop redundant type from label
Krzysztof Kozlowski [Mon, 26 Feb 2024 12:29:13 +0000 (13:29 +0100)]
dt-bindings: net: ethernet-controller: drop redundant type from label

dtschema defines label as string, so $ref in other bindings is
redundant.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoMerge branch 'tcp-rcv-drop-reasons'
David S. Miller [Wed, 28 Feb 2024 10:39:22 +0000 (10:39 +0000)]
Merge branch 'tcp-rcv-drop-reasons'

Jason Xing says:

====================
introduce drop reasons for tcp receive path

When I was debugging the reason about why the skb should be dropped in
syn cookie mode, I found out that this NOT_SPECIFIED reason is too
general. Thus I decided to refine it.

v10
Link: https://lore.kernel.org/netdev/20240223193321.6549-1-kuniyu@amazon.com/
1. fix three nit problems (Kuniyuki)
2. add reviewed-by tag (Kuniyuki)

v9
Link: https://lore.kernel.org/netdev/20240222113003.67558-1-kerneljasonxing@gmail.com/
1. nit: remove one unneeded 'else' (David)
2. add reviewed-by tags (Eric, David)

v8
Link: https://lore.kernel.org/netdev/20240221025732.68157-1-kerneljasonxing@gmail.com/
1. refine part of codes in patch [03/10] and patch [10/10] (Eric)
2. squash patch [11/11] in the last version into patch [10/11] (Eric)
3. add reviewed-by tags (Eric)

v7
Link: https://lore.kernel.org/all/20240219032838.91723-1-kerneljasonxing@gmail.com/
1. fix some misspelled problem (Kuniyuki)
2. remove redundant codes in tcp_v6_do_rcv() (Kuniyuki)
3. add reviewed-by tag in patch [02/11] (Kuniyuki)

v6
Link: https://lore.kernel.org/all/c987d2c79e4a4655166eb8eafef473384edb37fb.camel@redhat.com/
Link: https://lore.kernel.org/all/CAL+tcoAgSjwsmFnDh_Gs9ZgMi-y5awtVx+4VhJPNRADjo7LLSA@mail.gmail.com/
1. Take one case into consideration in tcp_v6_do_rcv(), behave like old
days, or else it will trigger errors (Paolo).
2. Extend NO_SOCKET reason to consider two more reasons for request
socket and child socket.

v5:
Link: https://lore.kernel.org/netdev/20240213134205.8705-1-kerneljasonxing@gmail.com/
Link: https://lore.kernel.org/netdev/20240213140508.10878-1-kerneljasonxing@gmail.com/
1. Use SKB_DROP_REASON_IP_OUTNOROUTES instead of introducing a new
   one (Eric, David)
2. Reuse SKB_DROP_REASON_NOMEM to handle failure of request socket
   allocation (Eric)
3. Reuse NO_SOCKET instead of introducing COOKIE_NOCHILD
4. avoid duplication of these opt_skb tests/actions (Eric)
5. Use new name (TCP_ABORT_ON_DATA) for readability (David)
6. Reuse IP_OUTNOROUTES instead of INVALID_DST (Eric)

---
HISTORY
This series is combined with 2 series sent before suggested by Jakub. So
I'm going to separately write changelogs for each of them.

PATCH 1/11 - 5/11
Link: https://lore.kernel.org/netdev/20240213134205.8705-1-kerneljasonxing@gmail.com/
Summary
1. introduce all the dropreasons we need, [1/11] patch.
2. use new dropreasons in ipv4 cookie check, [2/11],[3/11] patch.
3. use new dropreasons ipv6 cookie check, [4/11],[5/11] patch.

v4:
Link: https://lore.kernel.org/netdev/20240212172302.3f95e454@kernel.org/
1. Fix misspelled name in Kdoc as suggested by Jakub.

v3:
Link: https://lore.kernel.org/all/CANn89iK40SoyJ8fS2U5kp3pDruo=zfQNPL-ppOF+LYaS9z-MVA@mail.gmail.com/
1. Split that patch into some smaller ones as suggested by Eric.

v2:
Link: https://lore.kernel.org/all/20240204104601.55760-1-kerneljasonxing@gmail.com/
1. change the title of 2/2 patch.
2. fix some warnings checkpatch tool showed before.
3. use return value instead of adding more parameters suggested by Eric.

PATCH 6/11 - 11/11
Link: https://lore.kernel.org/netdev/20240213140508.10878-1-kerneljasonxing@gmail.com/
v4:
Link: https://lore.kernel.org/netdev/CANn89iJar+H3XkQ8HpsirH7b-_sbFe9NBUdAAO3pNJK3CKr_bg@mail.gmail.com/
Link: https://lore.kernel.org/netdev/20240213131205.4309-1-kerneljasonxing@gmail.com/
Already got rid of @acceptable in tcp_rcv_state_process(), so I need to
remove *TCP_CONNREQNOTACCEPTABLE related codes which I wrote in the v3
series.

v3:
Link: https://lore.kernel.org/all/CANn89iK40SoyJ8fS2U5kp3pDruo=zfQNPL-ppOF+LYaS9z-MVA@mail.gmail.com/
1. Split that patch into some smaller ones as suggested by Eric.

v2:
Link: https://lore.kernel.org/all/20240204104601.55760-1-kerneljasonxing@gmail.com/
1. change the title of 2/2 patch.
2. fix some warnings checkpatch tool showed before.
3. use return value instead of adding more parameters suggested by Eric.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agotcp: make dropreason in tcp_child_process() work
Jason Xing [Mon, 26 Feb 2024 03:22:27 +0000 (11:22 +0800)]
tcp: make dropreason in tcp_child_process() work

It's time to let it work right now. We've already prepared for this:)

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agotcp: make the dropreason really work when calling tcp_rcv_state_process()
Jason Xing [Mon, 26 Feb 2024 03:22:26 +0000 (11:22 +0800)]
tcp: make the dropreason really work when calling tcp_rcv_state_process()

Update three callers including both ipv4 and ipv6 and let the dropreason
mechanism work in reality.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agotcp: add dropreasons in tcp_rcv_state_process()
Jason Xing [Mon, 26 Feb 2024 03:22:25 +0000 (11:22 +0800)]
tcp: add dropreasons in tcp_rcv_state_process()

In this patch, I equipped this function with more dropreasons, but
it still doesn't work yet, which I will do later.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agotcp: add more specific possible drop reasons in tcp_rcv_synsent_state_process()
Jason Xing [Mon, 26 Feb 2024 03:22:24 +0000 (11:22 +0800)]
tcp: add more specific possible drop reasons in tcp_rcv_synsent_state_process()

This patch does two things:
1) add two more new reasons
2) only change the return value(1) to various drop reason values
for the future use

For now, we still cannot trace those two reasons. We'll implement the full
function in the subsequent patch in this series.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agotcp: introduce dropreasons in receive path
Jason Xing [Mon, 26 Feb 2024 03:22:23 +0000 (11:22 +0800)]
tcp: introduce dropreasons in receive path

Soon later patches can use these relatively more accurate
reasons to recognise and find out the cause.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agotcp: use drop reasons in cookie check for ipv6
Jason Xing [Mon, 26 Feb 2024 03:22:22 +0000 (11:22 +0800)]
tcp: use drop reasons in cookie check for ipv6

Like what I did to ipv4 mode, refine this part: adding more drop
reasons for better tracing.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agotcp: directly drop skb in cookie check for ipv6
Jason Xing [Mon, 26 Feb 2024 03:22:21 +0000 (11:22 +0800)]
tcp: directly drop skb in cookie check for ipv6

Like previous patch does, only moving skb drop logical code to
cookie_v6_check() for later refinement.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agotcp: use drop reasons in cookie check for ipv4
Jason Xing [Mon, 26 Feb 2024 03:22:20 +0000 (11:22 +0800)]
tcp: use drop reasons in cookie check for ipv4

Now it's time to use the prepared definitions to refine this part.
Four reasons used might enough for now, I think.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agotcp: directly drop skb in cookie check for ipv4
Jason Xing [Mon, 26 Feb 2024 03:22:19 +0000 (11:22 +0800)]
tcp: directly drop skb in cookie check for ipv4

Only move the skb drop from tcp_v4_do_rcv() to cookie_v4_check() itself,
no other changes made. It can help us refine the specific drop reasons
later.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agotcp: add a dropreason definitions and prepare for cookie check
Jason Xing [Mon, 26 Feb 2024 03:22:18 +0000 (11:22 +0800)]
tcp: add a dropreason definitions and prepare for cookie check

Adding one drop reason to detect the condition of skb dropped
because of hook points in cookie check and extending NO_SOCKET
to consider another two cases can be used later.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: make SK_MEMORY_PCPU_RESERV tunable
Adam Li [Mon, 26 Feb 2024 02:24:52 +0000 (02:24 +0000)]
net: make SK_MEMORY_PCPU_RESERV tunable

This patch adds /proc/sys/net/core/mem_pcpu_rsv sysctl file,
to make SK_MEMORY_PCPU_RESERV tunable.

Commit 3cd3399dd7a8 ("net: implement per-cpu reserves for
memory_allocated") introduced per-cpu forward alloc cache:

"Implement a per-cpu cache of +1/-1 MB, to reduce number
of changes to sk->sk_prot->memory_allocated, which
would otherwise be cause of false sharing."

sk_prot->memory_allocated points to global atomic variable:
atomic_long_t tcp_memory_allocated ____cacheline_aligned_in_smp;

If increasing the per-cpu cache size from 1MB to e.g. 16MB,
changes to sk->sk_prot->memory_allocated can be further reduced.
Performance may be improved on system with many cores.

Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
Reviewed-by: Christoph Lameter (Ampere) <cl@linux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoMerge branch 'dsa-realtek-reset'
David S. Miller [Wed, 28 Feb 2024 08:21:42 +0000 (08:21 +0000)]
Merge branch 'dsa-realtek-reset'

Luiz Angelo Daros de Luca says:

====================
net: dsa: realtek: support reset controller and update docs

The driver previously supported reset pins using GPIO, but it lacked
support for reset controllers. Although a reset method is generally not
required, the driver fails to detect the switch if the reset was kept
asserted by a previous driver.

This series adds support to reset a Realtek switch using a reset
controller. It also updates the binding documentation to remove the
requirement of a reset method and to add the new reset controller
property.

It was tested on a TL-WR1043ND v1 router (rtl8366rb via SMI).

Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
---
Changes in v5:
- Fixed error checking logic when reset controller (de)assert fails
- Link to v4: https://lore.kernel.org/r/20240219-realtek-reset-v4-0-858b82a29503@gmail.com

Changes in v4:
- do not test for priv->reset,priv->reset_ctl
- updated commit message
- Link to v3: https://lore.kernel.org/r/20240213-realtek-reset-v3-0-37837e574713@gmail.com

Changes in v3:
- Rebased on the Realtek DSA driver refactoring (08f627164126)
- Dropped the reset controller example in bindings
- Used %pe in error printing
- Linked to v2: https://lore.kernel.org/r/20231027190910.27044-1-luizluca@gmail.com/

Changes in v2:
- Introduced a dedicated commit for removing the reset-gpios requirement
- Placed binding patches before code changes
- Removed the 'reset-names' property
- Moved the example from the commit message to realtek.yaml
- Split the reset function into _assert/_deassert variants
- Modified reset functions to return a warning instead of a value
- Utilized devm_reset_control_get_optional to prevent failure when the
  reset control is missing
- Used 'true' and 'false' for boolean values
- Removed the CONFIG_RESET_CONTROLLER check as stub methods are
  sufficient when undefined
- Linked to v1: https://lore.kernel.org/r/20231024205805.19314-1-luizluca@gmail.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: dsa: realtek: support reset controller
Luiz Angelo Daros de Luca [Sun, 25 Feb 2024 16:29:55 +0000 (13:29 -0300)]
net: dsa: realtek: support reset controller

Add support for resetting the device using a reset controller,
complementing the existing GPIO reset functionality (reset-gpios).

Although the reset is optional and the driver performs a soft reset
during setup, if the initial reset pin state was asserted, the driver
will not detect the device until the reset is deasserted.

Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Alvin Å ipraga <alsi@bang-olufsen.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agodt-bindings: net: dsa: realtek: add reset controller
Luiz Angelo Daros de Luca [Sun, 25 Feb 2024 16:29:54 +0000 (13:29 -0300)]
dt-bindings: net: dsa: realtek: add reset controller

Realtek switches can use a reset controller instead of reset-gpios.

Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Cc: devicetree@vger.kernel.org
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Alvin Å ipraga <alsi@bang-olufsen.dk>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agodt-bindings: net: dsa: realtek: reset-gpios is not required
Luiz Angelo Daros de Luca [Sun, 25 Feb 2024 16:29:53 +0000 (13:29 -0300)]
dt-bindings: net: dsa: realtek: reset-gpios is not required

The 'reset-gpios' should not be mandatory. although they might be
required for some devices if the switch reset was left asserted by a
previous driver, such as the bootloader.

Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Cc: devicetree@vger.kernel.org
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Alvin Å ipraga <alsi@bang-olufsen.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agobonding: 802.3ad replace MAC_ADDRESS_EQUAL with __agg_has_partner
Jones Syue 薛懷宗 [Mon, 26 Feb 2024 02:24:52 +0000 (02:24 +0000)]
bonding: 802.3ad replace MAC_ADDRESS_EQUAL with __agg_has_partner

Replace macro MAC_ADDRESS_EQUAL() for null_mac_addr checking with inline
function__agg_has_partner(). When MAC_ADDRESS_EQUAL() is verifiying
aggregator's partner mac addr with null_mac_addr, means that seeing if
aggregator has a valid partner or not. Using __agg_has_partner() makes it
more clear to understand.

In ad_port_selection_logic(), since aggregator->partner_system and
port->partner_oper.system has been compared first as a prerequisite, it is
safe to replace the upcoming MAC_ADDRESS_EQUAL() for null_mac_addr checking
with __agg_has_partner().

Delete null_mac_addr, which is not required anymore in bond_3ad.c, since
all references to it are gone.

Signed-off-by: Jones Syue <jonessyue@qnap.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/SI2PR04MB5097BCA8FF2A2F03D9A5A3EEDC5A2@SI2PR04MB5097.apcprd04.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agonet: wwan: t7xx: Prefer struct_size over open coded arithmetic
Erick Archer [Sat, 24 Feb 2024 18:19:32 +0000 (19:19 +0100)]
net: wwan: t7xx: Prefer struct_size over open coded arithmetic

This is an effort to get rid of all multiplications from allocation
functions in order to prevent integer overflows [1][2].

As the "port_prox" variable is a pointer to "struct port_proxy" and
this structure ends in a flexible array:

struct port_proxy {
[...]
struct t7xx_port ports[];
};

the preferred way in the kernel is to use the struct_size() helper to
do the arithmetic instead of the argument "size + size * count" in the
devm_kzalloc() function.

This way, the code is more readable and safer.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Erick Archer <erick.archer@gmx.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Link: https://lore.kernel.org/r/20240224181932.2720-1-erick.archer@gmx.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agoipv6: raw: remove useless input parameter in rawv6_err
Zhengchao Shao [Sat, 24 Feb 2024 08:41:21 +0000 (16:41 +0800)]
ipv6: raw: remove useless input parameter in rawv6_err

The input parameter 'opt' in rawv6_err() is not used. Therefore, remove it.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240224084121.2479603-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agonetlink: use kvmalloc() in netlink_alloc_large_skb()
Eric Dumazet [Sat, 24 Feb 2024 09:06:30 +0000 (09:06 +0000)]
netlink: use kvmalloc() in netlink_alloc_large_skb()

This is a followup of commit 234ec0b6034b ("netlink: fix potential
sleeping issue in mqueue_flush_file"), because vfree_atomic()
overhead is unfortunate for medium sized allocations.

1) If the allocation is smaller than PAGE_SIZE, do not bother
   with vmalloc() at all. Some arches have 64KB PAGE_SIZE,
   while NLMSG_GOODSIZE is smaller than 8KB.

2) Use kvmalloc(), which might allocate one high order page
   instead of vmalloc if memory is not too fragmented.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20240224090630.605917-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agobnxt_en: fix accessing vnic_info before allocating it
Alexander Lobakin [Mon, 26 Feb 2024 14:49:11 +0000 (15:49 +0100)]
bnxt_en: fix accessing vnic_info before allocating it

bnxt_alloc_mem() dereferences ::vnic_info in the variable declaration
block, but allocates it much later. As a result, the following crash
happens on my setup:

 BUG: kernel NULL pointer dereference, address: 0000000000000090
 fbcon: Taking over console
 #PF: supervisor write access in kernel mode
 #PF: error_code (0x0002) - not-present page
 PGD 12f382067 P4D 0
 Oops: 8002 [#1] PREEMPT SMP NOPTI
 CPU: 47 PID: 2516 Comm: NetworkManager Not tainted 6.8.0-rc5-libeth+ #49
 Hardware name: Intel Corporation M50CYP2SBSTD/M58CYP2SBSTD, BIOS SE5C620.86B.01.01.0088.2305172341 05/17/2023
 RIP: 0010:bnxt_alloc_mem+0x1609/0x1910 [bnxt_en]
 Code: 81 c8 48 83 c8 08 31 c9 e9 d7 fe ff ff c7 44 24 Oc 00 00 00 00 49 89 d5 e9 2d fe ff ff 41 89 c6 e9 88 00 00 00 48 8b 44 24 50 <80> 88 90 00 00 00 Od 8b 43 74 a8 02 75 1e f6 83 14 02 00 00 80 74
 RSP: 0018:ff3f25580f3432c8 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ff15a5cfc45249e0 RCX: 0000002079777000
 RDX: ff15a5dfb9767000 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: ff15a5dfb9777000 R11: ffffff8000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000020 R15: ff15a5cfce34f540
 FS:  000007fb9a160500(0000) GS:ff15a5dfbefc0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CRO: 0000000080050033
 CR2: 0000000000000090 CR3: 0000000109efc00Z CR4: 0000000000771ef0
 DR0: 0000000000000000 DR1: 0000000000000000 DRZ: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554

 Call Trace:
 <TASK>
 ? __die_body+0x68/0xb0
 ? page_fault_oops+0x3a6/0x400
 ? exc_page_fault+0x7a/0x1b0
 ? asm_exc_page_fault+0x26/8x30
 ? bnxt_alloc_mem+0x1609/0x1910 [bnxt_en]
 ? bnxt_alloc_mem+0x1389/8x1918 [bnxt_en]
 _bnxt_open_nic+0x198/0xa50 [bnxt_en]
 ? bnxt_hurm_if_change+0x287/0x3d0 [bnxt_en]
 bnxt_open+0xeb/0x1b0 [bnxt_en]
 _dev_open+0x12e/0x1f0
 _dev_change_flags+0xb0/0x200
 dev_change_flags+0x25/0x60
 do_setlink+0x463/0x1260
 ? sock_def_readable+0x14/0xc0
 ? rtnl_getlink+0x4b9/0x590
 ? _nla_validate_parse+0x91/0xfa0
 rtnl_newlink+0xbac/0xe40
 <...>

Don't create a variable and dereference the first array member directly
since it's used only once in the code.

Fixes: ef4ee64e9990 ("bnxt_en: Define BNXT_VNIC_DEFAULT for the default vnic index")
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240226144911.1297336-1-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agoselftests: netdevsim: be less selective for FW for the devlink test
Jakub Kicinski [Sat, 24 Feb 2024 05:06:58 +0000 (21:06 -0800)]
selftests: netdevsim: be less selective for FW for the devlink test

Commit 6151ff9c7521 ("selftests: netdevsim: use suitable existing dummy
file for flash test") introduced a nice trick to the devlink flashing
test. Instead of user having to create a file under /lib/firmware
we just pick the first one that already exists.

Sadly, in AWS Linux there are no files directly under /lib/firmware,
only in subdirectories. Don't limit the search to -maxdepth 1.
We can use the %P print format to get the correct path for files
inside subdirectories:

$ find /lib/firmware -type f -printf '%P\n' | head -1
intel-ucode/06-1a-05

The full path is /lib/firmware/intel-ucode/06-1a-05

This works in GNU find, busybox doesn't have printf at all,
so we're not making it worse.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240224050658.930272-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months agonet: stmmac: mmc_core: Drop interrupt registers from stats
Jesper Nilsson [Fri, 23 Feb 2024 20:37:01 +0000 (21:37 +0100)]
net: stmmac: mmc_core: Drop interrupt registers from stats

The MMC IPC interrupt status and interrupt mask registers are
of little use as Ethernet statistics, but incrementing counters
based on the current interrupt and interrupt mask registers
makes them actively misleading.

For example, if the interrupt mask is set to 0x08420842,
the current code will increment by that amount each iteration,
leading to the following sequence of nonsense:

mmc_rx_ipc_intr_mask: 969816526
mmc_rx_ipc_intr_mask: 1108361744

These registers have been included in the Ethernet statistics
since the first version of MMC back in 2011 (commit 1c901a46d57).
That commit also mentions the MMC interrupts as
"something to add later (if actually useful)".

If the registers are actually useful, they should probably
be part of the Ethernet register dump instead of statistics,
but for now, drop the counters for mmc_rx_ipc_intr and
mmc_rx_ipc_intr_mask completely.

Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240223-stmmac_stats-v3-1-5d483c2a071a@axis.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months agonet: ethernet: adi: adin1110: Reduce the MDIO_TRDONE poll interval
Ciprian Regus [Fri, 23 Feb 2024 16:21:27 +0000 (18:21 +0200)]
net: ethernet: adi: adin1110: Reduce the MDIO_TRDONE poll interval

In order to do a clause 22 access to the PHY registers of the ADIN1110,
we have to write the MDIO frame to the ADIN1110_MDIOACC register, and
then poll the MDIO_TRDONE bit (for a 1) in the same register. The
device will set this bit to 1 once the internal MDIO transaction is
done. In practice, this bit takes ~50 - 60 us to be set.

The first attempt to poll the bit is right after the ADIN1110_MDIOACC
register is written, so it will always be read as 0. The next check will
only be done after 10 ms, which will result in the MDIO transactions
taking a long time to complete. Reduce this polling interval to 100 us.
Since this interval is short enough, switch the poll function to
readx_poll_timeout_atomic() instead.

Reviewed-by: Nuno Sa <nuno.sa@analog.com>
Signed-off-by: Ciprian Regus <ciprian.regus@analog.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240223162129.154114-1-ciprian.regus@analog.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months agoMerge branch 'net-ipa-don-t-abort-system-suspend'
Paolo Abeni [Tue, 27 Feb 2024 10:24:06 +0000 (11:24 +0100)]
Merge branch 'net-ipa-don-t-abort-system-suspend'

Alex Elder says:

====================
net: ipa: don't abort system suspend

Currently the IPA code aborts an in-progress system suspend if an
IPA interrupt arrives before the suspend completes.  There is no
need to do that though, because the IPA driver handles a forced
suspend correctly, quiescing any hardware activity before finally
turning off clocks and interconnects.

This series drops the call to pm_wakeup_dev_event() if an IPA
SUSPEND interrupt arrives during system suspend.  Doing this
makes the two remaining IPA power flags unnecessary, and allows
some additional code to be cleaned up--and best of all, removed.
The result is much simpler (and I'm really glad not to be using
these flags any more).

The first patch implements the main change.  The second and
third remove the flags that were used to determine whether to
call pm_wakeup_dev_event().  The next two remove a function that
becomes a trivial wrapper, and the last one just avoids writing
a register unnecessarily.

Note that the first two patches will have checkpatch warnings,
because checkpatch disagrees with my compiler on what to do when
a block contains only a semicolon.  I went with what the compiler
recommends.

clang says: warning: suggest braces around empty body
checkpatch: WARNING: braces {} are not necessary for single statement blocks

====================

Link: https://lore.kernel.org/r/20240223133930.582041-1-elder@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months agonet: ipa: don't bother zeroing an already zero register
Alex Elder [Fri, 23 Feb 2024 13:39:30 +0000 (07:39 -0600)]
net: ipa: don't bother zeroing an already zero register

In ipa_interrupt_suspend_clear_all(), if the SUSPEND_INFO register
read contains no set bits, there's no interrupt condition to clear.
Skip the write to the clear register in that case.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months agonet: ipa: kill ipa_power_suspend_handler()
Alex Elder [Fri, 23 Feb 2024 13:39:29 +0000 (07:39 -0600)]
net: ipa: kill ipa_power_suspend_handler()

Now that ipa_power_suspend_handler() is a trivial wrapper around
ipa_interrupt_suspend_clear_all(), we can open-code it in the one
place it's used, and get rid of the function.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months agonet: ipa: move ipa_interrupt_suspend_clear_all() up
Alex Elder [Fri, 23 Feb 2024 13:39:28 +0000 (07:39 -0600)]
net: ipa: move ipa_interrupt_suspend_clear_all() up

The next patch makes ipa_interrupt_suspend_clear_all() static,
calling it only within "ipa_interrupt.c".  Move its definition
higher in the file so no declaration is needed.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months agonet: ipa: kill the IPA_POWER_FLAG_RESUMED flag
Alex Elder [Fri, 23 Feb 2024 13:39:27 +0000 (07:39 -0600)]
net: ipa: kill the IPA_POWER_FLAG_RESUMED flag

The IPA_POWER_FLAG_RESUMED was originally used to avoid calling
pm_wakeup_dev_event() more than once when handling a SUSPEND
interrupt.  This call is no longer made, so there' no need for the
flag, so get rid of it.

That leaves no more IPA power flags usefully defined, so just get
rid of the bitmap in the IPA power structure and the definition of
the ipa_power_flag enumerated type.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months agonet: ipa: kill IPA_POWER_FLAG_SYSTEM
Alex Elder [Fri, 23 Feb 2024 13:39:26 +0000 (07:39 -0600)]
net: ipa: kill IPA_POWER_FLAG_SYSTEM

The SYSTEM IPA power flag is set, cleared, and tested.  But nothing
happens based on its value when tested, so it serves no purpose.
Get rid of this flag.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months agonet: ipa: don't bother aborting system resume
Alex Elder [Fri, 23 Feb 2024 13:39:25 +0000 (07:39 -0600)]
net: ipa: don't bother aborting system resume

The IPA interrupt can fire if there is data to be delivered to a GSI
channel that is suspended.  This condition occurs in three scenarios.

First, runtime power management automatically suspends the IPA
hardware after half a second of inactivity.  This has nothing
to do with system suspend, so a SYSTEM IPA power flag is used to
avoid calling pm_wakeup_dev_event() when runtime suspended.

Second, if the system is suspended, the receipt of an IPA interrupt
should trigger a system resume.  Configuring the IPA interrupt for
wakeup accomplishes this.

Finally, if system suspend is underway and the IPA interrupt fires,
we currently call pm_wakeup_dev_event() to abort the system suspend.

The IPA driver correctly handles quiescing the hardware before
suspending it, so there's really no need to abort a suspend in
progress in the third case.  We can simply quiesce and suspend
things, and be done.

Incoming data can still wake the system after it's suspended.
The IPA interrupt has wakeup mode enabled, so if it fires *after*
we've suspended, it will trigger a wakeup (if not disabled via
sysfs).

Stop calling pm_wakeup_dev_event() to abort a system suspend in
progress in ipa_power_suspend_handler().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months agonet: phy: simplify genphy_c45_ethtool_set_eee
Heiner Kallweit [Tue, 20 Feb 2024 21:55:38 +0000 (22:55 +0100)]
net: phy: simplify genphy_c45_ethtool_set_eee

Simplify the function, no functional change intended.

- Remove not needed variable unsupp, I think code is even better
  readable now.
- Move setting phydev->eee_enabled out of the if clause
- Simplify return value handling

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/442277c7-7431-4542-80b5-1d3d691714d7@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months agoMerge branch 'mptcp-various-small-improvements'
Jakub Kicinski [Tue, 27 Feb 2024 02:42:14 +0000 (18:42 -0800)]
Merge branch 'mptcp-various-small-improvements'

Matthieu Baerts says:

====================
mptcp: various small improvements

This series brings various small improvements to MPTCP and its
selftests:

Patch 1 prints an error if there are duplicated subtests names. It is
important to have unique (sub)tests names in TAP, because some CI
environments drop (sub)tests with duplicated names.

Patch 2 is a preparation for patches 3 and 4, which check the protocol
in tcp_sk() and mptcp_sk() with DEBUG_NET, only in code from net/mptcp/.
We recently had the case where an MPTCP socket was wrongly treated as a
TCP one, and fuzzers and static checkers never spot the issue. This
would prevent such issues in the future.

Patches 5 to 7 are some cleanup for the MPTCP selftests. These patches
are not supposed to change the behaviour.

Patch 8 sets the poll timeout in diag selftest to the same value as the
one used in the other selftests.
====================

Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-0-b6c8a10396bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agoselftests: mptcp: diag: change timeout_poll to 30
Geliang Tang [Fri, 23 Feb 2024 20:18:00 +0000 (21:18 +0100)]
selftests: mptcp: diag: change timeout_poll to 30

Even if it is set to 100ms from the beginning with commit
df62f2ec3df6 ("selftests/mptcp: add diag interface tests"), there is
no reason not to have it to 30ms like all the other tests. "diag.sh" is
not supposed to be slower than the other ones.

To maintain consistency with other scripts, this patch changes it to 30.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agoselftests: mptcp: join: change capture/checksum as bool
Geliang Tang [Fri, 23 Feb 2024 20:17:59 +0000 (21:17 +0100)]
selftests: mptcp: join: change capture/checksum as bool

To maintain consistency with other scripts, this patch changes vars
'capture' and 'checksum' as bool vars in mptcp_join.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-7-b6c8a10396bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agoselftests: mptcp: simult flows: define missing vars
Geliang Tang [Fri, 23 Feb 2024 20:17:58 +0000 (21:17 +0100)]
selftests: mptcp: simult flows: define missing vars

The variables 'large', 'small', 'sout', 'cout', 'capout' and 'size' are
used in multiple functions, so they should be clearly defined as global
variables at the top of the file.

This patch redefines them at the beginning of simult_flows.sh.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-6-b6c8a10396bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agoselftests: mptcp: netlink: drop duplicate var ret
Geliang Tang [Fri, 23 Feb 2024 20:17:57 +0000 (21:17 +0100)]
selftests: mptcp: netlink: drop duplicate var ret

The variable 'ret' are defined twice in pm_netlink.sh. This patch drops
this duplicate one that has been defined from the beginning, with
commit eedbc685321b ("selftests: add PM netlink functional tests")

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-5-b6c8a10396bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agomptcp: check the protocol in mptcp_sk() with DEBUG_NET
Matthieu Baerts (NGI0) [Fri, 23 Feb 2024 20:17:56 +0000 (21:17 +0100)]
mptcp: check the protocol in mptcp_sk() with DEBUG_NET

Fuzzers and static checkers might not detect when mptcp_sk() is used
with a non mptcp_sock structure.

This is similar to the parent commit, where it is easy to use mptcp_sk()
with a TCP sock, e.g. with a subflow sk.

So a new simple check is done when CONFIG_DEBUG_NET is enabled to tell
kernel devs when a non-MPTCP socket is being used as an MPTCP one.
'mptcp_sk()' macro is then defined differently: with an extra WARN to
complain when an unexpected socket is being used.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-4-b6c8a10396bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agomptcp: check the protocol in tcp_sk() with DEBUG_NET
Matthieu Baerts (NGI0) [Fri, 23 Feb 2024 20:17:55 +0000 (21:17 +0100)]
mptcp: check the protocol in tcp_sk() with DEBUG_NET

Fuzzers and static checkers might not detect when tcp_sk() is used with
a non tcp_sock structure.

This kind of mistake already happened a few times with MPTCP: when
wrongly using TCP-specific helpers with mptcp_sock pointers. On the
other hand, there are many 'tcp_xxx()' helpers that are taking a 'struct
sock' pointer as arguments, and some of them are only looking at fields
from 'struct sock', and nothing from 'struct tcp_sock'. It is then
tempting to use them with a 'struct mptcp_sock'.

So a new simple check is done when CONFIG_DEBUG_NET is enabled to tell
kernel devs when a non-TCP socket is being used as a TCP one. 'tcp_sk()'
macro is then re-defined to add a WARN when an unexpected socket is
being used.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-3-b6c8a10396bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agomptcp: token kunit: set protocol
Matthieu Baerts (NGI0) [Fri, 23 Feb 2024 20:17:54 +0000 (21:17 +0100)]
mptcp: token kunit: set protocol

As it would be done when initiating an MPTCP sock.

This is not strictly needed for this test, but it will be when a later
patch will check if the right protocol is being used when calling
mptcp_sk().

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-2-b6c8a10396bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agoselftests: mptcp: lib: catch duplicated subtest entries
Matthieu Baerts (NGI0) [Fri, 23 Feb 2024 20:17:53 +0000 (21:17 +0100)]
selftests: mptcp: lib: catch duplicated subtest entries

It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated name.

When adding a new subtest entry, an error message is printed in case of
duplicated entries. If there were duplicated entries and if all features
were expected to work, the script exits with an error at the end, after
having printed all subtests in the TAP format. Thanks to that, the MPTCP
CI will catch such issues early.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-1-b6c8a10396bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agoipv6: anycast: complete RCU handling of struct ifacaddr6
Eric Dumazet [Fri, 23 Feb 2024 20:10:54 +0000 (20:10 +0000)]
ipv6: anycast: complete RCU handling of struct ifacaddr6

struct ifacaddr6 are already freed after RCU grace period.

Add __rcu qualifier to aca_next pointer, and idev->ac_list

Add relevant rcu_assign_pointer() and dereference accessors.

ipv6_chk_acast_dev() no longer needs to acquire idev->lock.

/proc/net/anycast6 is now purely RCU protected, it no
longer acquires idev->lock.

Similarly in6_dump_addrs() can use RCU protection to iterate
through anycast addresses. It was relying on a mixture of RCU
and RTNL but next patches will get rid of RTNL there.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240223201054.220534-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agodt-bindings: net: cdns,macb: add sam9x7 ethernet interface
Varshini Rajendran [Fri, 23 Feb 2024 17:22:28 +0000 (22:52 +0530)]
dt-bindings: net: cdns,macb: add sam9x7 ethernet interface

Add documentation for sam9x7 ethernet interface.

Signed-off-by: Varshini Rajendran <varshini.rajendran@microchip.com>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20240223172228.671553-1-varshini.rajendran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agonet/vsockmon: Do not set zeroed statistics
Breno Leitao [Fri, 23 Feb 2024 11:58:38 +0000 (03:58 -0800)]
net/vsockmon: Do not set zeroed statistics

Do not set rtnl_link_stats64 fields to zero, since they are zeroed
before ops->ndo_get_stats64 is called in core dev_get_stats() function.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240223115839.3572852-2-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agonet/vsockmon: Leverage core stats allocator
Breno Leitao [Fri, 23 Feb 2024 11:58:37 +0000 (03:58 -0800)]
net/vsockmon: Leverage core stats allocator

With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core
instead of this driver.

With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.

Remove the allocation in the vsockmon driver and leverage the network
core allocation instead.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20240223115839.3572852-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agoMerge branch 'pcs-xpcs-cleanups'
David S. Miller [Mon, 26 Feb 2024 13:09:09 +0000 (13:09 +0000)]
Merge branch 'pcs-xpcs-cleanups'

Serge Semin says:

====================
net: pcs: xpcs: Cleanups before adding MMIO dev support

As stated in the subject this series is a short prequel before submitting
the main patches adding the memory-mapped DW XPCS support to the DW XPCS
and DW *MAC (STMMAC) drivers. Originally it was a part of the bigger
patchset (see the changelog v2 link below) but was detached to a
preparation set to shrink down the main series thus simplifying it'
review.

The patchset' content is straightforward: drop the redundant sentinel
entry and the header files; return EINVAL errno from the soft-reset method
and make sure that the interface validation method return EINVAL straight
away if the requested interface isn't supported by the XPCS device
instance. All of these changes are required to simplify the changes being
introduced a bit later in the framework of the memory-mapped DW XPCS
support patches.

Link: https://lore.kernel.org/netdev/20231205103559.9605-1-fancer.lancer@gmail.com
Changelog v2:
- Move the preparation patches to a separate series.
- Simplify the commit messages (@Russell, @Vladimir).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: pcs: xpcs: Explicitly return error on caps validation
Serge Semin [Thu, 22 Feb 2024 17:58:23 +0000 (20:58 +0300)]
net: pcs: xpcs: Explicitly return error on caps validation

If an unsupported interface is passed to the PCS validation callback there
is no need in further link-modes calculations since the resultant array
will be initialized with zeros which will be perceived by the phylink
subsystem as error anyway (see phylink_validate_mac_and_pcs()). Instead
let's explicitly return the -EINVAL error to inform the caller about the
unsupported interface as it's done in the rest of the pcs_validate
callbacks.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: pcs: xpcs: Return EINVAL in the internal methods
Serge Semin [Thu, 22 Feb 2024 17:58:22 +0000 (20:58 +0300)]
net: pcs: xpcs: Return EINVAL in the internal methods

In particular the xpcs_soft_reset() and xpcs_do_config() functions
currently return -1 if invalid auto-negotiation mode is specified. That
value might be then passed to the generic kernel subsystems which require
a standard kernel errno value. Even though the erroneous conditions are
very specific (memory corruption or buggy driver implementation) using a
hard-coded -1 literal doesn't seem correct anyway especially when it comes
to passing it higher to the network subsystem or printing to the system
log.  Convert the hard-coded error values to -EINVAL then.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: pcs: xpcs: Drop redundant workqueue.h include directive
Serge Semin [Thu, 22 Feb 2024 17:58:21 +0000 (20:58 +0300)]
net: pcs: xpcs: Drop redundant workqueue.h include directive

There is nothing CM workqueue-related in the driver. So the respective
include directive can be dropped.

While at it add an empty line delimiter between the generic and local path
include directives to visually separate them.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: pcs: xpcs: Drop sentinel entry from 2500basex ifaces list
Serge Semin [Thu, 22 Feb 2024 17:58:20 +0000 (20:58 +0300)]
net: pcs: xpcs: Drop sentinel entry from 2500basex ifaces list

There are currently only two methods (xpcs_find_compat() and
xpcs_get_interfaces()) defined in the driver which loop over the available
interfaces. All of them rely on the xpcs_compat::num_interfaces field
value to get the total number of supported interfaces. Thus the interface
arrays are supposed to be filled with actual interface IDs and there is no
need in the dummy terminating ID placed at the end of the arrays.

Based on the above drop the PHY_INTERFACE_MODE_MAX entry from the
xpcs_2500basex_interfaces array and the PHY_INTERFACE_MODE_MAX-based
conditional statement from the xpcs_get_interfaces() method as redundant.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoMerge branch 'rtnetlink-reduce-rtnl-pressure'
David S. Miller [Mon, 26 Feb 2024 11:46:13 +0000 (11:46 +0000)]
Merge branch 'rtnetlink-reduce-rtnl-pressure'

Eric Dumazet says:

====================
rtnetlink: reduce RTNL pressure for dumps

This series restarts the conversion of rtnl dump operations
to RCU protection, instead of requiring RTNL.

In this new attempt (prior one failed in 2011), I chose to
allow a gradual conversion of selected operations.

After this series, "ip -6 addr" and "ip -4 ro" no longer
need to acquire RTNL.

I refrained from changing inet_dump_ifaddr() and inet6_dump_addr()
to avoid merge conflicts because of two fixes in net tree.

I also started the work for "ip link" future conversion.

v2: rtnl_fill_link_ifmap() always emit IFLA_MAP (Jiri Pirko)
    Added "nexthop: allow nexthop_mpath_fill_node()
           to be called without RTNL" to avoid a lockdep splat (Ido Schimmel)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agortnetlink: provide RCU protection to rtnl_fill_prop_list()
Eric Dumazet [Thu, 22 Feb 2024 10:50:21 +0000 (10:50 +0000)]
rtnetlink: provide RCU protection to rtnl_fill_prop_list()

We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.

dev->name_node items are already rcu protected.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agortnetlink: make rtnl_fill_link_ifmap() RCU ready
Eric Dumazet [Thu, 22 Feb 2024 10:50:20 +0000 (10:50 +0000)]
rtnetlink: make rtnl_fill_link_ifmap() RCU ready

Use READ_ONCE() to read the following device fields:

dev->mem_start
dev->mem_end
dev->base_addr
dev->irq
dev->dma
dev->if_port

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoinet: switch inet_dump_fib() to RCU protection
Eric Dumazet [Thu, 22 Feb 2024 10:50:19 +0000 (10:50 +0000)]
inet: switch inet_dump_fib() to RCU protection

No longer hold RTNL while calling inet_dump_fib().

Also change return value for a completed dump:

Returning 0 instead of skb->len allows NLMSG_DONE
to be appended to the skb. User space does not have
to call us again to get a standalone NLMSG_DONE marker.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonexthop: allow nexthop_mpath_fill_node() to be called without RTNL
Eric Dumazet [Thu, 22 Feb 2024 10:50:18 +0000 (10:50 +0000)]
nexthop: allow nexthop_mpath_fill_node() to be called without RTNL

nexthop_mpath_fill_node() will be potentially called
from contexts holding rcu_lock instead of RTNL.

Suggested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/all/ZdZDWVdjMaQkXBgW@shredder/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoinet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU
Eric Dumazet [Thu, 22 Feb 2024 10:50:17 +0000 (10:50 +0000)]
inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU

Add a new field into struct fib_dump_filter, to let callers
tell if they use RTNL locking or RCU.

This is used in the following patch, when inet_dump_fib()
no longer holds RTNL.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoipv6: switch inet6_dump_ifinfo() to RCU protection
Eric Dumazet [Thu, 22 Feb 2024 10:50:16 +0000 (10:50 +0000)]
ipv6: switch inet6_dump_ifinfo() to RCU protection

No longer hold RTNL while calling inet6_dump_ifinfo()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agortnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag
Eric Dumazet [Thu, 22 Feb 2024 10:50:15 +0000 (10:50 +0000)]
rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag

Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag
allows dump operations registered via rtnl_register()
or rtnl_register_module() to opt-out from RTNL protection.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agortnetlink: change nlk->cb_mutex role
Eric Dumazet [Thu, 22 Feb 2024 10:50:14 +0000 (10:50 +0000)]
rtnetlink: change nlk->cb_mutex role

In commit af65bdfce98d ("[NETLINK]: Switch cb_lock spinlock
to mutex and allow to override it"), Patrick McHardy used
a common mutex to protect both nlk->cb and the dump() operations.

The override is used for rtnl dumps, registered with
rntl_register() and rntl_register_module().

We want to be able to opt-out some dump() operations
to not acquire RTNL, so we need to protect nlk->cb
with a per socket mutex.

This patch renames nlk->cb_def_mutex to nlk->nl_cb_mutex

The optional pointer to the mutex used to protect dump()
call is stored in nlk->dump_cb_mutex

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonetlink: hold nlk->cb_mutex longer in __netlink_dump_start()
Eric Dumazet [Thu, 22 Feb 2024 10:50:13 +0000 (10:50 +0000)]
netlink: hold nlk->cb_mutex longer in __netlink_dump_start()

__netlink_dump_start() releases nlk->cb_mutex right before
calling netlink_dump() which grabs it again.

This seems dangerous, even if KASAN did not bother yet.

Add a @lock_taken parameter to netlink_dump() to let it
grab the mutex if called from netlink_recvmsg() only.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonetlink: fix netlink_diag_dump() return value
Eric Dumazet [Thu, 22 Feb 2024 10:50:12 +0000 (10:50 +0000)]
netlink: fix netlink_diag_dump() return value

__netlink_diag_dump() returns 1 if the dump is not complete,
zero if no error occurred.

If err variable is zero, this means the dump is complete:
We should not return skb->len in this case, but 0.

This allows NLMSG_DONE to be appended to the skb.
User space does not have to call us again only to get NLMSG_DONE.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoipv6: use xarray iterator to implement inet6_dump_ifinfo()
Eric Dumazet [Thu, 22 Feb 2024 10:50:11 +0000 (10:50 +0000)]
ipv6: use xarray iterator to implement inet6_dump_ifinfo()

Prepare inet6_dump_ifinfo() to run with RCU protection
instead of RTNL and use for_each_netdev_dump() interface.

Also properly return 0 at the end of a dump, avoiding
an extra recvmsg() system call and RTNL acquisition.

Note that RTNL-less dumps need core changes, coming later
in the series.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoipv6: prepare inet6_fill_ifinfo() for RCU protection
Eric Dumazet [Thu, 22 Feb 2024 10:50:10 +0000 (10:50 +0000)]
ipv6: prepare inet6_fill_ifinfo() for RCU protection

We want to use RCU protection instead of RTNL
for inet6_fill_ifinfo().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoipv6: prepare inet6_fill_ifla6_attrs() for RCU
Eric Dumazet [Thu, 22 Feb 2024 10:50:09 +0000 (10:50 +0000)]
ipv6: prepare inet6_fill_ifla6_attrs() for RCU

We want to no longer hold RTNL while calling inet6_fill_ifla6_attrs()
in the future. Add needed READ_ONCE()/WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agortnetlink: prepare nla_put_iflink() to run under RCU
Eric Dumazet [Thu, 22 Feb 2024 10:50:08 +0000 (10:50 +0000)]
rtnetlink: prepare nla_put_iflink() to run under RCU

We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.

This patch prepares dev_get_iflink() and nla_put_iflink()
to run either with RTNL or RCU held.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>