linux.git
14 months agonet: ti: icssg-prueth: Move common functions into a separate file
Diogo Ivo [Wed, 3 Apr 2024 10:48:13 +0000 (11:48 +0100)]
net: ti: icssg-prueth: Move common functions into a separate file

In order to allow code sharing between Silicon Revisions 1.0 and 2.0
move all functions that can be shared into a common file. This commit
introduces no functional changes.

Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
14 months agoeth: Move IPv4/IPv6 multicast address bases to their own symbols
Diogo Ivo [Wed, 3 Apr 2024 10:48:12 +0000 (11:48 +0100)]
eth: Move IPv4/IPv6 multicast address bases to their own symbols

As these addresses can be useful outside of checking if an address
is a multicast address (for example in device drivers) make them
accessible to users of etherdevice.h to avoid code duplication.

Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
14 months agodt-bindings: net: Add support for AM65x SR1.0 in ICSSG
Diogo Ivo [Wed, 3 Apr 2024 10:48:11 +0000 (11:48 +0100)]
dt-bindings: net: Add support for AM65x SR1.0 in ICSSG

Silicon Revision 1.0 of the AM65x came with a slightly different ICSSG
support: Only 2 PRUs per slice are available and instead 2 additional
DMA channels are used for management purposes. We have no restrictions
on specified PRUs, but the DMA channels need to be adjusted.

Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
14 months agonet: phy: air_en8811h: fix some error codes
Dan Carpenter [Fri, 5 Apr 2024 10:08:59 +0000 (13:08 +0300)]
net: phy: air_en8811h: fix some error codes

These error paths accidentally return "ret" which is zero/success
instead of the correct error code.

Fixes: 71e79430117d ("net: phy: air_en8811h: Add the Airoha EN8811H PHY driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/7ef2e230-dfb7-4a77-8973-9e5be1a99fc2@moroto.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agoarchnet: Convert from tasklet to BH workqueue
Allen Pais [Wed, 3 Apr 2024 16:23:06 +0000 (16:23 +0000)]
archnet: Convert from tasklet to BH workqueue

The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws. To
replace tasklets, BH workqueue support was recently added. A BH workqueue
behaves similarly to regular workqueues except that the queued work items
are executed in the BH context.

This patch converts drivers/net/archnet/* from tasklet to BH workqueue.

Based on the work done by Tejun Heo <tj@kernel.org>
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Link: https://lore.kernel.org/r/20240403162306.20258-1-apais@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agor8169: add support for RTL8168M
Heiner Kallweit [Sun, 7 Apr 2024 21:19:25 +0000 (23:19 +0200)]
r8169: add support for RTL8168M

A user reported an unknown chip version. According to the r8168 vendor
driver it's called RTL8168M, but handling is identical to RTL8168H.
So let's simply treat it as RTL8168H.

Tested-by: Евгений <octobergun@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoMerge branch 'devlink-io-eqs'
David S. Miller [Mon, 8 Apr 2024 13:10:45 +0000 (14:10 +0100)]
Merge branch 'devlink-io-eqs'

Parav Pandit says:

====================
devlink: Add port function attribute for IO EQs

Currently, PCI SFs and VFs use IO event queues to deliver netdev per
channel events. The number of netdev channels is a function of IO
event queues. In the second scenario of an RDMA device, the
completion vectors are also a function of IO event queues. Currently, an
administrator on the hypervisor has no means to provision the number
of IO event queues for the SF device or the VF device. Device/firmware
determines some arbitrary value for these IO event queues. Due to this,
the SF netdev channels are unpredictable, and consequently, the
performance is too.

This short series introduces a new port function attribute: max_io_eqs.
The goal is to provide administrators at the hypervisor level with the
ability to provision the maximum number of IO event queues for a
function. This gives the control to the administrator to provision
right number of IO event queues and have predictable performance.

Examples of when an administrator provisions (set) maximum number of
IO event queues when using switchdev mode:

  $ devlink port show pci/0000:06:00.0/1
      pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
          function:
          hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 10

  $ devlink port function set pci/0000:06:00.0/1 max_io_eqs 20

  $ devlink port show pci/0000:06:00.0/1
      pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
          function:
          hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 20

This sets the corresponding maximum IO event queues of the function
before it is enumerated. Thus, when the VF/SF driver reads the
capability from the device, it sees the value provisioned by the
hypervisor. The driver is then able to configure the number of channels
for the net device, as well as the number of completion vectors
for the RDMA device. The device/firmware also honors the provisioned
value, hence any VF/SF driver attempting to create IO EQs
beyond provisioned value results in an error.

With above setting now, the administrator is able to achieve the 2x
performance on SFs with 20 channels. In second example when SF was
provisioned for a container with 2 cpus, the administrator provisioned only
2 IO event queues, thereby saving device resources.

With the above settings now in place, the administrator achieved 2x
performance with the SF device with 20 channels. In the second example,
when the SF was provisioned for a container with 2 CPUs, the administrator
provisioned only 2 IO event queues, thereby saving device resources.

changelog:
v2->v3:
- limited to 80 chars per line in devlink
- fixed comments from Jakub in mlx5 driver to fix missing mutex unlock
  on error path
v1->v2:
- limited comment to 80 chars per line in header file
- fixed set function variables for reverse christmas tree
- fixed comments from Kalesh
- fixed missing kfree in get call
- returning error code for get cmd failure
- fixed error msg copy paste error in set on cmd failure
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agomlx5/core: Support max_io_eqs for a function
Parav Pandit [Sat, 6 Apr 2024 01:05:38 +0000 (04:05 +0300)]
mlx5/core: Support max_io_eqs for a function

Implement get and set for the maximum IO event queues for SF and VF.
This enables administrator on the hypervisor to control the maximum
IO event queues which are typically used to derive the maximum and
default number of net device channels or rdma device completion vectors.

Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agodevlink: Support setting max_io_eqs
Parav Pandit [Sat, 6 Apr 2024 01:05:37 +0000 (04:05 +0300)]
devlink: Support setting max_io_eqs

Many devices send event notifications for the IO queues,
such as tx and rx queues, through event queues.

Enable a privileged owner, such as a hypervisor PF, to set the number
of IO event queues for the VF and SF during the provisioning stage.

example:
Get maximum IO event queues of the VF device::

  $ devlink port show pci/0000:06:00.0/2
  pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
      function:
          hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10

Set maximum IO event queues of the VF device::

  $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32

  $ devlink port show pci/0000:06:00.0/2
  pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
      function:
          hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agonet: display more skb fields in skb_dump()
Eric Dumazet [Sun, 7 Apr 2024 08:06:06 +0000 (08:06 +0000)]
net: display more skb fields in skb_dump()

Print these additional fields in skb_dump() to ease debugging.

- mac_len
- csum_start (in v2, at Willem suggestion)
- csum_offset (in v2, at Willem suggestion)
- priority
- mark
- alloc_cpu
- vlan_all
- encapsulation
- inner_protocol
- inner_mac_header
- inner_network_header
- inner_transport_header

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoMerge branch 'phy-cleanup-EEE'
David S. Miller [Mon, 8 Apr 2024 13:04:16 +0000 (14:04 +0100)]
Merge branch 'phy-cleanup-EEE'

Andrew Lunn says:

====================
net: Clean up some EEE code

Previous patches have reworked the API between phylib and MAC drivers
with respect to EEE, pushing most of the work into phylib. These two
patches rework two drivers to make use of the new API, and fix their
EEE implementation, so that EEE is configured in the MAC based on what
is actually negotiated during autoneg.

Compile tested only.
====================

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agonet: lan743x: Fixup EEE
Andrew Lunn [Sat, 6 Apr 2024 20:16:00 +0000 (15:16 -0500)]
net: lan743x: Fixup EEE

The enabling/disabling of EEE in the MAC should happen as a result of
auto negotiation. So move the enable/disable into
lan743x_phy_link_status_change() which gets called by phylib when
there is a change in link status.

lan743x_ethtool_set_eee() now just programs the hardware with the LTI
timer value, and passed everything else to phylib, so it can correctly
setup the PHY.

lan743x_ethtool_get_eee() relies on phylib doing most of the work, the
MAC driver just adds the LTI timer value.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agonet: usb: lan78xx: Fixup EEE
Andrew Lunn [Sat, 6 Apr 2024 20:15:59 +0000 (15:15 -0500)]
net: usb: lan78xx: Fixup EEE

The enabling/disabling of EEE in the MAC should happen as a result of
auto negotiation. So move the enable/disable into
lan783xx_phy_link_status_change() which gets called by phylib when
there is a change in link status.

lan78xx_set_eee() now just programs the hardware with the LPI
timer value, and passed everything else to phylib, so it can correctly
setup the PHY.

lan743x_get_eee() relies on phylib doing most of the work, the
MAC driver just adds the LPI timer value.

Call phy_support_eee() to indicate the MAC does actually support EEE.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agomptcp: add reset reason options in some places
Jason Xing [Sat, 6 Apr 2024 01:48:48 +0000 (09:48 +0800)]
mptcp: add reset reason options in some places

The reason codes are handled in two ways nowadays (quoting Mat Martineau):
1. Sending in the MPTCP option on RST packets when there is no subflow
context available (these use subflow_add_reset_reason() and directly call
a TCP-level send_reset function)
2. The "normal" way via subflow->reset_reason. This will propagate to both
the outgoing reset packet and to a local path manager process via netlink
in mptcp_event_sub_closed()

RFC 8684 defines the skb reset reason behaviour which is not required
even though in some places:

    A host sends a TCP RST in order to close a subflow or reject
    an attempt to open a subflow (MP_JOIN). In order to let the
    receiving host know why a subflow is being closed or rejected,
    the TCP RST packet MAY include the MP_TCPRST option (Figure 15).
    The host MAY use this information to decide, for example, whether
    it tries to re-establish the subflow immediately, later, or never.

Since the commit dc87efdb1a5cd ("mptcp: add mptcp reset option support")
introduced this feature about three years ago, we can fully use it.
There remains some places where we could insert reason into skb as
we can see in this patch.

Many thanks to Mat and Paolo for help:)

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoipv4: Set scope explicitly in ip_route_output().
Guillaume Nault [Fri, 5 Apr 2024 20:05:00 +0000 (22:05 +0200)]
ipv4: Set scope explicitly in ip_route_output().

Add a "scope" parameter to ip_route_output() so that callers don't have
to override the tos parameter with the RTO_ONLINK flag if they want a
local scope.

This will allow converting flowi4_tos to dscp_t in the future, thus
allowing static analysers to flag invalid interactions between
"tos" (the DSCP bits) and ECN.

Only three users ask for local scope (bonding, arp and atm). The others
continue to use RT_SCOPE_UNIVERSE. While there, add a comment to warn
users about the limitations of ip_route_output().

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com> # infiniband
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoipvlan: handle NETDEV_DOWN event
Venkat Venkatsubra [Fri, 5 Apr 2024 18:16:12 +0000 (11:16 -0700)]
ipvlan: handle NETDEV_DOWN event

In case of stacked devices, to help propagate the down
link state from the parent/root device (to this leaf device),
handle NETDEV_DOWN event like it is done now for NETDEV_UP.

In the below example, ens5 is the host interface which is the
parent of the ipvlan interface eth0 in the container.

Host:

[root@gkn-podman-x64 ~]# ip link set ens5 down
[root@gkn-podman-x64 ~]# ip -d link show dev ens5
3: ens5: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN
      ...
[root@gkn-podman-x64 ~]#

Container:

[root@testnode-ol8 /]# ip -d link show dev eth0
2: eth0@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 state UNKNOWN
        ...
    ipvlan mode l2 bridge
        ...
[root@testnode-ol8 /]#

eth0's state continues to show up as UP even though ens5 is now DOWN.

For macvlan the handling of NETDEV_DOWN event was added in
commit 80fd2d6ca546 ("macvlan: Change status when lower device goes down").

Reported-by: Gia-Khanh Nguyen <gia-khanh.nguyen@oracle.com>
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoaf_packet: avoid a false positive warning in packet_setsockopt()
Eric Dumazet [Fri, 5 Apr 2024 11:49:39 +0000 (11:49 +0000)]
af_packet: avoid a false positive warning in packet_setsockopt()

Although the code is correct, the following line

copy_from_sockptr(&req_u.req, optval, len));

triggers this warning :

memcpy: detected field-spanning write (size 28) of single field "dst" at include/linux/sockptr.h:49 (size 16)

Refactor the code to be more explicit.

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agonet: handle HAS_IOPORT dependencies
Niklas Schnelle [Fri, 5 Apr 2024 11:18:31 +0000 (13:18 +0200)]
net: handle HAS_IOPORT dependencies

In a future patch HAS_IOPORT=n will disable inb()/outb() and friends at
compile time. We thus need to add HAS_IOPORT as dependency for
those drivers requiring them. For the DEFXX driver the use of I/O
ports is optional and we only need to fence specific code paths. It also
turns out that with HAS_IOPORT handled explicitly HAMRADIO does not need
the !S390 dependency and successfully builds the bpqether driver.

Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Maciej W. Rozycki <macro@orcam.me.uk>
Co-developed-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoMerge branch 'mptcp-selftests'
David S. Miller [Mon, 8 Apr 2024 10:53:22 +0000 (11:53 +0100)]
Merge branch 'mptcp-selftests'

Matthieu Baerts says:

====================
selftests: mptcp: cleanups and 'ip mptcp' support

Here are some patches from Geliang, doing different cleanups, and
supporting 'ip mptcp' in more MPTCP selftests.

Patch 1 checks that TC is available in selftests requiring it.

Patch 2 adds 'ms' units in TC commands, to avoid confusions.

Patches 3-9 are some prerequisites for patch 10: some export code from
mptcp_join.sh to mptcp_lib.sh, to be re-used in pm_netlink.sh,
mptcp_sockopt.sh and simult_flows.sh ; and others add helpers to
pm_netlink.sh to easily support both 'ip mptcp' and 'pm_nl_ctl' tools to
interact with the in-kernel MPTCP path-manager.

Patch 10 adds a '-i' parameter in mptcp_sockopt.sh, pm_netlink.sh, and
simult_flows.sh to use 'ip mptcp' tool instead of 'pm_nl_ctl'.

Patch 11 fixes some ShellCheck warnings in pm_netlink.sh, in order to
drop a ShellCheck's 'disable' instruction.
====================

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoselftests: mptcp: netlink: drop disable=SC2086
Geliang Tang [Fri, 5 Apr 2024 10:52:15 +0000 (12:52 +0200)]
selftests: mptcp: netlink: drop disable=SC2086

Now there are only a few of variables are not using double quotes.
Modifying them, then "shellcheck disable=SC2086" can be dropped.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoselftests: mptcp: ip_mptcp option for more scripts
Geliang Tang [Fri, 5 Apr 2024 10:52:14 +0000 (12:52 +0200)]
selftests: mptcp: ip_mptcp option for more scripts

This patch adds '-i' option for mptcp_sockopt.sh, pm_netlink.sh, and
simult_flows.sh, to use 'ip mptcp' command in the tests instead of
'pm_nl_ctl'. Update usage() correspondingly.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoselftests: mptcp: use pm_nl endpoint ops
Geliang Tang [Fri, 5 Apr 2024 10:52:13 +0000 (12:52 +0200)]
selftests: mptcp: use pm_nl endpoint ops

Use those newly added pm_nl endpoint ops helpers to replace all 'pm_nl_ctl'
commands with 'limits', 'add', 'del', 'flush', 'show' and 'set' arguments
in scripts mptcp_sockopt.sh and simult_flows.sh.

In pm_netlink.sh, add wrappers of there helpers to make the function names
shorter. Then use the wrappers to replace all 'pm_nl_ctl' commands.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoselftests: mptcp: export pm_nl endpoint ops
Geliang Tang [Fri, 5 Apr 2024 10:52:12 +0000 (12:52 +0200)]
selftests: mptcp: export pm_nl endpoint ops

This patch exports six endpoint operation helpers with pm_nl_ prefix,
pm_nl_set_limits(), pm_nl_add_endpoint(), pm_nl_del_endpoint(),
pm_nl_flush_endpoint(), pm_nl_show_endpoints() and pm_nl_change_endpoint()
into mptcp_lib.sh as public functions, and renamed each of them with a
mptcp_lib_ prefix. Then these old pm_nl_ prefix helpers in mptcp_join.sh
can be wrappers of mptcp_lib_ prefix ones.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoselftests: mptcp: join: update endpoint ops
Geliang Tang [Fri, 5 Apr 2024 10:52:11 +0000 (12:52 +0200)]
selftests: mptcp: join: update endpoint ops

This patch uses 'case' statements to simplify pm_nl_add_endpoint() and
pm_nl_check_endpoint(). And simplify pm_nl_check_endpoint() with
check_output() helper. Also update pm_nl_del_endpoint() to avoid the
'double quote' shellcheck warning.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoselftests: mptcp: netlink: add change_address helper
Geliang Tang [Fri, 5 Apr 2024 10:52:10 +0000 (12:52 +0200)]
selftests: mptcp: netlink: add change_address helper

The output formats of 'ip mptcp' commands are much different from that
of 'pm_nl_ctl' commands.

A new 'change_address' helper is added here, to change the flag of an
address. This is a bit similar to mptcp_join.sh's pm_nl_change_endpoint().

Usage:
Address ID - pm_nl_change_endpoint $ns id $id $flags
IP address - change_address $ns $addr $flags

Use this new helper in pm_netlink.sh to replace all 'pm_nl_ctl set'
commands.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoselftests: mptcp: add {get,format}_endpoint(s) helpers
Geliang Tang [Fri, 5 Apr 2024 10:52:09 +0000 (12:52 +0200)]
selftests: mptcp: add {get,format}_endpoint(s) helpers

The output formats of 'ip mptcp' commands are much different from that
of 'pm_nl_ctl' commands.

This patch adds a new helper format_endpoints() to format the outputs of
'ip mptcp' and 'pm_nl_ctl' with 'endpoints' arguments to hide these
differences.

A new helper named get_endpoint() has also been added to show a specific
endpoint identified by the given address ID, similar to mptcp_join.sh's
pm_nl_show_endpoints() helper, but showing all entries.

Use these two helpers in mptcp_join.sh and pm_netlink.sh to replace all
'pm_nl_ctl get' commands and outputs of 'pm_nl_ctl dump/get'.

Suggested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoselftests: mptcp: netlink: add 'limits' helpers
Geliang Tang [Fri, 5 Apr 2024 10:52:08 +0000 (12:52 +0200)]
selftests: mptcp: netlink: add 'limits' helpers

The output format of 'ip mptcp limits' command is much different from
that of 'pm_nl_ctl limits' command.

This patch adds format_limits() helper to format the outputs of these
two commands to hide the difference. get_limits() has been added to show
the limits.

Use these two helpers in pm_netlink.sh to replace all 'pm_nl_ctl limits'
commands and outputs.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoselftests: mptcp: export ip_mptcp to mptcp_lib
Geliang Tang [Fri, 5 Apr 2024 10:52:07 +0000 (12:52 +0200)]
selftests: mptcp: export ip_mptcp to mptcp_lib

This patch exports ip_mptcp into mptcp_lib.sh as a public variable,
named MPTCP_LIB_IP_MPTCP. Add a helper mptcp_lib_set_ip_mptcp() to set
it, and a helper mptcp_lib_is_ip_mptcp() to test whether it is set. Use
these two helpers in mptcp_join.sh.

This patch is prepared for coming commits.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoselftests: mptcp: add ms units for tc-netem delay
Geliang Tang [Fri, 5 Apr 2024 10:52:06 +0000 (12:52 +0200)]
selftests: mptcp: add ms units for tc-netem delay

'delay 1' in tc-netem is confusing, not sure if it's a delay of 1 second or
1 millisecond. This patch explicitly adds millisecond units to make these
commands clearer.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoselftests: mptcp: add tc check for check_tools
Geliang Tang [Fri, 5 Apr 2024 10:52:05 +0000 (12:52 +0200)]
selftests: mptcp: add tc check for check_tools

tc are used in some test scripts: mptcp_connect.sh, mptcp_join.sh and
simult_flows.sh. It makes sense to check if tc is installed before running
these scripts, just like other tools. So this patch add 'tc' check for
mptcp_lib_check_tools(), and check it in these test scripts.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agotcp: more struct tcp_sock adjustments
Eric Dumazet [Fri, 5 Apr 2024 10:29:26 +0000 (10:29 +0000)]
tcp: more struct tcp_sock adjustments

tp->recvmsg_inq is used from tcp recvmsg() thus should
be in tcp_sock_read_rx group.

tp->tcp_clock_cache and tp->tcp_mstamp are written
both in rx and tx paths, thus are better placed
in tcp_sock_write_txrx group.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agonet: usb: ax88179_178a: non necessary second random mac address
Jose Ignacio Tornos Martinez [Fri, 5 Apr 2024 08:24:31 +0000 (10:24 +0200)]
net: usb: ax88179_178a: non necessary second random mac address

If the mac address can not be read from the device registers or the
devicetree, a random address is generated, but this was already done from
usbnet_probe, so it is not necessary to call eth_hw_addr_random from here
again to generate another random address.

Indeed, when reset was also executed from bind, generate another random mac
address invalidated the check from usbnet_probe to configure if the assigned
mac address for the interface was random or not, because it is comparing
with the initial generated random address. Now, with only a reset from open
operation, it is just a harmless simplification.

Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agopfcp: avoid copy warning by simplifing code
Michal Swiatkowski [Fri, 5 Apr 2024 06:36:05 +0000 (08:36 +0200)]
pfcp: avoid copy warning by simplifing code

From Arnd comments:
"The memcpy() in the ip_tunnel_info_opts_set() causes
a string.h fortification warning, with at least gcc-13:

    In function 'fortify_memcpy_chk',
        inlined from 'ip_tunnel_info_opts_set' at include/net/ip_tunnels.h:619:3,
        inlined from 'pfcp_encap_recv' at drivers/net/pfcp.c:84:2:
    include/linux/fortify-string.h:553:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
      553 |                         __write_overflow_field(p_size_field, size);"

It is a false-positivie caused by ambiguity of the union.

However, as Arnd noticed, copying here is unescessary. The code can be
simplified to avoid calling ip_tunnel_info_opts_set(), which is doing
copying, setting flags and options_len.

Set correct flags and options_len directly on tun_info.

Fixes: 6dd514f48110 ("pfcp: always set pfcp metadata")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Closes: https://lore.kernel.org/netdev/701f8f93-f5fb-408b-822a-37a1d5c424ba@app.fastmail.com/
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoMerge branch 'ynl-tests'
David S. Miller [Mon, 8 Apr 2024 10:40:41 +0000 (11:40 +0100)]
Merge branch 'ynl-tests'

Jakub Kicinski says:

====================
selftests: net: groundwork for YNL-based tests

Currently the options for writing networking tests are C, bash or
some mix of the two. YAML/Netlink gives us the ability to easily
interface with Netlink in higher level laguages. In particular,
there is a Python library already available in tree, under tools/net.
Add the scaffolding which allows writing tests using this library.

The "scaffolding" is needed because the library lives under
tools/net and uses YAML files from under Documentation/.
So we need a small amount of glue code to find those things
and add them to TEST_FILES.

This series adds both a basic SW sanity test and driver
test which can be run against netdevsim or a real device.
When I develop core code I usually test with netdevsim,
then a real device, and then a backport to Meta's kernel.
Because of the lack of integration, until now I had
to throw away the (YNL-based) test script and netdevsim code.

Running tests in tree directly:

 $ ./tools/testing/selftests/net/nl_netdev.py
 KTAP version 1
 1..2
 ok 1 nl_netdev.empty_check
 ok 2 nl_netdev.lo_check
 # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0

in tree via make:

 $ make -C tools/testing/selftests/ TARGETS=net \
TEST_PROGS=nl_netdev.py TEST_GEN_PROGS="" run_tests
  [ ... ]

and installed externally, all seem to work:

 $ make -C tools/testing/selftests/ TARGETS=net \
install INSTALL_PATH=/tmp/ksft-net
 $ /tmp/ksft-net/run_kselftest.sh -t net:nl_netdev.py
  [ ... ]

For driver tests I followed the lead of net/forwarding and
get the device name from env and/or a config file.

v3:
 - fix up netdevsim C
 - various small nits in other patches (see changelog in patches)
v2: https://lore.kernel.org/all/20240403023426.1762996-1-kuba@kernel.org/
 - don't add to TARGETS, create a deperate variable with deps
 - support and use with
 - support and use passing arguments to tests
v1: https://lore.kernel.org/all/20240402010520.1209517-1-kuba@kernel.org/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agotesting: net-drv: add a driver test for stats reporting
Jakub Kicinski [Fri, 5 Apr 2024 02:45:26 +0000 (19:45 -0700)]
testing: net-drv: add a driver test for stats reporting

Add a very simple test to make sure drivers report expected
stats. Drivers which implement FEC or pause configuration
should report relevant stats. Qstats must be reported,
at least packet and byte counts, and they must match
total device stats.

Tested with netdevsim, bnxt, in-tree and installed.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoselftests: drivers: add scaffolding for Netlink tests in Python
Jakub Kicinski [Fri, 5 Apr 2024 02:45:25 +0000 (19:45 -0700)]
selftests: drivers: add scaffolding for Netlink tests in Python

Add drivers/net as a target for mixed-use tests.
The setup is expected to work similarly to the forwarding tests.
Since we only need one interface (unlike forwarding tests)
read the target device name from NETIF. If not present we'll
try to run the test against netdevsim.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agonetdevsim: report stats by default, like a real device
Jakub Kicinski [Fri, 5 Apr 2024 02:45:24 +0000 (19:45 -0700)]
netdevsim: report stats by default, like a real device

Real devices should implement qstats. Devices which support
pause or FEC configuration should also report the relevant stats.

nsim was missing FEC stats completely, some of the qstats
and pause stats required toggling a debugfs knob.

Note that the tests which used pause always initialize the setting
so they shouldn't be affected by the different starting value.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoselftests: nl_netdev: add a trivial Netlink netdev test
Jakub Kicinski [Fri, 5 Apr 2024 02:45:23 +0000 (19:45 -0700)]
selftests: nl_netdev: add a trivial Netlink netdev test

Add a trivial test using YNL.

  $ ./tools/testing/selftests/net/nl_netdev.py
  KTAP version 1
  1..2
  ok 1 nl_netdev.empty_check
  ok 2 nl_netdev.lo_check

Instantiate the family once, it takes longer than the test itself.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoselftests: net: add scaffolding for Netlink tests in Python
Jakub Kicinski [Fri, 5 Apr 2024 02:45:22 +0000 (19:45 -0700)]
selftests: net: add scaffolding for Netlink tests in Python

Add glue code for accessing the YNL library which lives under
tools/net and YAML spec files from under Documentation/.
Automatically figure out if tests are run in tree or not.
Since we'll want to use this library both from net and
drivers/net test targets make the library a target as well,
and automatically include it when net or drivers/net are
included. Making net/lib a target ensures that we end up
with only one copy of it, and saves us some path guessing.

Add a tiny bit of formatting support to be able to output KTAP
from the start.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoMerge tag 'batadv-next-pullrequest-20240405' of git://git.open-mesh.org/linux-merge
David S. Miller [Mon, 8 Apr 2024 10:37:27 +0000 (11:37 +0100)]
Merge tag 'batadv-next-pullrequest-20240405' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - prefer kfree_rcu() over call_rcu() with free-only callbacks,
   by Dmitry Antipov

 - bypass empty buckets in batadv_purge_orig_ref(), by Eric Dumazet
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agonet: mdio-gpio: Use device_is_compatible()
Andy Shevchenko [Thu, 4 Apr 2024 17:55:57 +0000 (20:55 +0300)]
net: mdio-gpio: Use device_is_compatible()

Replace open coded variant of device_is_compatible().

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agonet: dqs: use sysfs_emit() in favor of sprintf()
Eric Dumazet [Thu, 4 Apr 2024 16:46:04 +0000 (16:46 +0000)]
net: dqs: use sysfs_emit() in favor of sprintf()

Commit 6025b9135f7a ("net: dqs: add NIC stall detector based on BQL")
added three sysfs files.

Use the recommended sysfs_emit() helper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoip_tunnel: harden copying IP tunnel params to userspace
Alexander Lobakin [Thu, 4 Apr 2024 16:03:02 +0000 (18:03 +0200)]
ip_tunnel: harden copying IP tunnel params to userspace

Structures which are about to be copied to userspace shouldn't have
uninitialized fields or paddings.
memset() the whole &ip_tunnel_parm in ip_tunnel_parm_to_user() before
filling it with the kernel data. The compilers will hopefully combine
writes to it.

Fixes: 117aef12a7b1 ("ip_tunnel: use a separate struct to store tunnel params in the kernel")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/netdev/5f63dd25-de94-4ca3-84e6-14095953db13@moroto.mountain
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoipv6: remove RTNL protection from ip6addrlbl_dump()
Eric Dumazet [Thu, 4 Apr 2024 13:24:13 +0000 (13:24 +0000)]
ipv6: remove RTNL protection from ip6addrlbl_dump()

No longer hold RTNL while calling ip6addrlbl_dump()
("ip addrlabel show")

ip6addrlbl_dump() was already mostly relying on RCU anyway.

Add READ_ONCE()/WRITE_ONCE() annotations around
net->ipv6.ip6addrlbl_table.seq

Note that ifal_seq value is currently ignored in iproute2,
and a bit weak.

We might user later cb->seq  / nl_dump_check_consistent()
protocol if needed.

Also change return value for a completed dump,
so that NLMSG_DONE can be appended to current skb,
saving one recvmsg() system call.

v2: read net->ipv6.ip6addrlbl_table.seq once, (David Ahern)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link:https://lore.kernel.org/netdev/67f5cb70-14a4-4455-8372-f039da2f15c2@kernel.org/
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoinet: frags: delay fqdir_free_fn()
Eric Dumazet [Thu, 4 Apr 2024 13:07:51 +0000 (13:07 +0000)]
inet: frags: delay fqdir_free_fn()

fqdir_free_fn() is using very expensive rcu_barrier()

When one netns is dismantled, we often call fqdir_exit()
multiple times, typically lauching fqdir_free_fn() twice.

Delaying by one second fqdir_free_fn() helps to reduce
the number of rcu_barrier() calls, and lock contention
on rcu_state.barrier_mutex.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoip6_vti: Remove generic .ndo_get_stats64
Breno Leitao [Thu, 4 Apr 2024 12:52:52 +0000 (05:52 -0700)]
ip6_vti: Remove generic .ndo_get_stats64

Commit 3e2f544dd8a33 ("net: get stats64 if device if driver is
configured") moved the callback to dev_get_tstats64() to net core, so,
unless the driver is doing some custom stats collection, it does not
need to set .ndo_get_stats64.

Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it
doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64
function pointer.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoip6_vti: Do not use custom stat allocator
Breno Leitao [Thu, 4 Apr 2024 12:52:51 +0000 (05:52 -0700)]
ip6_vti: Do not use custom stat allocator

With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core
instead of in this driver.

With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.

Remove the allocation in the ip6_vti and leverage the network
core allocation instead.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoMerge branch 'phy-listing-link_topology-tracking'
David S. Miller [Sat, 6 Apr 2024 17:25:15 +0000 (18:25 +0100)]
Merge branch 'phy-listing-link_topology-tracking'

Maxime Chevallier says:

====================
Introduce PHY listing and link_topology tracking

This is V11 for the link topology addition, allowing to track all PHYs
that are linked to netdevices.

This V11 addresses the various netlink-related issues that were raised
by Jakub, and fixes some typos in the documentation.

As a remainder, here's what the PHY listings would look like :
 - eth0 has a 88x3310 acting as media converter, and an SFP module with
   an embedded 88e1111 PHY
 - eth2 has a 88e1510 PHY

PHY for eth0:
PHY index: 1
Driver name: mv88x3310
PHY device name: f212a600.mdio-mii:00
Downstream SFP bus name: sfp-eth0
PHY id: 0
Upstream type: MAC

PHY for eth0:
PHY index: 2
Driver name: Marvell 88E1111
PHY device name: i2c:sfp-eth0:16
PHY id: 21040322
Upstream type: PHY
Upstream PHY index: 1
Upstream SFP name: sfp-eth0

PHY for eth2:
PHY index: 1
Driver name: Marvell 88E1510
PHY device name: f212a200.mdio-mii:00
PHY id: 21040593
Upstream type: MAC

Ethtool patches : https://github.com/minimaxwell/ethtool/tree/link-topo-v6
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agonet: ethtool: Allow passing a phy index for some commands
Maxime Chevallier [Thu, 4 Apr 2024 09:29:55 +0000 (11:29 +0200)]
net: ethtool: Allow passing a phy index for some commands

Some netlink commands are target towards ethernet PHYs, to control some
of their features. As there's several such commands, add the ability to
pass a PHY index in the ethnl request, which will populate the generic
ethnl_req_info with the relevant phydev when the command targets a PHY.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agonet: sfp: Add helper to return the SFP bus name
Maxime Chevallier [Thu, 4 Apr 2024 09:29:54 +0000 (11:29 +0200)]
net: sfp: Add helper to return the SFP bus name

Knowing the bus name is helpful when we want to expose the link topology
to userspace, add a helper to return the SFP bus name.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agonet: phy: add helpers to handle sfp phy connect/disconnect
Maxime Chevallier [Thu, 4 Apr 2024 09:29:53 +0000 (11:29 +0200)]
net: phy: add helpers to handle sfp phy connect/disconnect

There are a few PHY drivers that can handle SFP modules through their
sfp_upstream_ops. Introduce Phylib helpers to keep track of connected
SFP PHYs in a netdevice's namespace, by adding the SFP PHY to the
upstream PHY's netdev's namespace.

By doing so, these SFP PHYs can be enumerated and exposed to users,
which will be able to use their capabilities.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agonet: sfp: pass the phy_device when disconnecting an sfp module's PHY
Maxime Chevallier [Thu, 4 Apr 2024 09:29:52 +0000 (11:29 +0200)]
net: sfp: pass the phy_device when disconnecting an sfp module's PHY

Pass the phy_device as a parameter to the sfp upstream .disconnect_phy
operation. This is preparatory work to help track phy devices across
a net_device's link.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agonet: phy: Introduce ethernet link topology representation
Maxime Chevallier [Thu, 4 Apr 2024 09:29:51 +0000 (11:29 +0200)]
net: phy: Introduce ethernet link topology representation

Link topologies containing multiple network PHYs attached to the same
net_device can be found when using a PHY as a media converter for use
with an SFP connector, on which an SFP transceiver containing a PHY can
be used.

With the current model, the transceiver's PHY can't be used for
operations such as cable testing, timestamping, macsec offload, etc.

The reason being that most of the logic for these configuration, coming
from either ethtool netlink or ioctls tend to use netdev->phydev, which
in multi-phy systems will reference the PHY closest to the MAC.

Introduce a numbering scheme allowing to enumerate PHY devices that
belong to any netdev, which can in turn allow userspace to take more
precise decisions with regard to each PHY's configuration.

The numbering is maintained per-netdev, in a phy_device_list.
The numbering works similarly to a netdevice's ifindex, with
identifiers that are only recycled once INT_MAX has been reached.

This prevents races that could occur between PHY listing and SFP
transceiver removal/insertion.

The identifiers are assigned at phy_attach time, as the numbering
depends on the netdevice the phy is attached to. The PHY index can be
re-used for PHYs that are persistent.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agonet: phy: marvell: implement cable test for 88E1111
Pawel Dembicki [Thu, 4 Apr 2024 09:07:26 +0000 (11:07 +0200)]
net: phy: marvell: implement cable test for 88E1111

The same implementation is also valid for 88E1145. VCT in 88E1111 is
similar to the 88E609x family. The main difference lies in register
organization and required workarounds.

It utilizes the same fields in registers but requires a simpler
implementation.

Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agonetlink: add nlmsg_consume() and use it in devlink compat
Jakub Kicinski [Wed, 3 Apr 2024 20:22:59 +0000 (13:22 -0700)]
netlink: add nlmsg_consume() and use it in devlink compat

devlink_compat_running_version() sticks out when running
netdevsim tests and watching dropped skbs. Add nlmsg_consume()
for cases were we want to free a netlink skb but it is expected,
rather than a drop. af_netlink code uses consume_skb() directly,
which is fine, but some may prefer the symmetry of nlmsg_new() /
nlmsg_consume().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agonet: skbuff: generalize the skb->decrypted bit
Jakub Kicinski [Wed, 3 Apr 2024 20:21:39 +0000 (13:21 -0700)]
net: skbuff: generalize the skb->decrypted bit

The ->decrypted bit can be reused for other crypto protocols.
Remove the direct dependency on TLS, add helpers to clean up
the ifdefs leaking out everywhere.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 months agoMerge branch 'ynl-rename-array-nest-to-indexed-array'
Jakub Kicinski [Sat, 6 Apr 2024 05:32:51 +0000 (22:32 -0700)]
Merge branch 'ynl-rename-array-nest-to-indexed-array'

Hangbin Liu says:

====================
ynl: rename array-nest to indexed-array

rename array-nest to indexed-array and add un-nest sub-type support
====================

Link: https://lore.kernel.org/r/20240404063114.1221532-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agoynl: support binary and integer sub-type for indexed-array
Hangbin Liu [Thu, 4 Apr 2024 06:31:13 +0000 (14:31 +0800)]
ynl: support binary and integer sub-type for indexed-array

Add binary and integer sub-type support for indexed-array to display bond
arp and ns targets. Here is what the result looks like:

 # ip link add bond0 type bond mode 1 \
   arp_ip_target 192.168.1.1,192.168.1.2 ns_ip6_target 2001::1,2001::2
 # ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_link.yaml \
   --do getlink --json '{"ifname": "bond0"}' --output-json | jq '.linkinfo'

    "arp-ip-target": [
      "192.168.1.1",
      "192.168.1.2"
    ],
    [...]
    "ns-ip6-target": [
      "2001::1",
      "2001::2"
    ],

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240404063114.1221532-3-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agoynl: rename array-nest to indexed-array
Hangbin Liu [Thu, 4 Apr 2024 06:31:12 +0000 (14:31 +0800)]
ynl: rename array-nest to indexed-array

Some implementations, like bonding, has nest array with same attr type.
To support all kinds of entries under one nest array. As discussed[1],
let's rename array-nest to indexed-array, and assuming the value is
a nest by passing the type via sub-type.

[1] https://lore.kernel.org/netdev/20240312100105.16a59086@kernel.org/

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240404063114.1221532-2-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agotcp: annotate data-races around tp->window_clamp
Eric Dumazet [Thu, 4 Apr 2024 11:42:31 +0000 (11:42 +0000)]
tcp: annotate data-races around tp->window_clamp

tp->window_clamp can be read locklessly, add READ_ONCE()
and WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240404114231.2195171-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agoMerge branch 'ethtool-hw-timestamping-statistics'
Jakub Kicinski [Sat, 6 Apr 2024 05:22:49 +0000 (22:22 -0700)]
Merge branch 'ethtool-hw-timestamping-statistics'

Rahul Rameshbabu says:

====================
ethtool HW timestamping statistics

The goal of this patch series is to introduce a common set of ethtool
statistics for hardware timestamping that a driver implementer can hook into.
The statistics counters added are based on what I believe are common
patterns/behaviors found across various hardware timestamping implementations
seen in the kernel tree today. The mlx5 family of devices is used
as the PoC for this patch series. Other vendors are more than welcome
to chime in on this series.

Link: https://lore.kernel.org/netdev/20240402205223.137565-1-rrameshbabu@nvidia.com/
Link: https://lore.kernel.org/netdev/20240309084440.299358-1-rrameshbabu@nvidia.com/
Link: https://lore.kernel.org/netdev/20240223192658.45893-1-rrameshbabu@nvidia.com/
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
====================

Link: https://lore.kernel.org/r/20240403212931.128541-1-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agotools: ynl: ethtool.py: Output timestamping statistics from tsinfo-get operation
Rahul Rameshbabu [Wed, 3 Apr 2024 21:28:45 +0000 (14:28 -0700)]
tools: ynl: ethtool.py: Output timestamping statistics from tsinfo-get operation

Print the nested stats attribute containing timestamping statistics when
the --show-time-stamping flag is used.

  [root@binary-eater-vm-01 linux-ethtool-ts]# ./tools/net/ynl/ethtool.py --show-time-stamping mlx5_1
  Time stamping parameters for mlx5_1:
  Capabilities:
    hardware-transmit
    hardware-receive
    hardware-raw-clock
  PTP Hardware Clock: 0
  Hardware Transmit Timestamp Modes:
    off
    on
  Hardware Receive Filter Modes:
    none
    all
  Statistics:
    tx-pkts: 8
    tx-lost: 0
    tx-err: 0

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://lore.kernel.org/r/20240403212931.128541-8-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonetlink: specs: ethtool: define header-flags as an enum
Jakub Kicinski [Sat, 6 Apr 2024 05:22:38 +0000 (22:22 -0700)]
netlink: specs: ethtool: define header-flags as an enum

Recent changes added header flags to the spec.
Use an enum instead of defines for more seamless codegen.

[Jakub: drop the already applied parts and rewrite message]

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://lore.kernel.org/r/20240403212931.128541-6-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonet/mlx5e: Implement ethtool hardware timestamping statistics
Rahul Rameshbabu [Wed, 3 Apr 2024 21:28:42 +0000 (14:28 -0700)]
net/mlx5e: Implement ethtool hardware timestamping statistics

Feed driver statistics counters related to hardware timestamping to
standardized ethtool hardware timestamping statistics group.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://lore.kernel.org/r/20240403212931.128541-5-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonet/mlx5e: Introduce timestamps statistic counter for Tx DMA layer
Rahul Rameshbabu [Wed, 3 Apr 2024 21:28:41 +0000 (14:28 -0700)]
net/mlx5e: Introduce timestamps statistic counter for Tx DMA layer

Count number of transmitted packets that were hardware timestamped at the
device DMA layer.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://lore.kernel.org/r/20240403212931.128541-4-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonet/mlx5e: Introduce lost_cqe statistic counter for PTP Tx port timestamping CQ
Rahul Rameshbabu [Wed, 3 Apr 2024 21:28:40 +0000 (14:28 -0700)]
net/mlx5e: Introduce lost_cqe statistic counter for PTP Tx port timestamping CQ

Track the number of times a CQE was expected to not be delivered on PTP Tx
port timestamping CQ. A CQE is expected to not be delivered if a certain
amount of time passes since the corresponding CQE containing the DMA
timestamp information has arrived. Increment the late_cqe counter when such
a CQE does manage to be delivered to the CQ.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20240403212931.128541-3-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agoethtool: add interface to read Tx hardware timestamping statistics
Rahul Rameshbabu [Wed, 3 Apr 2024 21:28:39 +0000 (14:28 -0700)]
ethtool: add interface to read Tx hardware timestamping statistics

Multiple network devices that support hardware timestamping appear to have
common behavior with regards to timestamp handling. Implement common Tx
hardware timestamping statistics in a tx_stats struct_group. Common Rx
hardware timestamping statistics can subsequently be implemented in a
rx_stats struct_group for ethtool_ts_stats.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://lore.kernel.org/r/20240403212931.128541-2-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agoMerge branch 'address-all-wunused-const-warnings'
Jakub Kicinski [Sat, 6 Apr 2024 05:11:28 +0000 (22:11 -0700)]
Merge branch 'address-all-wunused-const-warnings'

Arnd Bergmann says:

====================
address all -Wunused-const warnings (the net part)

Compilers traditionally warn for unused 'static' variables, but not
if they are constant. The reason here is a custom for C++ programmers
to define named constants as 'static const' variables in header files
instead of using macros or enums.

In W=1 builds, we get warnings only static const variables in C
files, but not in headers, which is a good compromise, but this still
produces warning output in at least 30 files. These warnings are
almost all harmless, but also trivial to fix, and there is no
good reason to warn only about the non-const variables being unused.

I've gone through all the files that I found using randconfig and
allmodconfig builds and created patches to avoid these warnings,
with the goal of retaining a clean build once the option is enabled
by default.

Unfortunately, there is one fairly large patch ("drivers: remove
incorrect of_match_ptr/ACPI_PTR annotations") that touches
34 individual drivers that all need the same one-line change.
If necessary, I can split it up by driver or by subsystem,
but at least for reviewing I would keep it as one piece for
the moment.
====================

Link: https://lore.kernel.org/r/20240403080702.3509288-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonet: xgbe: remove extraneous #ifdef checks
Arnd Bergmann [Wed, 3 Apr 2024 08:06:46 +0000 (10:06 +0200)]
net: xgbe: remove extraneous #ifdef checks

When both ACPI and OF are disabled, xgbe_v1 is unused and
causes a W=1 warning:

drivers/net/ethernet/amd/xgbe/xgbe-platform.c:533:39: error: unused variable 'xgbe_v1' [-Werror,-Wunused-const-variable]
static const struct xgbe_version_data xgbe_v1 = {

There is no real point in trying to save a few bytes for the match
tables, so just make them always visible.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20240403080702.3509288-29-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agoisdn: kcapi: don't build unused procfs code
Arnd Bergmann [Wed, 3 Apr 2024 08:06:44 +0000 (10:06 +0200)]
isdn: kcapi: don't build unused procfs code

The procfs file is completely unused without CONFIG_PROC_FS but causes
a compile time warning:

drivers/isdn/capi/kcapi_proc.c:97:36: error: unused variable 'seq_controller_ops' [-Werror,-Wunused-const-variable]
static const struct seq_operations seq_controller_ops = {
drivers/isdn/capi/kcapi_proc.c:104:36: error: unused variable 'seq_contrstats_ops' [-Werror,-Wunused-const-variable]
drivers/isdn/capi/kcapi_proc.c:179:36: error: unused variable 'seq_applications_ops' [-Werror,-Wunused-const-variable]
drivers/isdn/capi/kcapi_proc.c:186:36: error: unused variable 'seq_applstats_ops' [-Werror,-Wunused-const-variable]

Remove the file from the build in that config and make the calls into
it conditional instead.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20240403080702.3509288-27-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months ago3c515: remove unused 'mtu' variable
Arnd Bergmann [Wed, 3 Apr 2024 08:06:23 +0000 (10:06 +0200)]
3c515: remove unused 'mtu' variable

This has never been used since the start of the git history. When building
with W=1, the unused variable produces a gcc warning:

drivers/net/ethernet/3com/3c515.c:35:18: error: 'mtu' defined but not used [-Werror=unused-const-variable=]

Just remove it.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20240403080702.3509288-6-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agotrace: events: cleanup deprecated strncpy uses
Justin Stitt [Mon, 1 Apr 2024 23:48:52 +0000 (23:48 +0000)]
trace: events: cleanup deprecated strncpy uses

strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

For 2 out of 3 of these changes we can simply swap in strscpy() as it
guarantess NUL-termination which is needed for the following trace
print.

trace_rpcgss_context() should use memcpy as its format specifier %.*s
allows for the length to be specifier (__entry->len). Due to this,
acceptor does not technically need to be NUL-terminated. Moreover,
swapping in strscpy() and keeping everything else the same could result
in truncation of the source string by one byte. To remedy this, we could
use `len + 1` but I am unsure of the size of the destination buffer so a
simple memcpy should suffice.
| TP_printk("win_size=%u expiry=%lu now=%lu timeout=%u acceptor=%.*s",
| __entry->window_size, __entry->expiry, __entry->now,
| __entry->timeout, __entry->len, __get_str(acceptor))

I suspect acceptor not to naturally be a NUL-terminated string due to
the presence of some stringify methods.
| .crstringify_acceptor = gss_stringify_acceptor,

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/20240401-strncpy-include-trace-events-mdio-h-v1-1-9cb5a4cda116@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agoMerge branch 'mlx5e-rc2-misc-patches'
Jakub Kicinski [Sat, 6 Apr 2024 04:54:46 +0000 (21:54 -0700)]
Merge branch 'mlx5e-rc2-misc-patches'

Tariq Toukan says:

====================
mlx5e rc2 misc patches (part)

This patchset includes small features and a cleanup for the mlx5e driver.

Patches 1-2 by Cosmin implements FEC settings for 100G/lane modes.

Patch 3 is a simple cleanup.
====================

Link: https://lore.kernel.org/r/20240404173357.123307-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonet/mlx5e: Un-expose functions in en.h
Tariq Toukan [Thu, 4 Apr 2024 17:33:57 +0000 (20:33 +0300)]
net/mlx5e: Un-expose functions in en.h

Un-expose functions that are not used outside of their c file.
Make them static.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://lore.kernel.org/r/20240404173357.123307-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonet/mlx5e: Support FEC settings for 100G/lane modes
Cosmin Ratiu [Thu, 4 Apr 2024 17:33:54 +0000 (20:33 +0300)]
net/mlx5e: Support FEC settings for 100G/lane modes

This consists of:
1. Expose the 100G/lane capability bit in the PCAM reg.
2. Expose the per link mode FEC capability masks in the PPLM reg.
3. Set the overrides according to ethtool parameters.
FEC for new modes is set if and only if the PCAM 100G/lane capability is
advertised and the capability mask for a given link mode reports that it
can accept the requested FEC mode.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240404173357.123307-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonet/mlx5e: Extract checking of FEC support for a link mode
Cosmin Ratiu [Thu, 4 Apr 2024 17:33:53 +0000 (20:33 +0300)]
net/mlx5e: Extract checking of FEC support for a link mode

The check of whether a given FEC mode is supported in a given link mode
is about to get more complicated, so extract it in a separate function
to avoid code duplication.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240404173357.123307-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agobnxt_en: Fix PTP firmware timeout parameter
Michael Chan [Thu, 4 Apr 2024 19:55:00 +0000 (12:55 -0700)]
bnxt_en: Fix PTP firmware timeout parameter

Use the correct tmo_us microsecond parameter for the PTP firmware
timeout parameter.

Fixes: 7de3c2218eed ("bnxt_en: Add a timeout parameter to bnxt_hwrm_port_ts_query()")
Reported-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240404195500.171071-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agoMerge branch 'net-dsa-microchip-ksz8-refactor-fdb-dump-path'
Jakub Kicinski [Fri, 5 Apr 2024 02:08:46 +0000 (19:08 -0700)]
Merge branch 'net-dsa-microchip-ksz8-refactor-fdb-dump-path'

Oleksij Rempel says:

====================
net: dsa: microchip: ksz8: refactor FDB dump path

Refactor FDB dump code path for Microchip KSZ8xxx series. This series
mostly makes some cosmetic reworks and allows to forward errors detected
by the regmap.

Change logs are part of patch commit messages.
====================

Link: https://lore.kernel.org/r/20240403125039.3414824-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonet: dsa: microchip: ksz8_r_dyn_mac_table(): use entries variable to signal 0 entries
Oleksij Rempel [Wed, 3 Apr 2024 12:50:39 +0000 (14:50 +0200)]
net: dsa: microchip: ksz8_r_dyn_mac_table(): use entries variable to signal 0 entries

We already have a variable to provide number of entries. So use it,
instead of using error number.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20240403125039.3414824-9-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonet: dsa: microchip: ksz8_r_dyn_mac_table(): return read/write error if we got any
Oleksij Rempel [Wed, 3 Apr 2024 12:50:38 +0000 (14:50 +0200)]
net: dsa: microchip: ksz8_r_dyn_mac_table(): return read/write error if we got any

The read/write path may fail. So, return error if we got it.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20240403125039.3414824-8-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonet: dsa: microchip: ksz8_r_dyn_mac_table(): ksz: do not return EAGAIN on timeout
Oleksij Rempel [Wed, 3 Apr 2024 12:50:37 +0000 (14:50 +0200)]
net: dsa: microchip: ksz8_r_dyn_mac_table(): ksz: do not return EAGAIN on timeout

EAGAIN was not used by previous code and not used by  current code. So,
remove it and use proper error value.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20240403125039.3414824-7-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonet: dsa: microchip: ksz8: Unify variable naming in ksz8_r_dyn_mac_table()
Oleksij Rempel [Wed, 3 Apr 2024 12:50:36 +0000 (14:50 +0200)]
net: dsa: microchip: ksz8: Unify variable naming in ksz8_r_dyn_mac_table()

Use 'ret' instead of 'rc' in ksz8_r_dyn_mac_table() to maintain
consistency with the rest of the file.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20240403125039.3414824-6-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonet: dsa: microchip: ksz8: Refactor ksz8_r_dyn_mac_table() for readability
Oleksij Rempel [Wed, 3 Apr 2024 12:50:35 +0000 (14:50 +0200)]
net: dsa: microchip: ksz8: Refactor ksz8_r_dyn_mac_table() for readability

Move the code out of a long if statement scope in ksz8_r_dyn_mac_table()
to improve code readability.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20240403125039.3414824-5-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonet: dsa: microchip: ksz8: Refactor ksz8_fdb_dump()
Oleksij Rempel [Wed, 3 Apr 2024 12:50:34 +0000 (14:50 +0200)]
net: dsa: microchip: ksz8: Refactor ksz8_fdb_dump()

Refactor ksz8_fdb_dump() to address potential issues:
- Limit the number of iterations to avoid endless loops.
- Handle error codes returned by ksz8_r_dyn_mac_table(), with
  an exception for -ENXIO when no more dynamic entries are detected.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Link: https://lore.kernel.org/r/20240403125039.3414824-4-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonet: dsa: microchip: Make ksz8_r_dyn_mac_table() static
Oleksij Rempel [Wed, 3 Apr 2024 12:50:33 +0000 (14:50 +0200)]
net: dsa: microchip: Make ksz8_r_dyn_mac_table() static

ksz8_r_dyn_mac_table() is not used outside the source file. Make it
static.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20240403125039.3414824-3-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonet: dsa: microchip: Remove unused FDB timestamp support in ksz8_r_dyn_mac_table()
Oleksij Rempel [Wed, 3 Apr 2024 12:50:32 +0000 (14:50 +0200)]
net: dsa: microchip: Remove unused FDB timestamp support in ksz8_r_dyn_mac_table()

The FDB timestamps are not being utilized. This commit removes the
unused timestamp support from ksz8_r_dyn_mac_table() function.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20240403125039.3414824-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agoMerge branch 'add-starfive-jh8100-dwmac-support'
Jakub Kicinski [Fri, 5 Apr 2024 02:07:46 +0000 (19:07 -0700)]
Merge branch 'add-starfive-jh8100-dwmac-support'

Tan Chun Hau says:

====================
Add StarFive JH8100 dwmac support

Add StarFive JH8100 dwmac support.
The JH8100 dwmac shares the same driver code as the JH7110 dwmac
and has only one reset signal.

Please refer to below:

  JH8100: reset-names = "stmmaceth";
  JH7110: reset-names = "stmmaceth", "ahb";
  JH7100: reset-names = "ahb";

Example usage of JH8100 in the device tree:

gmac0: ethernet@16030000 {
        compatible = "starfive,jh8100-dwmac",
                     "starfive,jh7110-dwmac",
                     "snps,dwmac-5.20";
        ...
};

Changes in v6:
- Removed unnecessary enum "starfive,jh8100-dwmac".
====================

Link: https://lore.kernel.org/r/20240403100549.78719-1-chunhau.tan@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agodt-bindings: net: starfive,jh7110-dwmac: Add StarFive JH8100 support
Tan Chun Hau [Wed, 3 Apr 2024 10:05:49 +0000 (03:05 -0700)]
dt-bindings: net: starfive,jh7110-dwmac: Add StarFive JH8100 support

Add StarFive JH8100 dwmac support.
The JH8100 dwmac shares the same driver code as the JH7110 dwmac
and has only one reset signal.

Please refer to below:

  JH8100: reset-names = "stmmaceth";
  JH7110: reset-names = "stmmaceth", "ahb";
  JH7100: reset-names = "ahb";

Example usage of JH8100 in the device tree:

gmac0: ethernet@16030000 {
        compatible = "starfive,jh8100-dwmac",
                     "starfive,jh7110-dwmac",
                     "snps,dwmac-5.20";
        ...
};

Signed-off-by: Tan Chun Hau <chunhau.tan@starfivetech.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20240403100549.78719-2-chunhau.tan@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Fri, 5 Apr 2024 00:03:18 +0000 (17:03 -0700)]
Merge git://git./linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

Conflicts:

net/ipv4/ip_gre.c
  17af420545a7 ("erspan: make sure erspan_base_hdr is present in skb->head")
  5832c4a77d69 ("ip_tunnel: convert __be16 tunnel flags to bitmaps")
https://lore.kernel.org/all/20240402103253.3b54a1cf@canb.auug.org.au/

Adjacent changes:

net/ipv6/ip6_fib.c
  d21d40605bca ("ipv6: Fix infinite recursion in fib6_dump_done().")
  5fc68320c1fb ("ipv6: remove RTNL protection from inet6_dump_fib()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agoMerge tag 'net-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 4 Apr 2024 21:49:10 +0000 (14:49 -0700)]
Merge tag 'net-6.9-rc3' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter, bluetooth and bpf.

  Fairly usual collection of driver and core fixes. The large selftest
  accompanying one of the fixes is also becoming a common occurrence.

  Current release - regressions:

   - ipv6: fix infinite recursion in fib6_dump_done()

   - net/rds: fix possible null-deref in newly added error path

  Current release - new code bugs:

   - net: do not consume a full cacheline for system_page_pool

   - bpf: fix bpf_arena-related file descriptor leaks in the verifier

   - drv: ice: fix freeing uninitialized pointers, fixing misuse of the
     newfangled __free() auto-cleanup

  Previous releases - regressions:

   - x86/bpf: fixes the BPF JIT with retbleed=stuff

   - xen-netfront: add missing skb_mark_for_recycle, fix page pool
     accounting leaks, revealed by recently added explicit warning

   - tcp: fix bind() regression for v6-only wildcard and v4-mapped-v6
     non-wildcard addresses

   - Bluetooth:
      - replace "hci_qca: Set BDA quirk bit if fwnode exists in DT" with
        better workarounds to un-break some buggy Qualcomm devices
      - set conn encrypted before conn establishes, fix re-connecting to
        some headsets which use slightly unusual sequence of msgs

   - mptcp:
      - prevent BPF accessing lowat from a subflow socket
      - don't account accept() of non-MPC client as fallback to TCP

   - drv: mana: fix Rx DMA datasize and skb_over_panic

   - drv: i40e: fix VF MAC filter removal

  Previous releases - always broken:

   - gro: various fixes related to UDP tunnels - netns crossing
     problems, incorrect checksum conversions, and incorrect packet
     transformations which may lead to panics

   - bpf: support deferring bpf_link dealloc to after RCU grace period

   - nf_tables:
      - release batch on table validation from abort path
      - release mutex after nft_gc_seq_end from abort path
      - flush pending destroy work before exit_net release

   - drv: r8169: skip DASH fw status checks when DASH is disabled"

* tag 'net-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
  netfilter: validate user input for expected length
  net/sched: act_skbmod: prevent kernel-infoleak
  net: usb: ax88179_178a: avoid the interface always configured as random address
  net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45()
  net: ravb: Always update error counters
  net: ravb: Always process TX descriptor ring
  netfilter: nf_tables: discard table flag update with pending basechain deletion
  netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
  netfilter: nf_tables: reject new basechain after table flag update
  netfilter: nf_tables: flush pending destroy work before exit_net release
  netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
  netfilter: nf_tables: release batch on table validation from abort path
  Revert "tg3: Remove residual error handling in tg3_suspend"
  tg3: Remove residual error handling in tg3_suspend
  net: mana: Fix Rx DMA datasize and skb_over_panic
  net/sched: fix lockdep splat in qdisc_tree_reduce_backlog()
  net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping
  net: stmmac: fix rx queue priority assignment
  net: txgbe: fix i2c dev name cannot match clkdev
  net: fec: Set mac_managed_pm during probe
  ...

14 months agoMerge tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs
Linus Torvalds [Thu, 4 Apr 2024 21:36:32 +0000 (14:36 -0700)]
Merge tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs repair code from Kent Overstreet:
 "A couple more small fixes, and new repair code.

  We can now automatically recover from arbitrary corrupted interior
  btree nodes by scanning, and we can reconstruct metadata as needed to
  bring a filesystem back into a working, consistent, read-write state
  and preserve access to whatevver wasn't corrupted.

  Meaning - you can blow away all metadata except for extents and
  dirents leaf nodes, and repair will reconstruct everything else and
  give you your data, and under the correct paths. If inodes are missing
  i_size will be slightly off and permissions/ownership/timestamps will
  be gone, and we do still need the snapshots btree if snapshots were in
  use - in the future we'll be able to guess the snapshot tree structure
  in some situations.

  IOW - aside from shaking out remaining bugs (fuzz testing is still
  coming), repair code should be complete and if repair ever doesn't
  work that's the highest priority bug that I want to know about
  immediately.

  This patchset was kindly tested by a user from India who accidentally
  wiped one drive out of a three drive filesystem with no replication on
  the family computer - it took a couple weeks but we got everything
  important back"

* tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: reconstruct_inode()
  bcachefs: Subvolume reconstruction
  bcachefs: Check for extents that point to same space
  bcachefs: Reconstruct missing snapshot nodes
  bcachefs: Flag btrees with missing data
  bcachefs: Topology repair now uses nodes found by scanning to fill holes
  bcachefs: Repair pass for scanning for btree nodes
  bcachefs: Don't skip fake btree roots in fsck
  bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake()
  bcachefs: Etyzinger cleanups
  bcachefs: bch2_shoot_down_journal_keys()
  bcachefs: Clear recovery_passes_required as they complete without errors
  bcachefs: ratelimit informational fsck errors
  bcachefs: Check for bad needs_discard before doing discard
  bcachefs: Improve bch2_btree_update_to_text()
  mean_and_variance: Drop always failing tests
  bcachefs: fix nocow lock deadlock
  bcachefs: BCH_WATERMARK_interior_updates
  bcachefs: Fix btree node reserve

14 months agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Jakub Kicinski [Thu, 4 Apr 2024 18:37:39 +0000 (11:37 -0700)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-04-04

We've added 7 non-merge commits during the last 5 day(s) which contain
a total of 9 files changed, 75 insertions(+), 24 deletions(-).

The main changes are:

1) Fix x86 BPF JIT under retbleed=stuff which causes kernel panics due to
   incorrect destination IP calculation and incorrect IP for relocations,
   from Uros Bizjak and Joan Bruguera Micó.

2) Fix BPF arena file descriptor leaks in the verifier,
   from Anton Protopopov.

3) Defer bpf_link deallocation to after RCU grace period as currently
   running multi-{kprobes,uprobes} programs might still access cookie
   information from the link, from Andrii Nakryiko.

4) Fix a BPF sockmap lock inversion deadlock in map_delete_elem reported
   by syzkaller, from Jakub Sitnicki.

5) Fix resolve_btfids build with musl libc due to missing linux/types.h
   include, from Natanael Copa.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf, sockmap: Prevent lock inversion deadlock in map delete elem
  x86/bpf: Fix IP for relocating call depth accounting
  x86/bpf: Fix IP after emitting call depth accounting
  bpf: fix possible file descriptor leaks in verifier
  tools/resolve_btfids: fix build with musl libc
  bpf: support deferring bpf_link dealloc to after RCU grace period
  bpf: put uprobe link's path and task in release callback
====================

Link: https://lore.kernel.org/r/20240404183258.4401-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agoMerge branch 'selftests-net-groundwork-for-ynl-based-tests'
Jakub Kicinski [Thu, 4 Apr 2024 17:33:56 +0000 (10:33 -0700)]
Merge branch 'selftests-net-groundwork-for-ynl-based-tests'

Jakub Kicinski says:

====================
selftests: net: groundwork for YNL-based tests (YNL prep)

v1: https://lore.kernel.org/all/20240402010520.1209517-1-kuba@kernel.org/
====================

Merge the non-controversial YNL adjustment and spec additions.

Link: https://lore.kernel.org/r/20240403023426.1762996-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agotools: ynl: copy netlink error to NlError
Jakub Kicinski [Wed, 3 Apr 2024 02:34:21 +0000 (19:34 -0700)]
tools: ynl: copy netlink error to NlError

Typing e.nl_msg.error when processing exception is a bit tedious
and counter-intuitive. Set a local .error member to the positive
value of the netlink level error.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20240403023426.1762996-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonetlink: specs: define ethtool header flags
Jakub Kicinski [Wed, 3 Apr 2024 02:34:20 +0000 (19:34 -0700)]
netlink: specs: define ethtool header flags

When interfacing with the ethtool commands it's handy to
be able to use the names of the flags. Example:

    ethnl.pause_get({"header": {"dev-index": cfg.ifindex,
                                "flags": {'stats'}}})

Note that not all commands accept all the flags,
but the meaning of the bits does not change command
to command.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20240403023426.1762996-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonetfilter: validate user input for expected length
Eric Dumazet [Thu, 4 Apr 2024 12:20:51 +0000 (12:20 +0000)]
netfilter: validate user input for expected length

I got multiple syzbot reports showing old bugs exposed
by BPF after commit 20f2505fb436 ("bpf: Try to avoid kzalloc
in cgroup/{s,g}etsockopt")

setsockopt() @optlen argument should be taken into account
before copying data.

 BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
 BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
 BUG: KASAN: slab-out-of-bounds in do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
 BUG: KASAN: slab-out-of-bounds in do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
Read of size 96 at addr ffff88802cd73da0 by task syz-executor.4/7238

CPU: 1 PID: 7238 Comm: syz-executor.4 Not tainted 6.9.0-rc2-next-20240403-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
 <TASK>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
  print_address_description mm/kasan/report.c:377 [inline]
  print_report+0x169/0x550 mm/kasan/report.c:488
  kasan_report+0x143/0x180 mm/kasan/report.c:601
  kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
  __asan_memcpy+0x29/0x70 mm/kasan/shadow.c:105
  copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
  copy_from_sockptr include/linux/sockptr.h:55 [inline]
  do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
  do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
  nf_setsockopt+0x295/0x2c0 net/netfilter/nf_sockopt.c:101
  do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
  __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
  __do_sys_setsockopt net/socket.c:2343 [inline]
  __se_sys_setsockopt net/socket.c:2340 [inline]
  __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
 do_syscall_64+0xfb/0x240
 entry_SYSCALL_64_after_hwframe+0x72/0x7a
RIP: 0033:0x7fd22067dde9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fd21f9ff0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00007fd2207abf80 RCX: 00007fd22067dde9
RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007fd2206ca47a R08: 0000000000000001 R09: 0000000000000000
R10: 0000000020000880 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007fd2207abf80 R15: 00007ffd2d0170d8
 </TASK>

Allocated by task 7238:
  kasan_save_stack mm/kasan/common.c:47 [inline]
  kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
  poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
  __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
  kasan_kmalloc include/linux/kasan.h:211 [inline]
  __do_kmalloc_node mm/slub.c:4069 [inline]
  __kmalloc_noprof+0x200/0x410 mm/slub.c:4082
  kmalloc_noprof include/linux/slab.h:664 [inline]
  __cgroup_bpf_run_filter_setsockopt+0xd47/0x1050 kernel/bpf/cgroup.c:1869
  do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
  __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
  __do_sys_setsockopt net/socket.c:2343 [inline]
  __se_sys_setsockopt net/socket.c:2340 [inline]
  __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
 do_syscall_64+0xfb/0x240
 entry_SYSCALL_64_after_hwframe+0x72/0x7a

The buggy address belongs to the object at ffff88802cd73da0
 which belongs to the cache kmalloc-8 of size 8
The buggy address is located 0 bytes inside of
 allocated 1-byte region [ffff88802cd73da0ffff88802cd73da1)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802cd73020 pfn:0x2cd73
flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff)
page_type: 0xffffefff(slab)
raw: 00fff80000000000 ffff888015041280 dead000000000100 dead000000000122
raw: ffff88802cd73020 000000008080007f 00000001ffffefff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 5103, tgid 2119833701 (syz-executor.4), ts 5103, free_ts 70804600828
  set_page_owner include/linux/page_owner.h:32 [inline]
  post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1490
  prep_new_page mm/page_alloc.c:1498 [inline]
  get_page_from_freelist+0x2e7e/0x2f40 mm/page_alloc.c:3454
  __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4712
  __alloc_pages_node_noprof include/linux/gfp.h:244 [inline]
  alloc_pages_node_noprof include/linux/gfp.h:271 [inline]
  alloc_slab_page+0x5f/0x120 mm/slub.c:2249
  allocate_slab+0x5a/0x2e0 mm/slub.c:2412
  new_slab mm/slub.c:2465 [inline]
  ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3615
  __slab_alloc+0x58/0xa0 mm/slub.c:3705
  __slab_alloc_node mm/slub.c:3758 [inline]
  slab_alloc_node mm/slub.c:3936 [inline]
  __do_kmalloc_node mm/slub.c:4068 [inline]
  kmalloc_node_track_caller_noprof+0x286/0x450 mm/slub.c:4089
  kstrdup+0x3a/0x80 mm/util.c:62
  device_rename+0xb5/0x1b0 drivers/base/core.c:4558
  dev_change_name+0x275/0x860 net/core/dev.c:1232
  do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2864
  __rtnl_newlink net/core/rtnetlink.c:3680 [inline]
  rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3727
  rtnetlink_rcv_msg+0x89b/0x10d0 net/core/rtnetlink.c:6594
  netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559
  netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
  netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
page last free pid 5146 tgid 5146 stack trace:
  reset_page_owner include/linux/page_owner.h:25 [inline]
  free_pages_prepare mm/page_alloc.c:1110 [inline]
  free_unref_page+0xd3c/0xec0 mm/page_alloc.c:2617
  discard_slab mm/slub.c:2511 [inline]
  __put_partials+0xeb/0x130 mm/slub.c:2980
  put_cpu_partial+0x17c/0x250 mm/slub.c:3055
  __slab_free+0x2ea/0x3d0 mm/slub.c:4254
  qlink_free mm/kasan/quarantine.c:163 [inline]
  qlist_free_all+0x9e/0x140 mm/kasan/quarantine.c:179
  kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
  __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322
  kasan_slab_alloc include/linux/kasan.h:201 [inline]
  slab_post_alloc_hook mm/slub.c:3888 [inline]
  slab_alloc_node mm/slub.c:3948 [inline]
  __do_kmalloc_node mm/slub.c:4068 [inline]
  __kmalloc_node_noprof+0x1d7/0x450 mm/slub.c:4076
  kmalloc_node_noprof include/linux/slab.h:681 [inline]
  kvmalloc_node_noprof+0x72/0x190 mm/util.c:634
  bucket_table_alloc lib/rhashtable.c:186 [inline]
  rhashtable_rehash_alloc+0x9e/0x290 lib/rhashtable.c:367
  rht_deferred_worker+0x4e1/0x2440 lib/rhashtable.c:427
  process_one_work kernel/workqueue.c:3218 [inline]
  process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
  worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
  kthread+0x2f0/0x390 kernel/kthread.c:388
  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243

Memory state around the buggy address:
 ffff88802cd73c80: 07 fc fc fc 05 fc fc fc 05 fc fc fc fa fc fc fc
 ffff88802cd73d00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
>ffff88802cd73d80: fa fc fc fc 01 fc fc fc fa fc fc fc fa fc fc fc
                               ^
 ffff88802cd73e00: fa fc fc fc fa fc fc fc 05 fc fc fc 07 fc fc fc
 ffff88802cd73e80: 07 fc fc fc 07 fc fc fc 07 fc fc fc 07 fc fc fc

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://lore.kernel.org/r/20240404122051.2303764-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agoMerge tag 'nf-24-04-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Thu, 4 Apr 2024 16:38:52 +0000 (09:38 -0700)]
Merge tag 'nf-24-04-04' of git://git./linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

Patch #1 unlike early commit path stage which triggers a call to abort,
         an explicit release of the batch is required on abort, otherwise
         mutex is released and commit_list remains in place.

Patch #2 release mutex after nft_gc_seq_end() in commit path, otherwise
         async GC worker could collect expired objects.

Patch #3 flush pending destroy work in module removal path, otherwise UaF
         is possible.

Patch #4 and #6 restrict the table dormant flag with basechain updates
 to fix state inconsistency in the hook registration.

Patch #5 adds missing RCU read side lock to flowtable type to avoid races
 with module removal.

* tag 'nf-24-04-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: discard table flag update with pending basechain deletion
  netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
  netfilter: nf_tables: reject new basechain after table flag update
  netfilter: nf_tables: flush pending destroy work before exit_net release
  netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
  netfilter: nf_tables: release batch on table validation from abort path
====================

Link: https://lore.kernel.org/r/20240404104334.1627-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Jakub Kicinski [Thu, 4 Apr 2024 16:34:35 +0000 (09:34 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-04-03 (ice, idpf)

This series contains updates to ice and idpf drivers.

Dan Carpenter initializes some pointer declarations to NULL as needed for
resource cleanup on ice driver.

Petr Oros corrects assignment of VLAN operators to fix Rx VLAN filtering
in legacy mode for ice.

Joshua calls eth_type_trans() on unknown packets to prevent possible
kernel panic on idpf.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  idpf: fix kernel panic on unknown packet types
  ice: fix enabling RX VLAN filtering
  ice: Fix freeing uninitialized pointers
====================

Link: https://lore.kernel.org/r/20240403201929.1945116-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonet/sched: act_skbmod: prevent kernel-infoleak
Eric Dumazet [Wed, 3 Apr 2024 13:09:08 +0000 (13:09 +0000)]
net/sched: act_skbmod: prevent kernel-infoleak

syzbot found that tcf_skbmod_dump() was copying four bytes
from kernel stack to user space [1].

The issue here is that 'struct tc_skbmod' has a four bytes hole.

We need to clear the structure before filling fields.

[1]
BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline]
 BUG: KMSAN: kernel-infoleak in copy_to_user_iter lib/iov_iter.c:24 [inline]
 BUG: KMSAN: kernel-infoleak in iterate_ubuf include/linux/iov_iter.h:29 [inline]
 BUG: KMSAN: kernel-infoleak in iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
 BUG: KMSAN: kernel-infoleak in iterate_and_advance include/linux/iov_iter.h:271 [inline]
 BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185
  instrument_copy_to_user include/linux/instrumented.h:114 [inline]
  copy_to_user_iter lib/iov_iter.c:24 [inline]
  iterate_ubuf include/linux/iov_iter.h:29 [inline]
  iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
  iterate_and_advance include/linux/iov_iter.h:271 [inline]
  _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185
  copy_to_iter include/linux/uio.h:196 [inline]
  simple_copy_to_iter net/core/datagram.c:532 [inline]
  __skb_datagram_iter+0x185/0x1000 net/core/datagram.c:420
  skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:546
  skb_copy_datagram_msg include/linux/skbuff.h:4050 [inline]
  netlink_recvmsg+0x432/0x1610 net/netlink/af_netlink.c:1962
  sock_recvmsg_nosec net/socket.c:1046 [inline]
  sock_recvmsg+0x2c4/0x340 net/socket.c:1068
  __sys_recvfrom+0x35a/0x5f0 net/socket.c:2242
  __do_sys_recvfrom net/socket.c:2260 [inline]
  __se_sys_recvfrom net/socket.c:2256 [inline]
  __x64_sys_recvfrom+0x126/0x1d0 net/socket.c:2256
 do_syscall_64+0xd5/0x1f0
 entry_SYSCALL_64_after_hwframe+0x6d/0x75

Uninit was stored to memory at:
  pskb_expand_head+0x30f/0x19d0 net/core/skbuff.c:2253
  netlink_trim+0x2c2/0x330 net/netlink/af_netlink.c:1317
  netlink_unicast+0x9f/0x1260 net/netlink/af_netlink.c:1351
  nlmsg_unicast include/net/netlink.h:1144 [inline]
  nlmsg_notify+0x21d/0x2f0 net/netlink/af_netlink.c:2610
  rtnetlink_send+0x73/0x90 net/core/rtnetlink.c:741
  rtnetlink_maybe_send include/linux/rtnetlink.h:17 [inline]
  tcf_add_notify net/sched/act_api.c:2048 [inline]
  tcf_action_add net/sched/act_api.c:2071 [inline]
  tc_ctl_action+0x146e/0x19d0 net/sched/act_api.c:2119
  rtnetlink_rcv_msg+0x1737/0x1900 net/core/rtnetlink.c:6595
  netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2559
  rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6613
  netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
  netlink_unicast+0xf4c/0x1260 net/netlink/af_netlink.c:1361
  netlink_sendmsg+0x10df/0x11f0 net/netlink/af_netlink.c:1905
  sock_sendmsg_nosec net/socket.c:730 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:745
  ____sys_sendmsg+0x877/0xb60 net/socket.c:2584
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
  __sys_sendmsg net/socket.c:2667 [inline]
  __do_sys_sendmsg net/socket.c:2676 [inline]
  __se_sys_sendmsg net/socket.c:2674 [inline]
  __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674
 do_syscall_64+0xd5/0x1f0
 entry_SYSCALL_64_after_hwframe+0x6d/0x75

Uninit was stored to memory at:
  __nla_put lib/nlattr.c:1041 [inline]
  nla_put+0x1c6/0x230 lib/nlattr.c:1099
  tcf_skbmod_dump+0x23f/0xc20 net/sched/act_skbmod.c:256
  tcf_action_dump_old net/sched/act_api.c:1191 [inline]
  tcf_action_dump_1+0x85e/0x970 net/sched/act_api.c:1227
  tcf_action_dump+0x1fd/0x460 net/sched/act_api.c:1251
  tca_get_fill+0x519/0x7a0 net/sched/act_api.c:1628
  tcf_add_notify_msg net/sched/act_api.c:2023 [inline]
  tcf_add_notify net/sched/act_api.c:2042 [inline]
  tcf_action_add net/sched/act_api.c:2071 [inline]
  tc_ctl_action+0x1365/0x19d0 net/sched/act_api.c:2119
  rtnetlink_rcv_msg+0x1737/0x1900 net/core/rtnetlink.c:6595
  netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2559
  rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6613
  netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
  netlink_unicast+0xf4c/0x1260 net/netlink/af_netlink.c:1361
  netlink_sendmsg+0x10df/0x11f0 net/netlink/af_netlink.c:1905
  sock_sendmsg_nosec net/socket.c:730 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:745
  ____sys_sendmsg+0x877/0xb60 net/socket.c:2584
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
  __sys_sendmsg net/socket.c:2667 [inline]
  __do_sys_sendmsg net/socket.c:2676 [inline]
  __se_sys_sendmsg net/socket.c:2674 [inline]
  __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674
 do_syscall_64+0xd5/0x1f0
 entry_SYSCALL_64_after_hwframe+0x6d/0x75

Local variable opt created at:
  tcf_skbmod_dump+0x9d/0xc20 net/sched/act_skbmod.c:244
  tcf_action_dump_old net/sched/act_api.c:1191 [inline]
  tcf_action_dump_1+0x85e/0x970 net/sched/act_api.c:1227

Bytes 188-191 of 248 are uninitialized
Memory access of size 248 starts at ffff888117697680
Data copied to user address 00007ffe56d855f0

Fixes: 86da71b57383 ("net_sched: Introduce skbmod action")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20240403130908.93421-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 months agonet: usb: ax88179_178a: avoid the interface always configured as random address
Jose Ignacio Tornos Martinez [Wed, 3 Apr 2024 13:21:58 +0000 (15:21 +0200)]
net: usb: ax88179_178a: avoid the interface always configured as random address

After the commit d2689b6a86b9 ("net: usb: ax88179_178a: avoid two
consecutive device resets"), reset is not executed from bind operation and
mac address is not read from the device registers or the devicetree at that
moment. Since the check to configure if the assigned mac address is random
or not for the interface, happens after the bind operation from
usbnet_probe, the interface keeps configured as random address, although the
address is correctly read and set during open operation (the only reset
now).

In order to keep only one reset for the device and to avoid the interface
always configured as random address, after reset, configure correctly the
suitable field from the driver, if the mac address is read successfully from
the device registers or the devicetree. Take into account if a locally
administered address (random) was previously stored.

cc: stable@vger.kernel.org # 6.6+
Fixes: d2689b6a86b9 ("net: usb: ax88179_178a: avoid two consecutive device resets")
Reported-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240403132158.344838-1-jtornosm@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>