git.maquefel.me Git - linux.git/log

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Conflicts:

net/ax25/af_ax25.c
  d7c4c9e075f8c ("ax25: fix incorrect dev_tracker usage")
  d62607c3fe459 ("net: rename reference+tracking helpers")

drivers/net/netdevsim/fib.c
  180a6a3ee60a ("netdevsim: fib: Fix reference count leak on route deletion failure")
  012ec02ae441 ("netdevsim: convert driver to use unlocked devlink API during init/fini")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>

doc: sfp-phylink: Fix a broken reference

The commit in Fixes: has changed a .txt file into a .yaml file. Update the
documentation accordingly.

While at it add some `` around some file names to improve the output.

Fixes: 70991f1e6858 ("dt-bindings: net: convert sff,sfp to dtschema")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/be3c7e87ca7f027703247eccfe000b8e34805094.1659247114.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'wireguard-patches-for-5-20-rc1'

Jason A. Donenfeld says:

====================
wireguard patches for 5.20-rc1

I had planned to send these out eventually as net.git patches, but as
you emailed earlier, I figure there's no harm in just doing this now for
net-next.git. Please apply the following small fixes:

1) Rather than using msleep() in order to approximate ktime_get_coarse_
   boottime_ns(), instead use an hrtimer, rounded heuristically.

2) An update in selftest config fragments, from Lukas.

3) Linus noticed that a debugging WARN_ON() to detect (impossible) stack
   corruption would still allow the corruption to happen, making it harder
   to get the report about the corruption subsequently.

4) Support for User Mode Linux in the test suite. This depends on some
   UML patches that are slated for 5.20. Richard hasn't sent his pull
   in, but they're in his tree, so I assume it'll happen.
====================

Link: https://lore.kernel.org/r/20220802125613.340848-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wireguard: selftests: support UML

This shoud open up various possibilities like time travel execution, and
is also just another platform to help shake out bugs.

Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wireguard: allowedips: don't corrupt stack when detecting overflow

In case push_rcu() and related functions are buggy, there's a
WARN_ON(len >= 128), which the selftest tries to hit by being tricky. In
case it is hit, we shouldn't corrupt the kernel's stack, though;
otherwise it may be hard to even receive the report that it's buggy. So
conditionalize the stack write based on that WARN_ON()'s return value.

Note that this never *actually* happens anyway. The WARN_ON() in the
first place is bounded by IS_ENABLED(DEBUG), and isn't expected to ever
actually hit. This is just a debugging sanity check.

Additionally, hoist the constant 128 into a named enum,
MAX_ALLOWEDIPS_BITS, so that it's clear why this value is chosen.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/all/CAHk-=wjJZGA6w_DxA+k7Ejbqsq+uGK==koPai3sqdsfJqemvag@mail.gmail.com/
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wireguard: selftests: update config fragments

The kernel.config and debug.config fragments in wireguard selftests mention
some config symbols that have been reworked:

Commit c5665868183f ("mm: kmemleak: use the memory pool for early
allocations") removes the config DEBUG_KMEMLEAK_EARLY_LOG_SIZE and since
then, the config's feature is available without further configuration.

Commit 4675ff05de2d ("kmemcheck: rip it out") removes kmemcheck and the
corresponding arch config HAVE_ARCH_KMEMCHECK. There is no need for this
config.

Commit 3bf195ae6037 ("netfilter: nat: merge nf_nat_ipv4,6 into nat core")
removes the config NF_NAT_IPV4 and since then, the config's feature is
available without further configuration.

Commit 41a2901e7d22 ("rcu: Remove SPARSE_RCU_POINTER Kconfig option")
removes the config SPARSE_RCU_POINTER and since then, the config's feature
is enabled by default.

Commit dfb4357da6dd ("time: Remove CONFIG_TIMER_STATS") removes the feature
and config CONFIG_TIMER_STATS without any replacement.

Commit 3ca17b1f3628 ("lib/ubsan: remove null-pointer checks") removes the
check and config UBSAN_NULL without any replacement.

Adjust the config fragments to those changes in configs.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wireguard: ratelimiter: use hrtimer in selftest

Using msleep() is problematic because it's compared against
ratelimiter.c's ktime_get_coarse_boottime_ns(), which means on systems
with slow jiffies (such as UML's forced HZ=100), the result is
inaccurate. So switch to using schedule_hrtimeout().

However, hrtimer gives us access only to the traditional posix timers,
and none of the _COARSE variants. So now, rather than being too
imprecise like jiffies, it's too precise.

One solution would be to give it a large "range" value, but this will
still fire early on a loaded system. A better solution is to align the
timeout to the actual coarse timer, and then round up to the nearest
tick, plus change.

So add the timeout to the current coarse time, and then
schedule_hrtimer() until the absolute computed time.

This should hopefully reduce flakes in CI as well. Note that we keep the
retry loop in case the entire function is running behind, because the
test could still be scheduled out, by either the kernel or by the
hypervisor's kernel, in which case restarting the test and hoping to not
be scheduled out still helps.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ

Striding RQ uses MTT page mapping, where each page corresponds to an XSK
frame. MTT pages have alignment requirements, and XSK frames don't have
any alignment guarantees in the unaligned mode. Frames with improper
alignment must be discarded, otherwise the packet data will be written
at a wrong address.

Fixes: 282c0c798f8e ("net/mlx5e: Allow XSK frames smaller than a page")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20220729121356.3990867-1-maximmi@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: usb: ax88179_178a: Bind only to vendor-specific interface

The Anker PowerExpand USB-C to Gigabit Ethernet adapter uses this
chipset, but exposes CDC Ethernet configurations as well as the
vendor specific one. This driver tries to bind by PID:VID
unconditionally and ends up picking up the CDC configuration, which
is supposed to be handled by the class driver. To make things even
more confusing, it sees both of the CDC class interfaces and tries
to bind twice, resulting in two broken Ethernet devices.

Change all the ID matches to specifically match the vendor-specific
interface. By default the device comes up in CDC mode and is bound by
that driver (which works fine); users may switch it to the vendor
interface using sysfs to set bConfigurationValue, at which point the
device actually goes through a reconnect cycle and comes back as a
vendor specific only device, and then this driver binds and works too.

The affected device uses VID/PID 0b95:1790, but we might as well change
all of them for good measure, since there is no good reason for this
driver to bind to standard CDC Ethernet interfaces.

v3: Added VID/PID info to commit message

Signed-off-by: Hector Martin <marcan@marcan.st>
Link: https://lore.kernel.org/r/20220731072209.45504-1-marcan@marcan.st
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: net: fix IOAM test skip return code

The ioam6.sh test script exits with an error code (1) when tests are
skipped due to lack of support from userspace/kernel or not enough
permissions. It should return the kselftests SKIP code instead.

Reviewed-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Link: https://lore.kernel.org/r/20220801124615.256416-1-kleber.souza@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: usb: make USB_RTL8153_ECM non user configurable

This refixes:

    commit 7da17624e7948d5d9660b910f8079d26d26ce453
    nt: usb: USB_RTL8153_ECM should not default to y

    In general, device drivers should not be enabled by default.

which basically broke the commit it claimed to fix, ie:

    commit 657bc1d10bfc23ac06d5d687ce45826c760744f9
    r8153_ecm: avoid to be prior to r8152 driver

    Avoid r8153_ecm is compiled as built-in, if r8152 driver is compiled
    as modules. Otherwise, the r8153_ecm would be used, even though the
    device is supported by r8152 driver.

this commit amounted to:

drivers/net/usb/Kconfig:

+config USB_RTL8153_ECM
+       tristate "RTL8153 ECM support"
+       depends on USB_NET_CDCETHER && (USB_RTL8152 || USB_RTL8152=n)
+       default y
+       help
+         This option supports ECM mode for RTL8153 ethernet adapter, when
+         CONFIG_USB_RTL8152 is not set, or the RTL8153 device is not
+         supported by r8152 driver.

drivers/net/usb/Makefile:

-obj-$(CONFIG_USB_NET_CDCETHER) += cdc_ether.o r8153_ecm.o
+obj-$(CONFIG_USB_NET_CDCETHER) += cdc_ether.o
+obj-$(CONFIG_USB_RTL8153_ECM)  += r8153_ecm.o

And as can be seen it pulls a piece of the cdc_ether driver out into
a separate config option to be able to make this piece modular in case
cdc_ether is builtin, while r8152 is modular.

While in general, device drivers should indeed not be enabled by default:
this isn't a device driver per say, but rather this is support code for
the CDCETHER (ECM) driver, and should thus be enabled if it is enabled.

See also email thread at:
  https://www.spinics.net/lists/netdev/msg767649.html

In:
  https://www.spinics.net/lists/netdev/msg768284.html

Jakub wrote:
  And when we say "removed" we can just hide it from what's prompted
  to the user (whatever such internal options are called)? I believe
  this way we don't bring back Marek's complaint.

Side note: these incorrect defaults will result in Android 13
on 5.15 GKI kernels lacking USB_RTL8153_ECM support while having
USB_NET_CDCETHER (luckily we also have USB_RTL8150 and USB_RTL8152,
so it's probably only an issue for very new RTL815x hardware with
no native 5.15 driver).

Fixes: 7da17624e7948d5d ("nt: usb: USB_RTL8153_ECM should not default to y")
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hayes Wang <hayeswang@realtek.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20220730230113.4138858-1-zenczykowski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: marvell: prestera: remove reduntant code

Fixes the coverity warning 'EVALUATION_ORDER' violation. port is written
twice with the same value.

Signed-off-by: Sebin Sebastian <mailmesebin00@gmail.com>
Link: https://lore.kernel.org/r/20220801040731.34741-1-mailmesebin00@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-pf: Reduce minimum mtu size to 60

PTP messages like SYNC, FOLLOW_UP, DELAY_REQ are of size 58 bytes.
Using a minimum packet length as 64 makes NIX to pad 6 bytes of
zeroes while transmission. This is causing latest ptp4l application to
emit errors since length in PTP header and received packet are not same.
Padding upto 3 bytes is fine but more than that makes ptp4l to assume
the pad bytes as a TLV. Hence reduce the size to 60 from 64.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Link: https://lore.kernel.org/r/20220729092457.3850-1-naveenm@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: Fix missing mutex_unlock() call

Commit 2dec18ad826f forgets to call mutex_unlock() before the function
returns in the error path:

   New smatch warnings:
   net/core/devlink.c:6392 devlink_nl_cmd_region_new() warn: inconsistent \
   returns '&region->snapshot_lock'.

Make sure we call mutex_unlock() in this error path.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 2dec18ad826f ("net: devlink: remove region snapshots list dependency on devlink->lock")
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220801115742.1309329-1-ammar.faizi@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/tls: Remove redundant workqueue flush before destroy

destroy_workqueue() safely destroys the workqueue after draining it.
No need for the explicit call to flush_workqueue(). Remove it.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220801112444.26175-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: txgbe: Fix an error handling path in txgbe_probe()

A pci_enable_pcie_error_reporting() should be balanced by a corresponding
pci_disable_pcie_error_reporting() call in the error handling path, as
already done in the remove function.

Fixes: 3ce7547e5b71 ("net: txgbe: Add build support for txgbe")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://lore.kernel.org/r/082003d00be1f05578c9c6434272ceb314609b8e.1659285240.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: Fix spelling mistakes and cleanup code

fix follow spelling misktakes:
desconstructed ==> deconstructed
enforcment ==> enforcement

Reported-by: Hacash Robot <hacashRobot@santino.com>
Signed-off-by: Xie Shaowen <studentxswpy@163.com>
Link: https://lore.kernel.org/r/20220730092254.3102875-1-studentxswpy@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Documentation: devlink: add add devlink-selftests to the table of contents

Commit 08f588fa301bef ("devlink: introduce framework for selftests") adds
documentation for devlink selftests framework, but it is missing from
table of contents.

Add it.

Link: https://lore.kernel.org/linux-doc/202207300406.CUBuyN5i-lkp@intel.com/
Fixes: 08f588fa301bef ("devlink: introduce framework for selftests")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20220730022058.16813-1-bagasdotme@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock

In the case of sk->dccps_qpolicy == DCCPQ_POLICY_PRIO, dccp_qpolicy_full
will drop a skb when qpolicy is full. And the lock in dccp_sendmsg is
released before sock_alloc_send_skb and then relocked after
sock_alloc_send_skb. The following conditions may lead dccp_qpolicy_push
to add skb to an already full sk_write_queue:

thread1--->lock
thread1--->dccp_qpolicy_full: queue is full. drop a skb
thread1--->unlock
thread2--->lock
thread2--->dccp_qpolicy_full: queue is not full. no need to drop.
thread2--->unlock
thread1--->lock
thread1--->dccp_qpolicy_push: add a skb. queue is full.
thread1--->unlock
thread2--->lock
thread2--->dccp_qpolicy_push: add a skb!
thread2--->unlock

Fix this by moving dccp_qpolicy_full.

Fixes: b1308dc015eb ("[DCCP]: Set TX Queue Length Bounds via Sysctl")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Link: https://lore.kernel.org/r/20220729110027.40569-1-hbh25y@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-fix-using-wrong-flags-to-check-features'

Guangbin Huang says:

====================
net: fix using wrong flags to check features

We find that some drivers may use wrong flags to check features,
so fix them.
====================

Link: https://lore.kernel.org/r/20220729101755.4798-1-huangguangbin2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ionic: fix error check for vlan flags in ionic_set_nic_features()

The prototype of input features of ionic_set_nic_features() is
netdev_features_t, but the vlan_flags is using the private
definition of ionic drivers. It should use the variable
ctx.cmd.lif_setattr.features, rather than features to check
the vlan flags. So fixes it.

Fixes: beead698b173 ("ionic: Add the basic NDO callbacks for netdev support")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()

vsi->current_netdev_flags is used store the current net device
flags, not the active netdevice features. So it should use
vsi->netdev->featurs, rather than vsi->current_netdev_flags
to check NETIF_F_HW_VLAN_CTAG_FILTER.

Fixes: 1babaf77f49d ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfp: flower: add support for tunnel offload without key ID

Currently nfp driver will reject to offload tunnel key action without
tunnel key ID which means tunnel ID is 0. But it is a normal case for tc
flower since user can setup a tunnel with tunnel ID is 0.

So we need to support this case to accept tunnel key action without
tunnel key ID.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220729091641.354748-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-rose-fix-module-unload-issues'

Eric Dumazet says:

====================
net: rose: fix module unload issues

Bernard Pidoux reported that unloading rose module could lead
to infamous "unregistered_netdevice:" issues.

First patch is the fix, stable candidate.
Second patch is adding netdev ref tracker to af_rose.

I chose net-next to not inflict merge conflicts, because
Jakub changed dev_put_track() to netdev_put_track() in net-next.
====================

Link: https://lore.kernel.org/r/20220729091233.1030680-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: rose: add netdev ref tracker to 'struct rose_sock'

This will help debugging netdevice refcount problems with
CONFIG_NET_DEV_REFCNT_TRACKER=y

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tested-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: rose: fix netdev reference changes

Bernard reported that trying to unload rose module would lead
to infamous messages:

unregistered_netdevice: waiting for rose0 to become free. Usage count = xx

This patch solves the issue, by making sure each socket referring to
a netdevice holds a reference count on it, and properly releases it
in rose_release().

rose_dev_first() is also fixed to take a device reference
before leaving the rcu_read_locked section.

Following patch will add ref_tracker annotations to ease
future bug hunting.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
1GbE Intel Wired LAN Driver Updates 2022-07-28

Jacob Keller says:

Convert all of the Intel drivers with PTP support to the newer .adjfine
implementation which uses scaled parts per million.

This improves the precision of the frequency adjustments by taking advantage
of the full scaled parts per million input coming from user space.

In addition, all implementations are converted to using the
mul_u64_u64_div_u64 function which better handles the intermediate value.
This function supports architecture specific instructions where possible to
avoid loss of precision if the normal 64-bit multiplication would overflow.

Of note, the i40e implementation is now able to avoid loss of precision on
slower link speeds by taking advantage of this to multiply by the link speed
factor first. This results in a significantly more precise adjustment by
allowing the calculation to impact the lower bits.

This also gets us a step closer to being able to remove the .adjfreq
entirely by removing its use from many drivers.

I plan to follow this up with a series to update the drivers from other
vendors and drop the .adjfreq implementation entirely.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  igb: convert .adjfreq to .adjfine
  ixgbe: convert .adjfreq to .adjfine
  i40e: convert .adjfreq to .adjfine
  i40e: use mul_u64_u64_div_u64 for PTP frequency calculation
  e1000e: convert .adjfreq to .adjfine
  e1000e: remove unnecessary range check in e1000e_phc_adjfreq
  ice: implement adjfine with mul_u64_u64_div_u64
====================

Link: https://lore.kernel.org/r/20220728181836.3387862-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: fsl,fec: Add i.MX8ULP FEC items

Add fsl,imx8ulp-fec for i.MX8ULP platform.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/all/20220726143853.23709-2-wei.fang@nxp.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'funeth-tx-xdp-frags'

Dimitris Michailidis says:

====================
net/funeth: Tx support for XDP with frags

Support XDP with fragments for XDP_TX and ndo_xdp_xmit.

The first three patches rework existing code used by the skb path to
make it suitable also for XDP. With these all the callees of the main
Tx XDP function, fun_xdp_tx(), are fragment-capable. The last patch
updates fun_xdp_tx() to handle fragments.
====================

Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/funeth: Tx handling of XDP with fragments.

By now all the functions fun_xdp_tx() calls are shared with the skb path
and thus are fragment-capable. Update fun_xdp_tx(), that up to now has
been passing just one buffer, to check for fragments and call
accordingly. This makes XDP_TX and ndo_xdp_xmit fragment-capable.

Signed-off-by: Dimitris Michailidis <dmichail@fungible.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/funeth: Unify skb/XDP packet mapping.

Instead of passing an skb to the mapping function pass an
skb_shared_info plus an additional address/length pair. This makes it
usable for both skbs and XDP. Call it from the XDP path and adjust the
skb path.

Signed-off-by: Dimitris Michailidis <dmichail@fungible.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/funeth: Unify skb/XDP gather list writing.

Extract the Tx gather list writing code that skbs use into a utility
function and use it also for XDP.

Signed-off-by: Dimitris Michailidis <dmichail@fungible.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/funeth: Unify skb/XDP Tx packet unmapping.

Current XDP unmapping is a subset of its skb analog, dealing with
only one buffer. In preparation for multi-frag XDP rename the skb
function and use it also for XDP. The XDP version is removed.

Signed-off-by: Dimitris Michailidis <dmichail@fungible.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'devlink-parallel-commands'

Jiri Pirko says:

====================
net: devlink: allow parallel commands on multiple devlinks

Aim of this patchset is to remove devlink_mutex and eventually to enable
parallel ops on devlink netlink interface.
====================

Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: devlink: enable parallel ops on netlink interface

As the devlink_mutex was removed and all devlink instances are protected
individually by devlink->lock mutex, allow the netlink ops to run
in parallel and therefore allow user to execute commands on multiple
devlink instances simultaneously.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: devlink: remove devlink_mutex

All accesses to devlink structure from userspace and drivers are locked
with devlink->lock instance mutex. Also, devlinks xa_array iteration is
taken care of by iteration helpers taking devlink reference.

Therefore, remove devlink_mutex as it is no longer needed.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: devlink: convert reload command to take implicit devlink->lock

Convert reload command to behave the same way as the rest of the
commands and let if be called with devlink->lock held. Remove the
temporary devl_lock taking from drivers. As the DEVLINK_NL_FLAG_NO_LOCK
flag is no longer used, remove it alongside.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: devlink: introduce "unregistering" mark and use it during devlinks iteration

Add new mark called "unregistering" to be set at the beginning of
devlink_unregister() function. Check this mark during devlinks
iteration in order to prevent getting a reference of devlink which is
being currently unregistered.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: Remove redundant __udp_sysctl_init() call from udp_init().

__udp_sysctl_init() is called for init_net via udp_sysctl_ops.

While at it, we can rename __udp_sysctl_init() to udp_sysctl_init().

Fixes: 1e8029515816 ("udp: Move the udp sysctl to namespace.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-07-29

This series contains updates to iavf driver only.

Przemyslaw prevents setting of TC max rate below minimum supported values
and reports updated queue values when setting up TCs.
---
v2: Dropped patch 3 (hw-tc-offload check)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/rds: Use PTR_ERR instead of IS_ERR for rdsdebug()

If 'local_odp_mr->r_trans_private' is a error code,
it is better to print the error code than to print
the value of IS_ERR().

Signed-off-by: Li Qiong <liqiong@nfschina.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-next-for-5.20-20220731' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:

====================
pull-request: can-next 2022-07-31

this is a pull request of 36 patches for net-next/master.

The 1st patch is by me and fixes a typo in the mcp251xfd driver.

Vincent Mailhol contributes a series of 9 patches, which clean up the
drivers to make use of KBUILD_MODNAME instead of hard coded names and
remove DRV_VERSION.

Followed by 3 patches by Vincent Mailhol that directly set the
ethtool_ops in instead of calling a function in the slcan, c_can and
flexcan driver.

Vincent Mailhol contributes a KBUILD_MODNAME and pr_fmt cleanup patch
for the slcan driver. Dario Binacchi contributes 6 patches to clean up
the driver and remove the legacy driver infrastructure.

The next 14 patches are by Vincent Mailhol and target the various
drivers, they add ethtool support and reporting of timestamping
capabilities.

Another patch by Vincent Mailhol for the etas_es58x driver to remove
useless calls to usb_fill_bulk_urb().

The last patch is by Christophe JAILLET and fixes a broken link to
Documentation in the can327 driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

can: can327: fix a broken link to Documentation

Since commit 482a4360c56a ("docs: networking: convert netdevices.txt to
ReST"), Documentation/networking/netdevices.txt has been replaced by
Documentation/networking/netdevices.rst.

Update the comment accordingly to avoid a 'make htmldocs' warning.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/all/6a54aff884ea4f84b661527d75aabd6632140715.1659249135.git.christophe.jaillet@wanadoo.fr
Fixes: 43da2f07622f ("can: can327: CAN/ldisc driver for ELM327 based OBD-II adapters")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge tag 'mlx5-updates-2022-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2022-07-28

Misc updates to mlx5 driver:

1) Gal corrects to use skb_tcp_all_headers on encapsulated skbs.

2) Roi Adds the support for offloading standalone police actions.

3) lama, did some refactoring to minimize code coupling with
mlx5e_priv "god object" in some of the follows, and converts some of the
objects to pointers to preserve on memory when these objects aren't needed.
This is part one of two parts series.

* tag 'mlx5-updates-2022-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Move mlx5e_init_l2_addr to en_main
  net/mlx5e: Split en_fs ndo's and move to en_main
  net/mlx5e: Separate mlx5e_set_rx_mode_work and move caller to en_main
  net/mlx5e: Add mdev to flow_steering struct
  net/mlx5e: Report flow steering errors with mdev err report API
  net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer
  net/mlx5e: Allocate VLAN and TC for featured profiles only
  net/mlx5e: Make mlx5e_tc_table private
  net/mlx5e: Convert mlx5e_tc_table member of mlx5e_flow_steering to pointer
  net/mlx5e: TC, Support tc action api for police
  net/mlx5e: TC, Separate get/update/replace meter functions
  net/mlx5e: Add red and green counters for metering
  net/mlx5e: TC, Allocate post meter ft per rule
  net/mlx5: DR, Add support for flow metering ASO
  net/mlx5e: Fix wrong use of skb_tcp_all_headers() with encapsulation
====================

Link: https://lore.kernel.org/r/20220728205728.143074-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'mlx5-fixes-2022-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2022-07-28

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2022-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Fix driver use of uninitialized timeout
  net/mlx5: DR, Fix SMFS steering info dump format
  net/mlx5: Adjust log_max_qp to be 18 at most
  net/mlx5e: Modify slow path rules to go to slow fdb
  net/mlx5e: Fix calculations related to max MPWQE size
  net/mlx5e: xsk: Account for XSK RQ UMRs when calculating ICOSQ size
  net/mlx5e: Fix the value of MLX5E_MAX_RQ_NUM_MTTS
  net/mlx5e: TC, Fix post_act to not match on in_port metadata
  net/mlx5e: Remove WARN_ON when trying to offload an unsupported TLS cipher/version
====================

Link: https://lore.kernel.org/r/20220728204640.139990-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-07-28

This series contains updates to ice driver only.

Michal allows for VF true promiscuous mode to be set for multiple VFs
and adds clearing of promiscuous filters when VF trust is removed.

Maciej refactors ice_set_features() to track/check changed features
instead of constantly checking against netdev features and adds support for
NETIF_F_LOOPBACK.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: allow toggling loopback mode via ndo_set_features callback
  ice: compress branches in ice_set_features()
  ice: Fix promiscuous mode not turning off
  ice: Introduce enabling promiscuous mode on multiple VF's
====================

Link: https://lore.kernel.org/r/20220728195538.3391360-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'sfc-vf-representors-for-ef100-rx-side'

Edward Cree says:

====================
sfc: VF representors for EF100 - RX side

This series adds the receive path for EF100 VF representors, plus other
minor features such as statistics.
====================

Link: https://lore.kernel.org/r/cover.1659034549.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: implement ethtool get/set RX ring size for EF100 reps

It's not truly a ring, but the maximum length of the list of queued RX
SKBs is analogous to an RX ring size, so use that API to configure it.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: use a dynamic m-port for representor RX and set it promisc

Representors do not want to be subject to the PF's Ethernet address
filters, since traffic from VFs will typically have a destination
either elsewhere on the link segment or on an overlay network.
So, create a dynamic m-port with promiscuous and all-multicast
filters, and set it as the egress port of representor default rules.
Since the m-port is an alias of the calling PF's own m-port, traffic
will still be delivered to the PF's RXQs, but it will be subject to
the VNRX filter rules installed on the dynamic m-port (specified by
the v-port ID field of the filter spec).

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: move table locking into filter_table_{probe,remove} methods

We need to be able to drop the efx->filter_sem in ef100_filter_table_up()
so that we can call functions that insert filters (and thus take that
rwsem for read), which means the efx->type->filter_table_probe method
needs to be responsible for taking the lock in the first place.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: insert default MAE rules to connect VFs to representors

Default rules are low-priority switching rules which the hardware uses
in the absence of higher-priority rules. Each representor requires a
corresponding rule matching traffic from its representee VF and
delivering to the PF (where a check on INGRESS_MPORT in
__ef100_rx_packet() will direct it to the representor). No rule is
required in the reverse direction, because representor TX uses a TX
override descriptor to bypass the MAE and deliver directly to the VF.
Since inserting any rule into the MAE disables the firmware's own
default rules, also insert a pair of rules to connect the PF to the
physical network port and vice-versa.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: receive packets from EF100 VFs into representors

If the source m-port of a packet in __ef100_rx_packet() is a VF,
hand off the packet to the corresponding representor with
efx_ef100_rep_rx_packet().

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: check ef100 RX packets are from the wire

If not, for now drop them and warn. A subsequent patch will look up
the source m-port to try and find a representor to deliver them to.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: determine wire m-port at EF100 PF probe time

Traffic delivered to the (MAE admin) PF could be from either the wire
or a VF. The INGRESS_MPORT field of the RX prefix distinguishes these;
base_mport is the value this field will have for traffic from the wire
(which should be delivered to the PF's netdevice, not a representor).

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: ef100 representor RX top half

Representor RX uses a NAPI context driven by a 'fake interrupt': when
the parent PF receives a packet destined for the representor, it adds
it to an SKB list (efv->rx_list), and schedules NAPI if the 'fake
interrupt' is primed. The NAPI poll then pulls packets off this list
and feeds them to the stack with netif_receive_skb_list().
This scheme allows us to decouple representor RX from the parent PF's
RX fast-path.
This patch implements the 'top half', which builds an SKB, copies data
into it from the RX buffer (which can then be released), adds it to
the queue and fires the 'fake interrupt' if necessary.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: ef100 representor RX NAPI poll

This patch adds the 'bottom half' napi->poll routine for representor RX.
See the next patch (with the top half) for an explanation of the 'fake
interrupt' scheme used to drive this NAPI context.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: plumb ef100 representor stats

Implement .ndo_get_stats64() method to read values out of struct
efx_rep_sw_stats.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: marvell: prestera: uninitialized variable bug

The "ret" variable needs to be initialized at the start.

Fixes: 52323ef75414 ("net: marvell: prestera: add phylink support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/YuKeBBuGtsmd7QdT@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dn_route: replace "jiffies-now>0" with "jiffies!=now"

Use "jiffies != now" to replace "jiffies - now > 0" to make
code more readable. We want to put a limit on how long the
loop can run for before rescheduling.

Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Link: https://lore.kernel.org/r/20220729061712.22666-1-yuzhe@nfschina.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'wireless-next-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v5.20

Fourth set of patches for v5.20, last few patches before the merge
window. Only driver changes this time, mostly just fixes and cleanup.

Major changes:

brcmfmac
- support brcm,ccode-map-trivial DT property

wcn36xx
- add debugfs file to show firmware feature strings

* tag 'wireless-next-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (36 commits)
  wifi: rtw88: check the return value of alloc_workqueue()
  wifi: rtw89: 8852a: adjust IMR for SER L1
  wifi: rtw89: 8852a: update RF radio A/B R56
  wifi: wcn36xx: Add debugfs entry to read firmware feature strings
  wifi: wcn36xx: Move capability bitmap to string translation function to firmware.c
  wifi: wcn36xx: Move firmware feature bit storage to dedicated firmware.c file
  wifi: wcn36xx: Rename clunky firmware feature bit enum
  wifi: brcmfmac: prevent double-free on hardware-reset
  wifi: brcmfmac: support brcm,ccode-map-trivial DT property
  dt-bindings: bcm4329-fmac: add optional brcm,ccode-map-trivial
  wifi: brcmfmac: Replace default (not configured) MAC with a random MAC
  wifi: brcmfmac: Add brcmf_c_set_cur_etheraddr() helper
  wifi: brcmfmac: Remove #ifdef guards for PM related functions
  wifi: brcmfmac: use strreplace() in brcmf_of_probe()
  wifi: plfxlc: Use eth_zero_addr() to assign zero address
  wifi: wilc1000: use existing iftype variable to store the interface type
  wifi: wilc1000: add 'isinit' flag for SDIO bus similar to SPI
  wifi: wilc1000: cancel the connect operation during interface down
  wifi: wilc1000: get correct length of string WID from received config packet
  wifi: wilc1000: set station_info flag only when signal value is valid
  ...
====================

Link: https://lore.kernel.org/r/20220729192832.A5011C433D6@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Andrii Nakryiko says:

====================
bpf-next 2022-07-29

We've added 22 non-merge commits during the last 4 day(s) which contain
a total of 27 files changed, 763 insertions(+), 120 deletions(-).

The main changes are:

1) Fixes to allow setting any source IP with bpf_skb_set_tunnel_key() helper,
   from Paul Chaignon.

2) Fix for bpf_xdp_pointer() helper when doing sanity checking, from Joanne Koong.

3) Fix for XDP frame length calculation, from Lorenzo Bianconi.

4) Libbpf BPF_KSYSCALL docs improvements and fixes to selftests to accommodate
   s390x quirks with socketcall(), from Ilya Leoshkevich.

5) Allow/denylist and CI configs additions to selftests/bpf to improve BPF CI,
   from Daniel Müller.

6) BPF trampoline + ftrace follow up fixes, from Song Liu and Xu Kuohai.

7) Fix allocation warnings in netdevsim, from Jakub Kicinski.

8) bpf_obj_get_opts() libbpf API allowing to provide file flags, from Joe Burton.

9) vsnprintf usage fix in bpf_snprintf_btf(), from Fedor Tokarev.

10) Various small fixes and clean ups, from Daniel Müller, Rongguang Wei,
    Jörn-Thorben Hinz, Yang Li.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (22 commits)
  bpf: Remove unneeded semicolon
  libbpf: Add bpf_obj_get_opts()
  netdevsim: Avoid allocation warnings triggered from user space
  bpf: Fix NULL pointer dereference when registering bpf trampoline
  bpf: Fix test_progs -j error with fentry/fexit tests
  selftests/bpf: Bump internal send_signal/send_signal_tracepoint timeout
  bpftool: Don't try to return value from void function in skeleton
  bpftool: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE macro
  bpf: btf: Fix vsnprintf return value check
  libbpf: Support PPC in arch_specific_syscall_pfx
  selftests/bpf: Adjust vmtest.sh to use local kernel configuration
  selftests/bpf: Copy over libbpf configs
  selftests/bpf: Sort configuration
  selftests/bpf: Attach to socketcall() in test_probe_user
  libbpf: Extend BPF_KSYSCALL documentation
  bpf, devmap: Compute proper xdp_frame len redirecting frames
  bpf: Fix bpf_xdp_pointer return pointer
  selftests/bpf: Don't assign outer source IP to host
  bpf: Set flow flag to allow any source IP in bpf_tunnel_key
  geneve: Use ip_tunnel_key flow flags in route lookups
  ...
====================

Link: https://lore.kernel.org/r/20220729230948.1313527-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bpf: Remove unneeded semicolon

Eliminate the following coccicheck warning:
/kernel/bpf/trampoline.c:101:2-3: Unneeded semicolon

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220725222733.55613-1-yang.lee@linux.alibaba.com

libbpf: Add bpf_obj_get_opts()

Add an extensible variant of bpf_obj_get() capable of setting the
`file_flags` parameter.

This parameter is needed to enable unprivileged access to BPF maps.
Without a method like this, users must manually make the syscall.

Signed-off-by: Joe Burton <jevburton@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220729202727.3311806-1-jevburton.kernel@gmail.com

netdevsim: Avoid allocation warnings triggered from user space

We need to suppress warnings from sily map sizes. Also switch
from GFP_USER to GFP_KERNEL_ACCOUNT, I'm pretty sure I misunderstood
the flags when writing this code.

Fixes: 395cacb5f1a0 ("netdevsim: bpf: support fake map offload")
Reported-by: syzbot+ad24705d3fd6463b18c6@syzkaller.appspotmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220726213605.154204-1-kuba@kernel.org

bpf: Fix NULL pointer dereference when registering bpf trampoline

A panic was reported on arm64:

[   44.517109] audit: type=1334 audit(1658859870.268:59): prog-id=19 op=LOAD
[   44.622031] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000010
[   44.624321] Mem abort info:
[   44.625049]   ESR = 0x0000000096000004
[   44.625935]   EC = 0x25: DABT (current EL), IL = 32 bits
[   44.627182]   SET = 0, FnV = 0
[   44.627930]   EA = 0, S1PTW = 0
[   44.628684]   FSC = 0x04: level 0 translation fault
[   44.629788] Data abort info:
[   44.630474]   ISV = 0, ISS = 0x00000004
[   44.631362]   CM = 0, WnR = 0
[   44.632041] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000100ab5000
[   44.633494] [0000000000000010] pgd=0000000000000000, p4d=0000000000000000
[   44.635202] Internal error: Oops: 96000004 [#1] SMP
[   44.636452] Modules linked in: xfs crct10dif_ce ghash_ce virtio_blk
virtio_console virtio_mmio qemu_fw_cfg
[   44.638713] CPU: 2 PID: 1 Comm: systemd Not tainted 5.19.0-rc7 #1
[   44.640164] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[   44.641799] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   44.643404] pc : ftrace_set_filter_ip+0x24/0xa0
[   44.644659] lr : bpf_trampoline_update.constprop.0+0x428/0x4a0
[   44.646118] sp : ffff80000803b9f0
[   44.646950] x29: ffff80000803b9f0 x28: ffff0b5d80364400 x27: ffff80000803bb48
[   44.648721] x26: ffff8000085ad000 x25: ffff0b5d809d2400 x24: 0000000000000000
[   44.650493] x23: 00000000ffffffed x22: ffff0b5dd7ea0900 x21: 0000000000000000
[   44.652279] x20: 0000000000000000 x19: 0000000000000000 x18: ffffffffffffffff
[   44.654067] x17: 0000000000000000 x16: 0000000000000000 x15: ffffffffffffffff
[   44.655787] x14: ffff0b5d809d2498 x13: ffff0b5d809d2432 x12: 0000000005f5e100
[   44.657535] x11: abcc77118461cefd x10: 000000000000005f x9 : ffffa7219cb5b190
[   44.659254] x8 : ffffa7219c8e0000 x7 : 0000000000000000 x6 : ffffa7219db075e0
[   44.661066] x5 : ffffa7219d3130e0 x4 : ffffa7219cab9da0 x3 : 0000000000000000
[   44.662837] x2 : 0000000000000000 x1 : ffffa7219cb7a5c0 x0 : 0000000000000000
[   44.664675] Call trace:
[   44.665274]  ftrace_set_filter_ip+0x24/0xa0
[   44.666327]  bpf_trampoline_update.constprop.0+0x428/0x4a0
[   44.667696]  __bpf_trampoline_link_prog+0xcc/0x1c0
[   44.668834]  bpf_trampoline_link_prog+0x40/0x64
[   44.669919]  bpf_tracing_prog_attach+0x120/0x490
[   44.671011]  link_create+0xe0/0x2b0
[   44.671869]  __sys_bpf+0x484/0xd30
[   44.672706]  __arm64_sys_bpf+0x30/0x40
[   44.673678]  invoke_syscall+0x78/0x100
[   44.674623]  el0_svc_common.constprop.0+0x4c/0xf4
[   44.675783]  do_el0_svc+0x38/0x4c
[   44.676624]  el0_svc+0x34/0x100
[   44.677429]  el0t_64_sync_handler+0x11c/0x150
[   44.678532]  el0t_64_sync+0x190/0x194
[   44.679439] Code: 2a0203f4 f90013f5 2a0303f5 f9001fe1 (f9400800)
[   44.680959] ---[ end trace 0000000000000000 ]---
[   44.682111] Kernel panic - not syncing: Oops: Fatal exception
[   44.683488] SMP: stopping secondary CPUs
[   44.684551] Kernel Offset: 0x2721948e0000 from 0xffff800008000000
[   44.686095] PHYS_OFFSET: 0xfffff4a380000000
[   44.687144] CPU features: 0x010,00022811,19001080
[   44.688308] Memory Limit: none
[   44.689082] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---

It's caused by a NULL tr->fops passed to ftrace_set_filter_ip(). tr->fops
is initialized to NULL and is assigned to an allocated memory address if
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is enabled. Since there is no
direct call on arm64 yet, the config can't be enabled.

To fix it, call ftrace_set_filter_ip() only if tr->fops is not NULL.

Fixes: 00963a2e75a8 ("bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)")
Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Bruno Goncalves <bgoncalv@redhat.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20220728114048.3540461-1-xukuohai@huaweicloud.com

bpf: Fix test_progs -j error with fentry/fexit tests

When multiple threads are attaching/detaching fentry/fexit programs to
the same trampoline, we may call register_fentry on the same trampoline
twice: register_fentry(), unregister_fentry(), then register_fentry again.
This causes ftrace_set_filter_ip() for the same ip on tr->fops twice,
which leaves duplicated ip in tr->fops. The extra ip is not cleaned up
properly on unregister and thus causes failures with further register in
register_ftrace_direct_multi():

register_ftrace_direct_multi()
{
        ...
        for (i = 0; i < size; i++) {
                hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
                        if (ftrace_find_rec_direct(entry->ip))
                                goto out_unlock;
                }
        }
        ...
}

This can be triggered with parallel fentry/fexit tests with test_progs:

  ./test_progs -t fentry,fexit -j

Fix this by resetting tr->fops in ftrace_set_filter_ip(), so that there
will never be duplicated entries in tr->fops.

Fixes: 00963a2e75a8 ("bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220729194106.1207472-1-song@kernel.org

selftests/bpf: Bump internal send_signal/send_signal_tracepoint timeout

The send_signal/send_signal_tracepoint is pretty flaky, with at least
one failure in every ten runs on a few attempts I've tried it:
  > test_send_signal_common:PASS:pipe_c2p 0 nsec
  > test_send_signal_common:PASS:pipe_p2c 0 nsec
  > test_send_signal_common:PASS:fork 0 nsec
  > test_send_signal_common:PASS:skel_open_and_load 0 nsec
  > test_send_signal_common:PASS:skel_attach 0 nsec
  > test_send_signal_common:PASS:pipe_read 0 nsec
  > test_send_signal_common:PASS:pipe_write 0 nsec
  > test_send_signal_common:PASS:reading pipe 0 nsec
  > test_send_signal_common:PASS:reading pipe error: size 0 0 nsec
  > test_send_signal_common:FAIL:incorrect result unexpected incorrect result: actual 48 != expected 50
  > test_send_signal_common:PASS:pipe_write 0 nsec
  > #139/1   send_signal/send_signal_tracepoint:FAIL

The reason does not appear to be a correctness issue in the strict
sense. Rather, we merely do not receive the signal we are waiting for
within the provided timeout.
Let's bump the timeout by a factor of ten. With that change I have not
been able to reproduce the failure in 150+ iterations. I am also sneaking
in a small simplification to the test_progs test selection logic.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220727182955.4044988-1-deso@posteo.net

bpftool: Don't try to return value from void function in skeleton

A skeleton generated by bpftool previously contained a return followed
by an expression in OBJ_NAME__detach(), which has return type void. This
did not hurt, the bpf_object__detach_skeleton() called there returns
void itself anyway, but led to a warning when compiling with e.g.
-pedantic.

Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220726133203.514087-1-jthinz@mailbox.tu-berlin.de

bpftool: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE macro

Use the ARRAY_SIZE macro and make the code more compact.

Signed-off-by: Rongguang Wei <weirongguang@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220726093045.3374026-1-clementwei90@163.com

bpf: btf: Fix vsnprintf return value check

vsnprintf returns the number of characters which would have been written if
enough space had been available, excluding the terminating null byte. Thus,
the return value of 'len_left' means that the last character has been
dropped.

Signed-off-by: Fedor Tokarev <ftokarev@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220711211317.GA1143610@laptop

iavf: Fix 'tc qdisc show' listing too many queues

Fix tc qdisc show dev <ethX> root displaying too many fq_codel qdiscs.
tc_modify_qdisc, which is caller of ndo_setup_tc, expects driver to call
netif_set_real_num_tx_queues, which prepares qdiscs.
Without this patch, fq_codel qdiscs would not be adjusted to number of
queues on VF.
e.g.:
tc qdisc show dev <ethX>
qdisc mq 0: root
qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
tc qdisc add dev <ethX> root mqprio num_tc 2 map 1 0 0 0 0 0 0 0 queues 1@0 1@1 hw 1 mode channel shaper bw_rlimit max_rate 5000Mbit 150Mbit
tc qdisc show dev <ethX>
qdisc mqprio 8003: root tc 2 map 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
             queues:(0:0) (1:1)
             mode:channel
             shaper:bw_rlimit   max_rate:5Gbit 150Mbit
qdisc fq_codel 0: parent 8003:4 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: parent 8003:3 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: parent 8003:2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: parent 8003:1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64

While after fix:
tc qdisc add dev <ethX> root mqprio num_tc 2 map 1 0 0 0 0 0 0 0 queues 1@0 1@1 hw 1 mode channel shaper bw_rlimit max_rate 5000Mbit 150Mbit
tc qdisc show dev <ethX> #should show 2, shows 4
qdisc mqprio 8004: root tc 2 map 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
             queues:(0:0) (1:1)
             mode:channel
             shaper:bw_rlimit   max_rate:5Gbit 150Mbit
qdisc fq_codel 0: parent 8004:2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: parent 8004:1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64

Fixes: d5b33d024496 ("i40evf: add ndo_setup_tc callback to i40evf")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Co-developed-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Co-developed-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

iavf: Fix max_rate limiting

Fix max_rate option in TC, check for proper quanta boundaries.
Check for minimum value provided and if it fits expected 50Mbps
quanta.

Without this patch, iavf could send settings for max_rate limiting
that would be accepted from by PF even the max_rate option is less
than expected 50Mbps quanta. It results in no rate limiting
on traffic as rate limiting will be floored to 0.

Example:
tc qdisc add dev $vf root mqprio num_tc 3 map 0 2 1 queues \
2@0 2@2 2@4 hw 1 mode channel shaper bw_rlimit \
max_rate 50Mbps 500Mbps 500Mbps

Should limit TC0 to circa 50 Mbps

tc qdisc add dev $vf root mqprio num_tc 3 map 0 2 1 queues \
2@0 2@2 2@4 hw 1 mode channel shaper bw_rlimit \
max_rate 0Mbps 100Kbit 500Mbps

Should return error

Fixes: d5b33d024496 ("i40evf: add ndo_setup_tc callback to i40evf")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Merge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git

ath.git patches for v5.20. Major changes:

ath11k:

* fix WCN9074 to work again

* revert rfkill support as it was causing problems

wifi: rtw88: check the return value of alloc_workqueue()

The function alloc_workqueue() in rtw_core_init() can fail, but
there is no check of its return value. To fix this bug, its return value
should be checked with new error handling code.

Fixes: fe101716c7c9d ("rtw88: replace tx tasklet with work queue")
Reported-by: Hacash Robot <hacashRobot@santino.com>
Signed-off-by: William Dean <williamsukatube@gmail.com>
Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220723063756.2956189-1-williamsukatube@163.com

wifi: rtw89: 8852a: adjust IMR for SER L1

SER (system error recovery) L1 (level 1) has a step-by-step handshake
process with FW. These handshakes still rely on B_AX_HS0ISR_IND_INT_EN.
So, even already during recovery, we enable this bit in IMR.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220721074952.19676-1-pkshih@realtek.com

wifi: rtw89: 8852a: update RF radio A/B R56

Update to internal tag HALRF_027_00_060.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220622091937.11325-1-pkshih@realtek.com

wifi: wcn36xx: Add debugfs entry to read firmware feature strings

Add in the ability to easily find the firmware feature bits reported in the
get feature exchange without having to compile-in debug prints.

root@linaro-alip:~# cat /sys/kernel/debug/ieee80211/phy0/wcn36xx/firmware_feat_caps
MCC
P2P
DOT11AC
SLM_SESSIONIZATION
DOT11AC_OPMODE
SAP32STA
TDLS
P2P_GO_NOA_DECOUPLE_INIT_SCAN
WLANACTIVE_OFFLOAD
BEACON_OFFLOAD
SCAN_OFFLOAD
BCN_MISS_OFFLOAD
STA_POWERSAVE
STA_ADVANCED_PWRSAVE
BCN_FILTER
RTT
RATECTRL
WOW
WLAN_ROAM_SCAN_OFFLOAD
SPECULATIVE_PS_POLL
IBSS_HEARTBEAT_OFFLOAD
WLAN_SCAN_OFFLOAD
WLAN_PERIODIC_TX_PTRN
ADVANCE_TDLS
BATCH_SCAN
FW_IN_TX_PATH
EXTENDED_NSOFFLOAD_SLOT
CH_SWITCH_V1
HT40_OBSS_SCAN
UPDATE_CHANNEL_LIST
WLAN_MCADDR_FLT
WLAN_CH144
TDLS_SCAN_COEXISTENCE
LINK_LAYER_STATS_MEAS
MU_MIMO
EXTENDED_SCAN
DYNAMIC_WMM_PS
MAC_SPOOFED_SCAN
FW_STATS
WPS_PRBRSP_TMPL
BCN_IE_FLT_DELTA

Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220727161655.2286867-5-bryan.odonoghue@linaro.org

wifi: wcn36xx: Move capability bitmap to string translation function to firmware.c

Move wcn36xx_get_cap_name() function in main.c into firmware.c as
wcn36xx_firmware_get_cap_name().

Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220727161655.2286867-4-bryan.odonoghue@linaro.org

wifi: wcn36xx: Move firmware feature bit storage to dedicated firmware.c file

The naming of the get/set/clear firmware feature capability bits doesn't
really follow the established namespace pattern of
wcn36xx_logicalblock_do_something();

The feature bits are accessed by smd.c and main.c. It would be nice to
display the found feature bits in debugfs. To do so though we should tidy
up the namespace a bit.

Move the firmware feature exchange API to its own file - firmware.c giving
us the opportunity to functionally decompose other firmware related
accessors as appropriate in future.

Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220727161655.2286867-3-bryan.odonoghue@linaro.org

wifi: wcn36xx: Rename clunky firmware feature bit enum

The enum name "place_holder_in_cap_bitmap" is self descriptively asking to
be changed to something else.

Rename place_holder_in_cap_bitmap to wcn36xx_firmware_feat_caps so that the
contents and intent of the enum is obvious.

Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220727161655.2286867-2-bryan.odonoghue@linaro.org

Merge branch 'netdevsim-fib-route-delete-leak'

Ido Schimmel says:

====================
netdevsim: fib: Fix reference count leak on route deletion failure

Fix a recently reported netdevsim bug found using syzkaller.

Patch #1 fixes the bug.

Patch #2 adds a debugfs knob to allow us to test the fix.

Patch #3 adds test cases.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: netdevsim: Add test cases for route deletion failure

Add IPv4 and IPv6 test cases that ensure that we are not leaking a
reference on the nexthop device when we are unable to delete its
associated route.

Without the fix in a previous patch ("netdevsim: fib: Fix reference
count leak on route deletion failure") both test cases get stuck,
waiting for the reference to be released from the dummy device [1][2].

[1]
unregister_netdevice: waiting for dummy1 to become free. Usage count = 5
leaked reference.
fib_check_nh+0x275/0x620
fib_create_info+0x237c/0x4d30
fib_table_insert+0x1dd/0x1d20
inet_rtm_newroute+0x11b/0x200
rtnetlink_rcv_msg+0x43b/0xd20
netlink_rcv_skb+0x15e/0x430
netlink_unicast+0x53b/0x800
netlink_sendmsg+0x945/0xe40
____sys_sendmsg+0x747/0x960
___sys_sendmsg+0x11d/0x190
__sys_sendmsg+0x118/0x1e0
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

[2]
unregister_netdevice: waiting for dummy1 to become free. Usage count = 5
leaked reference.
fib6_nh_init+0xc46/0x1ca0
ip6_route_info_create+0x1167/0x19a0
ip6_route_add+0x27/0x150
inet6_rtm_newroute+0x161/0x170
rtnetlink_rcv_msg+0x43b/0xd20
netlink_rcv_skb+0x15e/0x430
netlink_unicast+0x53b/0x800
netlink_sendmsg+0x945/0xe40
____sys_sendmsg+0x747/0x960
___sys_sendmsg+0x11d/0x190
__sys_sendmsg+0x118/0x1e0
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: fib: Add debugfs knob to simulate route deletion failure

The previous patch ("netdevsim: fib: Fix reference count leak on route
deletion failure") fixed a reference count leak that happens on route
deletion failure.

Such failures can only be simulated by injecting slab allocation
failures, which cannot be surgically injected.

In order to be able to specifically test this scenario, add a debugfs
knob that allows user space to fail route deletion requests when
enabled.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: fib: Fix reference count leak on route deletion failure

As part of FIB offload simulation, netdevsim stores IPv4 and IPv6 routes
and holds a reference on FIB info structures that in turn hold a
reference on the associated nexthop device(s).

In the unlikely case where we are unable to allocate memory to process a
route deletion request, netdevsim will not release the reference from
the associated FIB info structure, thereby preventing the associated
nexthop device(s) from ever being removed [1].

Fix this by scheduling a work item that will flush netdevsim's FIB table
upon route deletion failure. This will cause netdevsim to release its
reference from all the FIB info structures in its table.

Reported by Lucas Leong of Trend Micro Zero Day Initiative.

Fixes: 0ae3eb7b4611 ("netdevsim: fib: Perform the route programming in a non-atomic context")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mtk_eth_soc-xdp-multi-frame'

Lorenzo Bianconi says:

====================
mtk_eth_soc: introduce xdp multi-frag support

Convert mtk_eth_soc driver to xdp_return_frame_bulk APIs.
===================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: add xdp tx return bulking support

Convert mtk_eth_soc driver to xdp_return_frame_bulk APIs.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: introduce xdp multi-frag support

Add the capability to map non-linear xdp frames in XDP_TX and
ndo_xdp_xmit callback.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: introduce mtk_xdp_frame_map utility routine

This is a preliminary patch to add xdp multi-frag support to mtk_eth_soc
driver

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'seg6-headend-reduced'

Andrea Mayer says:

====================
seg6: add support for SRv6 Headend Reduced

This patchset adds support for SRv6 Headend behavior with Reduced
Encapsulation. It introduces the H.Encaps.Red and H.L2Encaps.Red versions
of the SRv6 H.Encaps and H.L2Encaps behaviors, according to RFC 8986 [1].

In details, the patchset is made of:
- patch 1/4: add support for SRv6 H.Encaps.Red behavior;
- Patch 2/4: add support for SRv6 H.L2Encaps.Red behavior;
- patch 2/4: add selftest for SRv6 H.Encaps.Red behavior;
- patch 3/4: add selftest for SRv6 H.L2Encaps.Red behavior.

The corresponding iproute2 patch for supporting SRv6 H.Encaps.Red and
H.L2Encaps.Red behaviors is provided in a separated patchset.

[1] - https://datatracker.ietf.org/doc/html/rfc8986

V4 -> v5:
- Fix skb checksum for SRH Reduced encapsulation/insertion;

- Improve selftests by:
      i) adding a random suffix to network namespaces;
     ii) creating net devices directly into network namespaces;
    iii) using trap EXIT command to properly clean up selftest networks.

Thanks to Paolo Abeni.

v3 -> v4:
- Add selftests to the Makefile, thanks to Jakub Kicinski.

v2 -> v3:
- Keep SRH when HMAC TLV is present;

- Split the support for H.Encaps.Red and H.L2Encaps.Red behaviors in two
   patches (respectively, patch 1/4 and patch 2/4);

- Add selftests for SRv6 H.Encaps.Red and H.L2Encaps.Red.

v1 -> v2:
- Fixed sparse warnings;

- memset now uses sizeof() instead of hardcoded value;

- Removed EXPORT_SYMBOL_GPL.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: seg6: add selftest for SRv6 H.L2Encaps.Red behavior

This selftest is designed for testing the H.L2Encaps.Red behavior. It
instantiates a virtual network composed of several nodes: hosts and SRv6
routers. Each node is realized using a network namespace that is
properly interconnected to others through veth pairs.
The test considers SRv6 routers implementing a L2 VPN leveraged by hosts
for communicating with each other. Such routers make use of the SRv6
H.L2Encaps.Red behavior for applying SRv6 policies to L2 traffic coming
from hosts.

The correct execution of the behavior is verified through reachability
tests carried out between hosts belonging to the same VPN.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: seg6: add selftest for SRv6 H.Encaps.Red behavior

This selftest is designed for testing the H.Encaps.Red behavior. It
instantiates a virtual network composed of several nodes: hosts and SRv6
routers. Each node is realized using a network namespace that is
properly interconnected to others through veth pairs.
The test considers SRv6 routers implementing L3 VPNs leveraged by hosts
for communicating with each other. Such routers make use of the SRv6
H.Encaps.Red behavior for applying SRv6 policies to L3 traffic coming
from hosts.

The correct execution of the behavior is verified through reachability
tests carried out between hosts belonging to the same VPN.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>

seg6: add support for SRv6 H.L2Encaps.Red behavior

The SRv6 H.L2Encaps.Red behavior described in [1] is an optimization of
the SRv6 H.L2Encaps behavior [2].

H.L2Encaps.Red reduces the length of the SRH by excluding the first
segment (SID) in the SRH of the pushed IPv6 header. The first SID is
only placed in the IPv6 Destination Address field of the pushed IPv6
header.
When the SRv6 Policy only contains one SID the SRH is omitted, unless
there is an HMAC TLV to be carried.

[1] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.4
[2] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.3

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Anton Makarov <anton.makarov11235@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

seg6: add support for SRv6 H.Encaps.Red behavior

The SRv6 H.Encaps.Red behavior described in [1] is an optimization of
the SRv6 H.Encaps behavior [2].

H.Encaps.Red reduces the length of the SRH by excluding the first
segment (SID) in the SRH of the pushed IPv6 header. The first SID is
only placed in the IPv6 Destination Address field of the pushed IPv6
header.
When the SRv6 Policy only contains one SID the SRH is omitted, unless
there is an HMAC TLV to be carried.

[1] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.2
[2] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.1

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Anton Makarov <anton.makarov11235@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vmxnet3: do not reschedule napi for rx processing

Commit '2c5a5748105a ("vmxnet3: add support for out of order rx
completion")' added support for out of order rx completion. Within
that patch, an enhancement was done to reschedule napi for processing
rx completions.

However, it can lead to missing an interrupt. So, this patch reverts
that part of the code.

Fixes: 2c5a5748105a ("vmxnet3: add support for out of order rx completion")
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Documentation: Describe net.ipv4.tcp_reflect_tos.

The tcp_reflect_tos option was introduced in Linux 5.10 but was still
undocumented.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/af_packet: check len when min_header_len equals to 0

User can use AF_PACKET socket to send packets with the length of 0.
When min_header_len equals to 0, packet_snd will call __dev_queue_xmit
to send packets, and sock->type can be any type.

Reported-by: syzbot+5ea725c25d06fb9114c4@syzkaller.appspotmail.com
Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: allow unbound socket for packets in VRF when tcp_l3mdev_accept set

The commit 3c82a21f4320 ("net: allow binding socket in a VRF when
there's an unbound socket") changed the inet socket lookup to avoid
packets in a VRF from matching an unbound socket. This is to ensure the
necessary isolation between the default and other VRFs for routing and
forwarding. VRF-unaware processes running in the default VRF cannot
access another VRF and have to be run with 'ip vrf exec <vrf>'. This is
to be expected with tcp_l3mdev_accept disabled, but could be reallowed
when this sysctl option is enabled. So instead of directly checking dif
and sdif in inet[6]_match, here call inet_sk_bound_dev_eq(). This
allows a match on unbound socket for non-zero sdif i.e. for packets in
a VRF, if tcp_l3mdev_accept is enabled.

Fixes: 3c82a21f4320 ("net: allow binding socket in a VRF when there's an unbound socket")
Signed-off-by: Mike Manning <mvrmanning@gmail.com>
Link: https://lore.kernel.org/netdev/a54c149aed38fded2d3b5fdb1a6c89e36a083b74.camel@lasnet.de/
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-ptp-spectrum-2'

Ido Schimmel says:

====================
mlxsw: Add PTP support for Spectrum-2 and newer ASICs

This patchset adds PTP support for Spectrum-{2,3,4} switch ASICs. They
all act largely the same with respect to PTP except for a workaround
implemented for Spectrum-{2,3} in patch #6.

Spectrum-2 and newer ASICs essentially implement a transparent clock
between all the switch ports, including the CPU port. The hardware will
generate the UTC time stamp for transmitted / received packets at the
CPU port, but will compensate for forwarding delays in the ASIC by
adjusting the correction field in the PTP header (for PTP events) at the
ingress and egress ports.

Specifically, the hardware will subtract the current time stamp from the
correction field at the ingress port and will add the current time stamp
to the correction field at the egress port. For the purpose of an
ordinary or boundary clock (this patchset), the correction field will
always be adjusted between the CPU port and one of the front panel
ports, but never between two front panel ports.

Patchset overview:

Patch #1 extracts a helper to configure traps for PTP packets (event and
general messages). The helper is shared between all Spectrum
generations.

Patch #2 transitions Spectrum-2 and newer ASICs to use a different
format of Tx completions that includes the UTC time stamp of transmitted
packets.

Patch #3 adds basic initialization required for Spectrum-2 PTP support.
It mainly invokes the helper from patch #1.

Patch #4 adds helpers to read the UTC time (seconds and nanoseconds)
from the device over memory-mapped I/O instead of going through firmware
which is slower and therefore inaccurate. The helpers will be used to
implement various PHC operations (e.g., gettimex64) and to construct the
full UTC time stamp from the truncated one reported over Tx / Rx
completions.

Patch #5 implements the various PHC operations.

Patch #6 implements the previously described workaround for
Spectrum-{2,3}.

Patch #7 adds the ability to report a hardware time stamp for a received
/ transmitted packet based off the associated Rx / Tx completion that
includes a truncated UTC time stamp.

Patches #8 and #9 implement support for the SIOCGHWTSTAMP /
SIOCSHWTSTAMP ioctls and the get_ts_info ethtool callback, respectively.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Support ethtool 'get_ts_info' callback in Spectrum-2

The 'get_ts_info' callback is used for obtaining information about
time stamping and PTP hardware clock capabilities of a network device.

The existing function of Spectrum-1 is used to advertise the PHC
capabilities and the supported RX and TX filters. Implement a similar
function for Spectrum-2, expose that the supported 'rx_filters' are all
PTP event packets, as for these packets the driver fills the time stamp
from the CQE in the SKB.

In the future, mlxsw driver will be extended to support one-step PTP in
Spectrum-2 and newer ASICs. Then additional 'tx_types' will be supported.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_ptp: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls

The SIOCSHWTSTAMP ioctl configures HW timestamping on a given port. In
Spectrum-2 and above, each packet gets time stamp by default, but in
order to provide an accurate time stamp, software should configure to
update the correction field. In addition, the PTP traps are not enabled
by default, software should enable it per port or for all ports.

The switch behaves like a transparent clock between CPU port and each
front panel port. If ingress correction is set on a port for a given packet
type, then when such a packet is received via the port, the current time
stamp is subtracted from the correction field. If egress correction is set
on a port for a given packet type, then when such a packet is transmitted
via the port, the current time stamp is added to the correction field.

The result is that as the packet ingresses through a port with ingress
correction enabled, and egresses through a port with egress correction
enabled, the PTP correction field is updated to reflect the time that the
packet spent in the ASIC.

This can be used to update the correction field of trapped packets by
enabling ingress correction on a port where time stamping was enabled,
and egress correction on the CPU port. Similarly, for packets transmitted
from the host, ingress correction should be enabled on the CPU port, and
egress correction on a front-panel port.

However, since the correction fields will be updated for all PTP packets
crossing the CPU port, in order not to mangle the correction field, the
front panel port involved in the packet transfer must have the
corresponding correction enabled as well.

Therefore, when HW timestamping is enabled on at least one port, we have
to configure hardware to update the correction field and trap PTP event
packets on all ports.

Add reference count as part of 'struct mlxsw_sp_ptp_state', to maintain
how many ports use HW timestamping. Handle the correction field
configuration only when the first port enables time stamping and when the
last port disables time stamping. Store the configuration as part of
'struct mlxsw_sp_ptp_state', as it is global for all ports.

The SIOCGHWTSTAMP ioctl is a getter for the current configuration,
implement it and use the global configuration.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>