linux.git
17 months agoqed: devlink health: use retained error fmsg API
Przemek Kitszel [Wed, 18 Oct 2023 20:26:45 +0000 (22:26 +0200)]
qed: devlink health: use retained error fmsg API

Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 months agonet/mlx5: devlink health: use retained error fmsg API
Przemek Kitszel [Wed, 18 Oct 2023 20:26:44 +0000 (22:26 +0200)]
net/mlx5: devlink health: use retained error fmsg API

Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 months agomlxsw: core: devlink health: use retained error fmsg API
Przemek Kitszel [Wed, 18 Oct 2023 20:26:43 +0000 (22:26 +0200)]
mlxsw: core: devlink health: use retained error fmsg API

Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 months agoocteontx2-af: devlink health: use retained error fmsg API
Przemek Kitszel [Wed, 18 Oct 2023 20:26:42 +0000 (22:26 +0200)]
octeontx2-af: devlink health: use retained error fmsg API

Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 months agohinic: devlink health: use retained error fmsg API
Przemek Kitszel [Wed, 18 Oct 2023 20:26:41 +0000 (22:26 +0200)]
hinic: devlink health: use retained error fmsg API

Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 months agobnxt_en: devlink health: use retained error fmsg API
Przemek Kitszel [Wed, 18 Oct 2023 20:26:40 +0000 (22:26 +0200)]
bnxt_en: devlink health: use retained error fmsg API

Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 months agopds_core: devlink health: use retained error fmsg API
Przemek Kitszel [Wed, 18 Oct 2023 20:26:39 +0000 (22:26 +0200)]
pds_core: devlink health: use retained error fmsg API

Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 months agonetdevsim: devlink health: use retained error fmsg API
Przemek Kitszel [Wed, 18 Oct 2023 20:26:38 +0000 (22:26 +0200)]
netdevsim: devlink health: use retained error fmsg API

Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 months agodevlink: retain error in struct devlink_fmsg
Przemek Kitszel [Wed, 18 Oct 2023 20:26:37 +0000 (22:26 +0200)]
devlink: retain error in struct devlink_fmsg

Retain error value in struct devlink_fmsg, to relieve drivers from
checking it after each call.
Note that fmsg is an in-memory builder/buffer of formatted message,
so it's not the case that half baked message was sent somewhere.

We could find following scheme in multiple drivers:
  err = devlink_fmsg_obj_nest_start(fmsg);
  if (err)
   return err;
  err = devlink_fmsg_string_pair_put(fmsg, "src", src);
  if (err)
   return err;
  err = devlink_fmsg_something(fmsg, foo, bar);
  if (err)
return err;
  // and so on...
  err = devlink_fmsg_obj_nest_end(fmsg);

With retaining error API that translates to:
  devlink_fmsg_obj_nest_start(fmsg);
  devlink_fmsg_string_pair_put(fmsg, "src", src);
  devlink_fmsg_something(fmsg, foo, bar);
  // and so on...
  devlink_fmsg_obj_nest_end(fmsg);

What means we check error just when is time to send.

Possible error scenarios are developer error (API misuse) and memory
exhaustion, both cases are good candidates to choose readability
over fastest possible exit.

Note that this patch keeps returning errors, to allow per-driver conversion
to the new API, but those are not needed at this point already.

This commit itself is an illustration of benefits for the dev-user,
more of it will be in separate commits of the series.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agoMerge branch 'tools-ynl-gen-support-full-range-of-min-max-checks'
Jakub Kicinski [Thu, 19 Oct 2023 22:54:57 +0000 (15:54 -0700)]
Merge branch 'tools-ynl-gen-support-full-range-of-min-max-checks'

Jakub Kicinski says:

====================
tools: ynl-gen: support full range of min/max checks

YNL code gen currently supports only very simple range checks
within the range of s16. Add support for full range of u64 / s64
which is good to have, and will be even more important with uint / sint.
====================

Link: https://lore.kernel.org/r/20231018163917.2514503-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agotools: ynl-gen: support limit names
Jakub Kicinski [Wed, 18 Oct 2023 16:39:17 +0000 (09:39 -0700)]
tools: ynl-gen: support limit names

Support the use of symbolic names like s8-min or u32-max in checks
to make writing specs less painful.

Link: https://lore.kernel.org/r/20231018163917.2514503-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agotools: ynl-gen: support full range of min/max checks for integer values
Jakub Kicinski [Wed, 18 Oct 2023 16:39:16 +0000 (09:39 -0700)]
tools: ynl-gen: support full range of min/max checks for integer values

Extend the support to full range of min/max checks.
None of the existing YNL families required complex integer validation.

The support is less than trivial, because we try to keep struct nla_policy
tiny the min/max members it holds in place are s16. Meaning we can only
express checks in range of s16. For larger ranges we need to define
a structure and link it in the policy.

Link: https://lore.kernel.org/r/20231018163917.2514503-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agotools: ynl-gen: track attribute use
Jakub Kicinski [Wed, 18 Oct 2023 16:39:15 +0000 (09:39 -0700)]
tools: ynl-gen: track attribute use

For range validation we'll need to know if any individual
attribute is used on input (i.e. whether we will generate
a policy for it). Track this information.

Link: https://lore.kernel.org/r/20231018163917.2514503-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoptp: prevent string overflow
Dan Carpenter [Wed, 18 Oct 2023 14:20:11 +0000 (17:20 +0300)]
ptp: prevent string overflow

The ida_alloc_max() function can return up to INT_MAX so this buffer is
not large enough.  Also use snprintf() for extra safety.

Fixes: 403376ddb422 ("ptp: add debugfs interface to see applied channel masks")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/d4b1a995-a0cb-4125-aa1d-5fd5044aba1d@moroto.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 19 Oct 2023 20:13:03 +0000 (13:13 -0700)]
Merge git://git./linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

net/mac80211/key.c
  02e0e426a2fb ("wifi: mac80211: fix error path key leak")
  2a8b665e6bcc ("wifi: mac80211: remove key_mtx")
  7d6904bf26b9 ("Merge wireless into wireless-next")
https://lore.kernel.org/all/20231012113648.46eea5ec@canb.auug.org.au/

Adjacent changes:

drivers/net/ethernet/ti/Kconfig
  a602ee3176a8 ("net: ethernet: ti: Fix mixed module-builtin object")
  98bdeae9502b ("net: cpmac: remove driver to prepare for platform removal")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 19 Oct 2023 19:08:18 +0000 (12:08 -0700)]
Merge tag 'net-6.6-rc7' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, netfilter, WiFi.

  Feels like an up-tick in regression fixes, mostly for older releases.
  The hfsc fix, tcp_disconnect() and Intel WWAN fixes stand out as
  fairly clear-cut user reported regressions. The mlx5 DMA bug was
  causing strife for 390x folks. The fixes themselves are not
  particularly scary, tho. No open investigations / outstanding reports
  at the time of writing.

  Current release - regressions:

   - eth: mlx5: perform DMA operations in the right locations, make
     devices usable on s390x, again

   - sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner
     curve, previous fix of rejecting invalid config broke some scripts

   - rfkill: reduce data->mtx scope in rfkill_fop_open, avoid deadlock

   - revert "ethtool: Fix mod state of verbose no_mask bitset", needs
     more work

  Current release - new code bugs:

   - tcp: fix listen() warning with v4-mapped-v6 address

  Previous releases - regressions:

   - tcp: allow tcp_disconnect() again when threads are waiting, it was
     denied to plug a constant source of bugs but turns out .NET depends
     on it

   - eth: mlx5: fix double-free if buffer refill fails under OOM

   - revert "net: wwan: iosm: enable runtime pm support for 7560", it's
     causing regressions and the WWAN team at Intel disappeared

   - tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a
     single skb, fix single-stream perf regression on some devices

  Previous releases - always broken:

   - Bluetooth:
      - fix issues in legacy BR/EDR PIN code pairing
      - correctly bounds check and pad HCI_MON_NEW_INDEX name

   - netfilter:
      - more fixes / follow ups for the large "commit protocol" rework,
        which went in as a fix to 6.5
      - fix null-derefs on netlink attrs which user may not pass in

   - tcp: fix excessive TLP and RACK timeouts from HZ rounding (bless
     Debian for keeping HZ=250 alive)

   - net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation, prevent
     letting frankenstein UDP super-frames from getting into the stack

   - net: fix interface altnames when ifc moves to a new namespace

   - eth: qed: fix the size of the RX buffers

   - mptcp: avoid sending RST when closing the initial subflow"

* tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
  Revert "ethtool: Fix mod state of verbose no_mask bitset"
  selftests: mptcp: join: no RST when rm subflow/addr
  mptcp: avoid sending RST when closing the initial subflow
  mptcp: more conservative check for zero probes
  tcp: check mptcp-level constraints for backlog coalescing
  selftests: mptcp: join: correctly check for no RST
  net: ti: icssg-prueth: Fix r30 CMDs bitmasks
  selftests: net: add very basic test for netdev names and namespaces
  net: move altnames together with the netdevice
  net: avoid UAF on deleted altname
  net: check for altname conflicts when changing netdev's netns
  net: fix ifname in netlink ntf during netns move
  net: ethernet: ti: Fix mixed module-builtin object
  net: phy: bcm7xxx: Add missing 16nm EPHY statistics
  ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr
  tcp_bpf: properly release resources on error paths
  net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve
  net: mdio-mux: fix C45 access returning -EIO after API change
  tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb
  octeon_ep: update BQL sent bytes before ringing doorbell
  ...

18 months agoMerge tag 'loongarch-fixes-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 19 Oct 2023 18:02:28 +0000 (11:02 -0700)]
Merge tag 'loongarch-fixes-6.6-3' of git://git./linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai ChenL
 "Fix 4-level pagetable building, disable WUC for pgprot_writecombine()
  like ioremap_wc(), use correct annotation for exception handlers, and
  a trivial cleanup"

* tag 'loongarch-fixes-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: Disable WUC for pgprot_writecombine() like ioremap_wc()
  LoongArch: Replace kmap_atomic() with kmap_local_page() in copy_user_highpage()
  LoongArch: Export symbol invalid_pud_table for modules building
  LoongArch: Use SYM_CODE_* to annotate exception handlers

18 months agoMerge tag 'slab-fixes-for-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 19 Oct 2023 17:53:31 +0000 (10:53 -0700)]
Merge tag 'slab-fixes-for-6.6-rc6' of git://git./linux/kernel/git/vbabka/slab

Pull slab fix from Vlastimil Babka:

 - stable fix to prevent kernel warnings with KASAN_HW_TAGS on arm64
   due to improperly resolved kmalloc alignment restrictions (Catalin
   Marinas)

* tag 'slab-fixes-for-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign()

18 months agoMerge tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees...
Linus Torvalds [Thu, 19 Oct 2023 17:10:14 +0000 (10:10 -0700)]
Merge tag 'seccomp-v6.6-rc7' of git://git./linux/kernel/git/kees/linux

Pull seccomp fix from Kees Cook:

 - Fix seccomp_unotify perf benchmark for 32-bit (Jiri Slaby)

* tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  perf/benchmark: fix seccomp_unotify benchmark for 32-bit

18 months agoMerge tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Linus Torvalds [Thu, 19 Oct 2023 16:37:41 +0000 (09:37 -0700)]
Merge tag 'v6.6-rc7.vfs.fixes' of git://git./linux/kernel/git/vfs/vfs

Pull vfs fix from Christian Brauner:
 "An openat() call from io_uring triggering an audit call can apparently
  cause the refcount of struct filename to be incremented from multiple
  threads concurrently during async execution, triggering a refcount
  underflow and hitting a BUG_ON(). That bug has been lurking around
  since at least v5.16 apparently.

  Switch to an atomic counter to fix that. The underflow check is
  downgraded from a BUG_ON() to a WARN_ON_ONCE() but we could easily
  remove that check altogether tbh"

* tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  audit,io_uring: io_uring openat triggers audit reference count underflow

18 months agoRevert "ethtool: Fix mod state of verbose no_mask bitset"
Kory Maincent [Thu, 19 Oct 2023 13:16:41 +0000 (15:16 +0200)]
Revert "ethtool: Fix mod state of verbose no_mask bitset"

This reverts commit 108a36d07c01edbc5942d27c92494d1c6e4d45a0.

It was reported that this fix breaks the possibility to remove existing WoL
flags. For example:
~$ ethtool lan2
...
        Supports Wake-on: pg
        Wake-on: d
...
~$ ethtool -s lan2 wol gp
~$ ethtool lan2
...
        Wake-on: pg
...
~$ ethtool -s lan2 wol d
~$ ethtool lan2
...
        Wake-on: pg
...

This worked correctly before this commit because we were always updating
a zero bitmap (since commit 6699170376ab ("ethtool: fix application of
verbose no_mask bitset"), that is) so that the rest was left zero
naturally. But now the 1->0 change (old_val is true, bit not present in
netlink nest) no longer works.

Reported-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Closes: https://lore.kernel.org/netdev/20231019095140.l6fffnszraeb6iiw@lion.mk-sys.cz/
Cc: stable@vger.kernel.org
Fixes: 108a36d07c01 ("ethtool: Fix mod state of verbose no_mask bitset")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/r/20231019-feature_ptp_bitset_fix-v1-1-70f3c429a221@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3
Linus Torvalds [Thu, 19 Oct 2023 16:10:18 +0000 (09:10 -0700)]
Merge tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3

Pull ntfs3 fixes from Konstantin Komarov:

 - memory leak

 - some logic errors, NULL dereferences

 - some code was refactored

 - more sanity checks

* tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3:
  fs/ntfs3: Avoid possible memory leak
  fs/ntfs3: Fix directory element type detection
  fs/ntfs3: Fix possible null-pointer dereference in hdr_find_e()
  fs/ntfs3: Fix OOB read in ntfs_init_from_boot
  fs/ntfs3: fix panic about slab-out-of-bounds caused by ntfs_list_ea()
  fs/ntfs3: Fix NULL pointer dereference on error in attr_allocate_frame()
  fs/ntfs3: Fix possible NULL-ptr-deref in ni_readpage_cmpr()
  fs/ntfs3: Do not allow to change label if volume is read-only
  fs/ntfs3: Add more info into /proc/fs/ntfs3/<dev>/volinfo
  fs/ntfs3: Refactoring and comments
  fs/ntfs3: Fix alternative boot searching
  fs/ntfs3: Allow repeated call to ntfs3_put_sbi
  fs/ntfs3: Use inode_set_ctime_to_ts instead of inode_set_ctime
  fs/ntfs3: Fix shift-out-of-bounds in ntfs_fill_super
  fs/ntfs3: fix deadlock in mark_as_free_ex
  fs/ntfs3: Add more attributes checks in mi_enum_attr()
  fs/ntfs3: Use kvmalloc instead of kmalloc(... __GFP_NOWARN)
  fs/ntfs3: Write immediately updated ntfs state
  fs/ntfs3: Add ckeck in ni_update_parent()

18 months agoMerge branch 'mptcp-fixes-for-v6-6'
Jakub Kicinski [Thu, 19 Oct 2023 16:10:02 +0000 (09:10 -0700)]
Merge branch 'mptcp-fixes-for-v6-6'

Mat Martineau says:

====================
mptcp: Fixes for v6.6

Patch 1 corrects the logic for MP_JOIN tests where 0 RSTs are expected.

Patch 2 ensures MPTCP packets are not incorrectly coalesced in the TCP
backlog queue.

Patch 3 avoids a zero-window probe and associated WARN_ON_ONCE() in an
expected MPTCP reinjection scenario.

Patches 4 & 5 allow an initial MPTCP subflow to be closed cleanly
instead of always sending RST. Associated selftest is updated.
====================

Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-0-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: mptcp: join: no RST when rm subflow/addr
Matthieu Baerts [Wed, 18 Oct 2023 18:23:56 +0000 (11:23 -0700)]
selftests: mptcp: join: no RST when rm subflow/addr

Recently, we noticed that some RST were wrongly generated when removing
the initial subflow.

This patch makes sure RST are not sent when removing any subflows or any
addresses.

Fixes: c2b2ae3925b6 ("mptcp: handle correctly disconnect() failures")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-5-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agomptcp: avoid sending RST when closing the initial subflow
Geliang Tang [Wed, 18 Oct 2023 18:23:55 +0000 (11:23 -0700)]
mptcp: avoid sending RST when closing the initial subflow

When closing the first subflow, the MPTCP protocol unconditionally
calls tcp_disconnect(), which in turn generates a reset if the subflow
is established.

That is unexpected and different from what MPTCP does with MPJ
subflows, where resets are generated only on FASTCLOSE and other edge
scenarios.

We can't reuse for the first subflow the same code in place for MPJ
subflows, as MPTCP clean them up completely via a tcp_close() call,
while must keep the first subflow socket alive for later re-usage, due
to implementation constraints.

This patch adds a new helper __mptcp_subflow_disconnect() that
encapsulates, a logic similar to tcp_close, issuing a reset only when
the MPTCP_CF_FASTCLOSE flag is set, and performing a clean shutdown
otherwise.

Fixes: c2b2ae3925b6 ("mptcp: handle correctly disconnect() failures")
Cc: stable@vger.kernel.org
Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-4-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agomptcp: more conservative check for zero probes
Paolo Abeni [Wed, 18 Oct 2023 18:23:54 +0000 (11:23 -0700)]
mptcp: more conservative check for zero probes

Christoph reported that the MPTCP protocol can find the subflow-level
write queue unexpectedly not empty while crafting a zero-window probe,
hitting a warning:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 188 at net/mptcp/protocol.c:1312 mptcp_sendmsg_frag+0xc06/0xe70
Modules linked in:
CPU: 0 PID: 188 Comm: kworker/0:2 Not tainted 6.6.0-rc2-g1176aa719d7a #47
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Workqueue: events mptcp_worker
RIP: 0010:mptcp_sendmsg_frag+0xc06/0xe70 net/mptcp/protocol.c:1312
RAX: 47d0530de347ff6a RBX: 47d0530de347ff6b RCX: ffff8881015d3c00
RDX: ffff8881015d3c00 RSI: 47d0530de347ff6b RDI: 47d0530de347ff6b
RBP: 47d0530de347ff6b R08: ffffffff8243c6a8 R09: ffffffff82042d9c
R10: 0000000000000002 R11: ffffffff82056850 R12: ffff88812a13d580
R13: 0000000000000001 R14: ffff88812b375e50 R15: ffff88812bbf3200
FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000695118 CR3: 0000000115dfc001 CR4: 0000000000170ef0
Call Trace:
 <TASK>
 __subflow_push_pending+0xa4/0x420 net/mptcp/protocol.c:1545
 __mptcp_push_pending+0x128/0x3b0 net/mptcp/protocol.c:1614
 mptcp_release_cb+0x218/0x5b0 net/mptcp/protocol.c:3391
 release_sock+0xf6/0x100 net/core/sock.c:3521
 mptcp_worker+0x6e8/0x8f0 net/mptcp/protocol.c:2746
 process_scheduled_works+0x341/0x690 kernel/workqueue.c:2630
 worker_thread+0x3a7/0x610 kernel/workqueue.c:2784
 kthread+0x143/0x180 kernel/kthread.c:388
 ret_from_fork+0x4d/0x60 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:304
 </TASK>

The root cause of the issue is that expectations are wrong: e.g. due
to MPTCP-level re-injection we can hit the critical condition.

Explicitly avoid the zero-window probe when the subflow write queue
is not empty and drop the related warnings.

Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/444
Fixes: f70cad1085d1 ("mptcp: stop relying on tcp_tx_skb_cache")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-3-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agotcp: check mptcp-level constraints for backlog coalescing
Paolo Abeni [Wed, 18 Oct 2023 18:23:53 +0000 (11:23 -0700)]
tcp: check mptcp-level constraints for backlog coalescing

The MPTCP protocol can acquire the subflow-level socket lock and
cause the tcp backlog usage. When inserting new skbs into the
backlog, the stack will try to coalesce them.

Currently, we have no check in place to ensure that such coalescing
will respect the MPTCP-level DSS, and that may cause data stream
corruption, as reported by Christoph.

Address the issue by adding the relevant admission check for coalescing
in tcp_add_backlog().

Note the issue is not easy to reproduce, as the MPTCP protocol tries
hard to avoid acquiring the subflow-level socket lock.

Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/420
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-2-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: mptcp: join: correctly check for no RST
Matthieu Baerts [Wed, 18 Oct 2023 18:23:52 +0000 (11:23 -0700)]
selftests: mptcp: join: correctly check for no RST

The commit mentioned below was more tolerant with the number of RST seen
during a test because in some uncontrollable situations, multiple RST
can be generated.

But it was not taking into account the case where no RST are expected:
this validation was then no longer reporting issues for the 0 RST case
because it is not possible to have less than 0 RST in the counter. This
patch fixes the issue by adding a specific condition.

Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-1-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: ti: icssg-prueth: Fix r30 CMDs bitmasks
MD Danish Anwar [Wed, 18 Oct 2023 15:07:15 +0000 (20:37 +0530)]
net: ti: icssg-prueth: Fix r30 CMDs bitmasks

The bitmasks for EMAC_PORT_DISABLE and EMAC_PORT_FORWARD r30 commands are
wrong in the driver.

Update the bitmasks of these commands to the correct ones as used by the
ICSSG firmware. These bitmasks are backwards compatible and work with
any ICSSG firmware version.

Fixes: e9b4ece7d74b ("net: ti: icssg-prueth: Add Firmware config and classification APIs.")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20231018150715.3085380-1-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoi40e: Align devlink info versions with ice driver and add docs
Ivan Vecera [Wed, 18 Oct 2023 12:35:55 +0000 (14:35 +0200)]
i40e: Align devlink info versions with ice driver and add docs

Align devlink info versions with ice driver so change 'fw.mgmt'
version to be 2-digit version [major.minor], add 'fw.mgmt.build'
that reports mgmt firmware build number and use '"fw.psid.api'
for NVM format version instead of incorrect '"fw.psid'.
Additionally add missing i40e devlink documentation.

Fixes: 5a423552e0d9 ("i40e: Add handler for devlink .info_get")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231018123558.552453-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge branch 'dt-bindings-net-child-node-schema-cleanups'
Jakub Kicinski [Thu, 19 Oct 2023 15:56:39 +0000 (08:56 -0700)]
Merge branch 'dt-bindings-net-child-node-schema-cleanups'

Rob Herring says:

====================
dt-bindings: net: Child node schema cleanups

This is a series of clean-ups related to ensuring that child node
schemas are constrained to not allow undefined properties. Typically,
that means just adding additionalProperties or unevaluatedProperties as
appropriate. The DSA/switch schemas turned out to be a bit more
involved, so there's some more fixes and a bit of restructuring in them.
====================

Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-0-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodt-bindings: net: dsa: Drop 'ethernet-ports' node properties
Rob Herring [Mon, 16 Oct 2023 21:44:27 +0000 (16:44 -0500)]
dt-bindings: net: dsa: Drop 'ethernet-ports' node properties

Constraints on 'ethernet-ports' node properties are already defined by the
reference to ethernet-switch.yaml, so they can be dropped from the DSA
schema.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-8-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodt-bindings: net: mscc,vsc7514-switch: Simplify DSA and switch references
Rob Herring [Mon, 16 Oct 2023 21:44:26 +0000 (16:44 -0500)]
dt-bindings: net: mscc,vsc7514-switch: Simplify DSA and switch references

The mscc,vsc7514-switch schema doesn't add any custom port properties,
so it can just reference ethernet-switch.yaml#/$defs/base and
dsa.yaml#/$defs/ethernet-ports instead of the base file and can skip
defining port nodes.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-7-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodt-bindings: net: mscc,vsc7514-switch: Clean-up example indentation
Rob Herring [Mon, 16 Oct 2023 21:44:25 +0000 (16:44 -0500)]
dt-bindings: net: mscc,vsc7514-switch: Clean-up example indentation

The indentation for the example is completely messed up for
'ethernet-ports'. Fix it.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-6-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodt-bindings: net: ethernet-switch: Rename $defs "base" to 'ethernet-ports'
Rob Herring [Mon, 16 Oct 2023 21:44:24 +0000 (16:44 -0500)]
dt-bindings: net: ethernet-switch: Rename $defs "base" to 'ethernet-ports'

The name "base" is misleading as the definition is for a complete schema
definition without additional properties allowed, not a "base class".
Align the same to be the same as dsa.yaml. This schema file without any
json pointer path is the base schema which can be extended.

There are not yet any references to $defs/base to update.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-5-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodt-bindings: net: ethernet-switch: Add missing 'ethernet-ports' level
Rob Herring [Mon, 16 Oct 2023 21:44:23 +0000 (16:44 -0500)]
dt-bindings: net: ethernet-switch: Add missing 'ethernet-ports' level

The '$defs/ethernet-ports' schema is referenced by schemas defining a
child node 'ethernet-ports', but this schema misses the
'ethernet-ports' node. It would work if referring schemas made a
reference like this:

properties:
  ethernet-ports:
    $ref: ethernet-switch.yaml#/$defs/ethernet-ports

However, that would be different from how dsa.yaml works. For
consistency, align the schema definition with dsa.yaml and add the
missing level.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-4-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodt-bindings: net: dsa/switch: Make 'ethernet-port' node addresses hex
Rob Herring [Mon, 16 Oct 2023 21:44:22 +0000 (16:44 -0500)]
dt-bindings: net: dsa/switch: Make 'ethernet-port' node addresses hex

'ethernet-port' node unit-addresses should be in hexadecimal. Some
instances have it correct, but fix the ones that don't.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-3-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodt-bindings: net: renesas: Drop ethernet-phy node schema
Rob Herring [Mon, 16 Oct 2023 21:44:21 +0000 (16:44 -0500)]
dt-bindings: net: renesas: Drop ethernet-phy node schema

What's connected on the MDIO bus is outside the scope of the binding for
ethernet controller's MDIO bus unless it's a fixed internal device, so
drop the node name and reference to ethernet-phy.yaml.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-2-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodt-bindings: net: Add missing (unevaluated|additional)Properties on child node schemas
Rob Herring [Mon, 16 Oct 2023 21:44:20 +0000 (16:44 -0500)]
dt-bindings: net: Add missing (unevaluated|additional)Properties on child node schemas

Just as unevaluatedProperties or additionalProperties are required at
the top level of schemas, they should (and will) also be required for
child node schemas. That ensures only documented properties are
present for any node.

Add unevaluatedProperties or additionalProperties as appropriate.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-1-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge tag 'for-6.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Thu, 19 Oct 2023 15:56:01 +0000 (08:56 -0700)]
Merge tag 'for-6.6-rc6-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "Fix a bug in chunk size decision that could lead to suboptimal
  placement and filling patterns"

* tag 'for-6.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix stripe length calculation for non-zoned data chunk allocation

18 months agoMerge branch 'net-fix-bugs-in-device-netns-move-and-rename'
Paolo Abeni [Thu, 19 Oct 2023 13:51:18 +0000 (15:51 +0200)]
Merge branch 'net-fix-bugs-in-device-netns-move-and-rename'

Jakub Kicinski says:

====================
net: fix bugs in device netns-move and rename

Daniel reported issues with the uevents generated during netdev
namespace move, if the netdev is getting renamed at the same time.

While the issue that he actually cares about is not fixed here,
there is a bunch of seemingly obvious other bugs in this code.
Fix the purely networking bugs while the discussion around
the uevent fix is still ongoing.
====================

Link: https://lore.kernel.org/r/20231018013817.2391509-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agoselftests: net: add very basic test for netdev names and namespaces
Jakub Kicinski [Wed, 18 Oct 2023 01:38:17 +0000 (18:38 -0700)]
selftests: net: add very basic test for netdev names and namespaces

Add selftest for fixes around naming netdevs and namespaces.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: move altnames together with the netdevice
Jakub Kicinski [Wed, 18 Oct 2023 01:38:16 +0000 (18:38 -0700)]
net: move altnames together with the netdevice

The altname nodes are currently not moved to the new netns
when netdevice itself moves:

  [ ~]# ip netns add test
  [ ~]# ip -netns test link add name eth0 type dummy
  [ ~]# ip -netns test link property add dev eth0 altname some-name
  [ ~]# ip -netns test link show dev some-name
  2: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 1e:67:ed:19:3d:24 brd ff:ff:ff:ff:ff:ff
      altname some-name
  [ ~]# ip -netns test link set dev eth0 netns 1
  [ ~]# ip link
  ...
  3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff
      altname some-name
  [ ~]# ip li show dev some-name
  Device "some-name" does not exist.

Remove them from the hash table when device is unlisted
and add back when listed again.

Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: avoid UAF on deleted altname
Jakub Kicinski [Wed, 18 Oct 2023 01:38:15 +0000 (18:38 -0700)]
net: avoid UAF on deleted altname

Altnames are accessed under RCU (dev_get_by_name_rcu())
but freed by kfree() with no synchronization point.

Each node has one or two allocations (node and a variable-size
name, sometimes the name is netdev->name). Adding rcu_heads
here is a bit tedious. Besides most code which unlists the names
already has rcu barriers - so take the simpler approach of adding
synchronize_rcu(). Note that the one on the unregistration path
(which matters more) is removed by the next fix.

Fixes: ff92741270bf ("net: introduce name_node struct to be used in hashlist")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: check for altname conflicts when changing netdev's netns
Jakub Kicinski [Wed, 18 Oct 2023 01:38:14 +0000 (18:38 -0700)]
net: check for altname conflicts when changing netdev's netns

It's currently possible to create an altname conflicting
with an altname or real name of another device by creating
it in another netns and moving it over:

 [ ~]$ ip link add dev eth0 type dummy

 [ ~]$ ip netns add test
 [ ~]$ ip -netns test link add dev ethX netns test type dummy
 [ ~]$ ip -netns test link property add dev ethX altname eth0
 [ ~]$ ip -netns test link set dev ethX netns 1

 [ ~]$ ip link
 ...
 3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff
 ...
 5: ethX: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether 26:b7:28:78:38:0f brd ff:ff:ff:ff:ff:ff
     altname eth0

Create a macro for walking the altnames, this hopefully makes
it clearer that the list we walk contains only altnames.
Which is otherwise not entirely intuitive.

Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: fix ifname in netlink ntf during netns move
Jakub Kicinski [Wed, 18 Oct 2023 01:38:13 +0000 (18:38 -0700)]
net: fix ifname in netlink ntf during netns move

dev_get_valid_name() overwrites the netdev's name on success.
This makes it hard to use in prepare-commit-like fashion,
where we do validation first, and "commit" to the change
later.

Factor out a helper which lets us save the new name to a buffer.
Use it to fix the problem of notification on netns move having
incorrect name:

 5: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
     link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff
 6: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
     link/ether 1e:4a:34:36:e3:cd brd ff:ff:ff:ff:ff:ff

 [ ~]# ip link set dev eth0 netns 1 name eth1

ip monitor inside netns:
 Deleted inet eth0
 Deleted inet6 eth0
 Deleted 5: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
     link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff new-netnsid 0 new-ifindex 7

Name is reported as eth1 in old netns for ifindex 5, already renamed.

Fixes: d90310243fd7 ("net: device name allocation cleanups")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agoMerge branch 'net-stmmac-improve-tx-timer-logic'
Paolo Abeni [Thu, 19 Oct 2023 13:41:38 +0000 (15:41 +0200)]
Merge branch 'net-stmmac-improve-tx-timer-logic'

Christian Marangi says:

====================
net: stmmac: improve tx timer logic

This series comes with the intention of restoring original performance
of stmmac on some router/device that used the stmmac driver to handle
gigabit traffic.

More info are present in patch 3. This cover letter is to show results
and improvements of the following change.

The move to hr_timer for tx timer and commit 8fce33317023 ("net: stmmac:
Rework coalesce timer and fix multi-queue races") caused big performance
regression on these kind of device.

This was observed on ipq806x that after kernel 4.19 couldn't handle
gigabit speed anymore.

The following series is currently applied and tested in OpenWrt SNAPSHOT
and have great performance increase. (the scenario is qca8k switch +
stmmac dwmac1000) Some good comparison can be found here [1].

The difference is from a swconfig scenario (where dsa tagging is not
used so very low CPU impact in handling traffic) and DSA scenario where
tagging is used and there is a minimal impact in the CPU. As can be
notice even with DSA in place we have better perf.

It was observed by other user that also SQM scenario with cake scheduler
were improved in the order of 100mbps (this scenario is CPU limited and
any increase of perf is caused by removing load on the CPU)

Been at least 15 days that this is in use without any complain or bug
reported about queue timeout. (was the case with v1 before the
additional patch was added, only appear on real world tests and not on
iperf tests)

[1] https://forum.openwrt.org/t/netgear-r7800-exploration-ipq8065-qca9984/285/3427?u=ansuel
====================

Link: https://lore.kernel.org/r/20231018123550.27110-1-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: stmmac: increase TX coalesce timer to 5ms
Christian Marangi [Wed, 18 Oct 2023 12:35:50 +0000 (14:35 +0200)]
net: stmmac: increase TX coalesce timer to 5ms

Commit 8fce33317023 ("net: stmmac: Rework coalesce timer and fix
multi-queue races") decreased the TX coalesce timer from 40ms to 1ms.

This caused some performance regression on some target (regression was
reported at least on ipq806x) in the order of 600mbps dropping from
gigabit handling to only 200mbps.

The problem was identified in the TX timer getting armed too much time.
While this was fixed and improved in another commit, performance can be
improved even further by increasing the timer delay a bit moving from
1ms to 5ms.

The value is a good balance between battery saving by prevending too
much interrupt to be generated and permitting good performance for
internet oriented devices.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: stmmac: move TX timer arm after DMA enable
Christian Marangi [Wed, 18 Oct 2023 12:35:49 +0000 (14:35 +0200)]
net: stmmac: move TX timer arm after DMA enable

Move TX timer arm call after DMA interrupt is enabled again.

The TX timer arm function changed logic and now is skipped if a napi is
already scheduled. By moving the TX timer arm call after DMA is enabled,
we permit to correctly skip if a DMA interrupt has been fired and a napi
has been scheduled again.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: stmmac: improve TX timer arm logic
Christian Marangi [Wed, 18 Oct 2023 12:35:48 +0000 (14:35 +0200)]
net: stmmac: improve TX timer arm logic

There is currently a problem with the TX timer getting armed multiple
unnecessary times causing big performance regression on some device that
suffer from heavy handling of hrtimer rearm.

The use of the TX timer is an old implementation that predates the napi
implementation and the interrupt enable/disable handling.

Due to stmmac being a very old code, the TX timer was never evaluated
again with this new implementation and was kept there causing
performance regression. The performance regression started to appear
with kernel version 4.19 with 8fce33317023 ("net: stmmac: Rework coalesce
timer and fix multi-queue races") where the timer was reduced to 1ms
causing it to be armed 40 times more than before.

Decreasing the timer made the problem more present and caused the
regression in the other of 600-700mbps on some device (regression where
this was notice is ipq806x).

The problem is in the fact that handling the hrtimer on some target is
expensive and recent kernel made the timer armed much more times.
A solution that was proposed was reverting the hrtimer change and use
mod_timer but such solution would still hide the real problem in the
current implementation.

To fix the regression, apply some additional logic and skip arming the
timer when not needed.

Arm the timer ONLY if a napi is not already scheduled. Running the timer
is redundant since the same function (stmmac_tx_clean) will run in the
napi TX poll. Also try to cancel any timer if a napi is scheduled to
prevent redundant run of TX call.

With the following new logic the original performance are restored while
keeping using the hrtimer.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: introduce napi_is_scheduled helper
Christian Marangi [Wed, 18 Oct 2023 12:35:47 +0000 (14:35 +0200)]
net: introduce napi_is_scheduled helper

We currently have napi_if_scheduled_mark_missed that can be used to
check if napi is scheduled but that does more thing than simply checking
it and return a bool. Some driver already implement custom function to
check if napi is scheduled.

Drop these custom function and introduce napi_is_scheduled that simply
check if napi is scheduled atomically.

Update any driver and code that implement a similar check and instead
use this new helper.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agoiavf: delete unused iavf_mac_info fields
Michal Schmidt [Wed, 18 Oct 2023 11:15:27 +0000 (13:15 +0200)]
iavf: delete unused iavf_mac_info fields

'san_addr' and 'mac_fcoeq' members of struct iavf_mac_info are unused.
'type' is write-only. Delete all three.

The function iavf_set_mac_type that sets 'type' also checks if the PCI
vendor ID is Intel. This is unnecessary. Delete the whole function.

If in the future there's a need for the MAC type (or other PCI
ID-dependent data), I would prefer to use .driver_data in iavf_pci_tbl[]
for this purpose.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231018111527.78194-1-mschmidt@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agoinet: lock the socket in ip_sock_set_tos()
Eric Dumazet [Wed, 18 Oct 2023 09:00:13 +0000 (09:00 +0000)]
inet: lock the socket in ip_sock_set_tos()

Christoph Paasch reported a panic in TCP stack [1]

Indeed, we should not call sk_dst_reset() without holding
the socket lock, as __sk_dst_get() callers do not all rely
on bare RCU.

[1]
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 12bad6067 P4D 12bad6067 PUD 12bad5067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 2750 Comm: syz-executor.5 Not tainted 6.6.0-rc4-g7a5720a344e7 #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:tcp_get_metrics+0x118/0x8f0 net/ipv4/tcp_metrics.c:321
Code: c7 44 24 70 02 00 8b 03 89 44 24 48 c7 44 24 4c 00 00 00 00 66 c7 44 24 58 02 00 66 ba 02 00 b1 01 89 4c 24 04 4c 89 7c 24 10 <49> 8b 0f 48 8b 89 50 05 00 00 48 89 4c 24 30 33 81 00 02 00 00 69
RSP: 0018:ffffc90000af79b8 EFLAGS: 00010293
RAX: 000000000100007f RBX: ffff88812ae8f500 RCX: ffff88812b5f8f01
RDX: 0000000000000002 RSI: ffffffff8300f080 RDI: 0000000000000002
RBP: 0000000000000002 R08: 0000000000000003 R09: ffffffff8205eca0
R10: 0000000000000002 R11: ffff88812b5f8f00 R12: ffff88812a9e0580
R13: 0000000000000000 R14: ffff88812ae8fbd2 R15: 0000000000000000
FS: 00007f70a006b640(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000012bad7003 CR4: 0000000000170ee0
Call Trace:
<TASK>
tcp_fastopen_cache_get+0x32/0x140 net/ipv4/tcp_metrics.c:567
tcp_fastopen_cookie_check+0x28/0x180 net/ipv4/tcp_fastopen.c:419
tcp_connect+0x9c8/0x12a0 net/ipv4/tcp_output.c:3839
tcp_v4_connect+0x645/0x6e0 net/ipv4/tcp_ipv4.c:323
__inet_stream_connect+0x120/0x590 net/ipv4/af_inet.c:676
tcp_sendmsg_fastopen+0x2d6/0x3a0 net/ipv4/tcp.c:1021
tcp_sendmsg_locked+0x1957/0x1b00 net/ipv4/tcp.c:1073
tcp_sendmsg+0x30/0x50 net/ipv4/tcp.c:1336
__sock_sendmsg+0x83/0xd0 net/socket.c:730
__sys_sendto+0x20a/0x2a0 net/socket.c:2194
__do_sys_sendto net/socket.c:2206 [inline]

Fixes: e08d0b3d1723 ("inet: implement lockless IP_TOS")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231018090014.345158-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agoMerge branch 'net-stmmac-use-correct-pps-input-indexing'
Paolo Abeni [Thu, 19 Oct 2023 11:01:35 +0000 (13:01 +0200)]
Merge branch 'net-stmmac-use-correct-pps-input-indexing'

Johannes Zink says:

====================
net: stmmac: use correct PPS input indexing

The stmmac can have 0 to 4 auxiliary snapshot in channels, which can be
used for capturing external triggers with respect to the eqos PTP timer.

Previously when enabling the auxiliary snapshot, an invalid request was
written to the hardware register, except for the Intel variant of this
driver, where the only snapshot available was hardcoded.

Patch 1 of this series cleans up the debug netdev_dbg message indicating
the auxiliary snapshot being {en,dis}abled. No functional changes here

Patch 2 of this series writes the correct PPS input indexing to the
hardware registers instead of a previously used fixed value

Patch 3 of this series removes a field member from plat_stmmacnet_data
that is no longer needed

Patch 4 of this series prepares Patch 5 by protecting the snapshot
enabled flag by the aux_ts_lock mutex

Patch 5 of this series adds a temporary workaround, since at the moment
the driver can handle only one single auxiliary snapshot at a time.
Previously the driver silently dropped the previous configuration and
enabled the new one. Now, if a snapshot is already enabled and userspace
tries to enable another without previously disabling the snapshot currently
enabled: issue a netdev_err and return an errorcode indicating the device is
busy.

This series is a "never worked, doesn't hurt anyone" touchup to the PPS
capture for non-intel variants of the dwmac driver.
====================

Link: https://lore.kernel.org/r/20231010-stmmac_fix_auxiliary_event_capture-v2-0-51d5f56542d7@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: stmmac: do not silently change auxiliary snapshot capture channel
Johannes Zink [Wed, 18 Oct 2023 07:09:57 +0000 (09:09 +0200)]
net: stmmac: do not silently change auxiliary snapshot capture channel

Even though the hardware theoretically supports up to 4 simultaneous
auxiliary snapshot capture channels, the stmmac driver does support only
a single channel to be active at a time.

Previously in case of a PTP_CLK_REQ_EXTTS request, previously active
auxiliary snapshot capture channels were silently dropped and the new
channel was activated.

Instead of silently changing the state for all consumers, log an error
and return -EBUSY if a channel is already in use in order to signal to
userspace to disable the currently active channel before enabling another one.

Signed-off-by: Johannes Zink <j.zink@pengutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: stmmac: ptp: stmmac_enable(): move change of plat->flags into mutex
Johannes Zink [Wed, 18 Oct 2023 07:09:56 +0000 (09:09 +0200)]
net: stmmac: ptp: stmmac_enable(): move change of plat->flags into mutex

This is a preparation patch. The next patch will check if an external TS
is active and return with an error. So we have to move the change of the
plat->flags that tracks if external timestamping is enabled after that
check.

Prepare for this change and move the plat->flags change into the mutex
and the if (on).

Signed-off-by: Johannes Zink <j.zink@pengutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: stmmac: intel: remove unnecessary field struct plat_stmmacenet_data::ext_snapsho...
Johannes Zink [Wed, 18 Oct 2023 07:09:55 +0000 (09:09 +0200)]
net: stmmac: intel: remove unnecessary field struct plat_stmmacenet_data::ext_snapshot_num

Do not store bitmask for enabling AUX_SNAPSHOT0. The previous commit
("net: stmmac: fix PPS capture input index") takes care of calculating
the proper bit mask from the request data's extts.index field, which is
0 if not explicitly specified otherwise.

Signed-off-by: Johannes Zink <j.zink@pengutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: stmmac: use correct PPS capture input index
Johannes Zink [Wed, 18 Oct 2023 07:09:54 +0000 (09:09 +0200)]
net: stmmac: use correct PPS capture input index

The stmmac supports up to 4 auxiliary snapshots that can be enabled by
setting the appropriate bits in the PTP_ACR bitfield.

Previously as of commit f4da56529da6 ("net: stmmac: Add support for
external trigger timestamping") instead of setting the bits, a fixed
value was written to this bitfield instead of passing the appropriate
bitmask.

Now the correct bit is set according to the ptp_clock_request.extts_index
passed as a parameter to stmmac_enable().

Signed-off-by: Johannes Zink <j.zink@pengutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: stmmac: simplify debug message on stmmac_enable()
Johannes Zink [Wed, 18 Oct 2023 07:09:53 +0000 (09:09 +0200)]
net: stmmac: simplify debug message on stmmac_enable()

Simplify the netdev_dbg() call in stmmac_enable() in order to reduce code
duplication. No functional change.

Signed-off-by: Johannes Zink <j.zink@pengutronix.de>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: ethernet: ti: Fix mixed module-builtin object
MD Danish Anwar [Wed, 18 Oct 2023 06:49:36 +0000 (12:19 +0530)]
net: ethernet: ti: Fix mixed module-builtin object

With CONFIG_TI_K3_AM65_CPSW_NUSS=y and CONFIG_TI_ICSSG_PRUETH=m,
k3-cppi-desc-pool.o is linked to a module and also to vmlinux even though
the expected CFLAGS are different between builtins and modules.

The build system is complaining about the following:

k3-cppi-desc-pool.o is added to multiple modules: icssg-prueth
ti-am65-cpsw-nuss

Introduce the new module, k3-cppi-desc-pool, to provide the common
functions to ti-am65-cpsw-nuss and icssg-prueth.

Fixes: 128d5874c082 ("net: ti: icssg-prueth: Add ICSSG ethernet driver")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://lore.kernel.org/r/20231018064936.3146846-1-danishanwar@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: stmmac: Remove redundant checking for rx_coalesce_usecs
Gan Yi Fang [Wed, 18 Oct 2023 03:08:02 +0000 (11:08 +0800)]
net: stmmac: Remove redundant checking for rx_coalesce_usecs

The datatype of rx_coalesce_usecs is u32, always larger or equal to zero.
Previous checking does not include value 0, this patch removes the
checking to handle the value 0. This change in behaviour making the
value of 0 cause an error is not a problem because 0 is out of
range of rx_coalesce_usecs.

Signed-off-by: Gan Yi Fang <yi.fang.gan@intel.com>
Link: https://lore.kernel.org/r/20231018030802.741923-1-yi.fang.gan@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agodocs: networking: document multi-RSS context
Jakub Kicinski [Wed, 18 Oct 2023 01:07:58 +0000 (18:07 -0700)]
docs: networking: document multi-RSS context

There seems to be no docs for the concept of multiple RSS
contexts and how to configure it. I had to explain it three
times recently, the last one being the charm, document it.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20231018010758.2382742-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agoMerge branch 'rswitch-add-pm-ops'
Paolo Abeni [Thu, 19 Oct 2023 08:59:42 +0000 (10:59 +0200)]
Merge branch 'rswitch-add-pm-ops'

Yoshihiro Shimoda says:

====================
rswitch: Add PM ops

This patch is based on the latest net-next.git / next branch.
After applied this patch with the following patches, the system can
enter/exit Suspend to Idle without any error:
https://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy.git/commit/?h=next&id=aa4c0bbf820ddb9dd8105a403aa12df57b9e5129
https://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy.git/commit/?h=next&id=1a5361189b7acac15b9b086b2300a11b7aa84c06
====================

Link: https://lore.kernel.org/r/20231017113402.849735-1-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agorswitch: Add PM ops
Yoshihiro Shimoda [Tue, 17 Oct 2023 11:34:02 +0000 (20:34 +0900)]
rswitch: Add PM ops

Add PM ops for Suspend to Idle. When the system suspended,
the Ethernet Serdes's clock will be stopped. So, this driver needs
to re-initialize the Ethernet Serdes by phy_init() in
renesas_eth_sw_resume(). Otherwise, timeout happened in phy_power_on().

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agorswitch: Use unsigned int for port related array index
Yoshihiro Shimoda [Tue, 17 Oct 2023 11:34:01 +0000 (20:34 +0900)]
rswitch: Use unsigned int for port related array index

Array index should not be negative, so modify the condition of
rswitch_for_each_enabled_port_continue_reverse() macro, and then
use unsigned int instead.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agoMerge tag 'nf-23-10-18' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Thu, 19 Oct 2023 01:17:50 +0000 (18:17 -0700)]
Merge tag 'nf-23-10-18' of https://git./linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter: updates for net

First patch, from Phil Sutter, reduces number of audit notifications
when userspace requests to re-set stateful objects.
This change also comes with a selftest update.

Second patch, also from Phil, moves the nftables audit selftest
to its own netns to avoid interference with the init netns.

Third patch, from Pablo Neira, fixes an inconsistency with the "rbtree"
set backend: When set element X has expired, a request to delete element
X should fail (like with all other backends).

Finally, patch four, also from Pablo, reverts a recent attempt to speed
up abort of a large pending update with the "pipapo" set backend.

It could cause stray references to remain in the set, which then
results in a double-free.

* tag 'nf-23-10-18' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: revert do not remove elements if set backend implements .abort
  netfilter: nft_set_rbtree: .deactivate fails if element has expired
  selftests: netfilter: Run nft_audit.sh in its own netns
  netfilter: nf_tables: audit log object reset once per table
====================

Link: https://lore.kernel.org/r/20231018125605.27299-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge tag 'wireless-2023-10-18' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Thu, 19 Oct 2023 01:14:25 +0000 (18:14 -0700)]
Merge tag 'wireless-2023-10-18' of git://git./linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
A few more fixes:
 * prevent value bounce/glitch in rfkill GPIO probe
 * fix lockdep report in rfkill
 * fix error path leak in mac80211 key handling
 * use system_unbound_wq for wiphy work since it
   can take longer

* tag 'wireless-2023-10-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  net: rfkill: reduce data->mtx scope in rfkill_fop_open
  net: rfkill: gpio: prevent value glitch during probe
  wifi: mac80211: fix error path key leak
  wifi: cfg80211: use system_unbound_wq for wiphy work
====================

Link: https://lore.kernel.org/r/20231018071041.8175-2-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: phy: bcm7xxx: Add missing 16nm EPHY statistics
Florian Fainelli [Tue, 17 Oct 2023 20:51:19 +0000 (13:51 -0700)]
net: phy: bcm7xxx: Add missing 16nm EPHY statistics

The .probe() function would allocate the necessary space and ensure that
the library call sizes the number of statistics but the callbacks
necessary to fetch the name and values were not wired up.

Reported-by: Justin Chen <justin.chen@broadcom.com>
Fixes: f68d08c437f9 ("net: phy: bcm7xxx: Add EPHY entry for 72165")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231017205119.416392-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr
Eric Dumazet [Tue, 17 Oct 2023 19:23:04 +0000 (19:23 +0000)]
ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr

syzbot reported a data-race while accessing nh->nh_saddr_genid [1]

Add annotations, but leave the code lazy as intended.

[1]
BUG: KCSAN: data-race in fib_select_path / fib_select_path

write to 0xffff8881387166f0 of 4 bytes by task 6778 on cpu 1:
fib_info_update_nhc_saddr net/ipv4/fib_semantics.c:1334 [inline]
fib_result_prefsrc net/ipv4/fib_semantics.c:1354 [inline]
fib_select_path+0x292/0x330 net/ipv4/fib_semantics.c:2269
ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810
ip_route_output_key_hash net/ipv4/route.c:2644 [inline]
__ip_route_output_key include/net/route.h:134 [inline]
ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872
send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61
wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175
wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
process_one_work kernel/workqueue.c:2630 [inline]
process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703
worker_thread+0x525/0x730 kernel/workqueue.c:2784
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

read to 0xffff8881387166f0 of 4 bytes by task 6759 on cpu 0:
fib_result_prefsrc net/ipv4/fib_semantics.c:1350 [inline]
fib_select_path+0x1cb/0x330 net/ipv4/fib_semantics.c:2269
ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810
ip_route_output_key_hash net/ipv4/route.c:2644 [inline]
__ip_route_output_key include/net/route.h:134 [inline]
ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872
send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61
wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175
wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
process_one_work kernel/workqueue.c:2630 [inline]
process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703
worker_thread+0x525/0x730 kernel/workqueue.c:2784
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

value changed: 0x959d3217 -> 0x959d3218

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 6759 Comm: kworker/u4:15 Not tainted 6.6.0-rc4-syzkaller-00029-gcbf3a2cb156a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023
Workqueue: wg-kex-wg1 wg_packet_handshake_send_worker

Fixes: 436c3b66ec98 ("ipv4: Invalidate nexthop cache nh_saddr more correctly.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20231017192304.82626-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge branch 'intel-wired-lan-driver-updates-2023-10-17'
Jakub Kicinski [Thu, 19 Oct 2023 01:10:20 +0000 (18:10 -0700)]
Merge branch 'intel-wired-lan-driver-updates-2023-10-17'

Jacob Keller says:

====================
Intel Wired LAN Driver Updates 2023-10-17

This series contains cleanups for all the Intel drivers relating to their
use of format specifiers and the use of strncpy.

Jesse fixes various -Wformat warnings across all the Intel networking,
including various cases where a "%s" string format specifier is preferred,
and using kasprintf instead of snprintf.

Justin replaces all of the uses of the now deprecated strncpy with a more
modern string function, primarily strscpy.
====================

Link: https://lore.kernel.org/r/20231017190411.2199743-1-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoigc: replace deprecated strncpy with strscpy
Justin Stitt [Tue, 17 Oct 2023 19:04:11 +0000 (12:04 -0700)]
igc: replace deprecated strncpy with strscpy

`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

We expect netdev->name to be NUL-terminated based on its use with format
strings:
|       if (q_vector->rx.ring && q_vector->tx.ring)
|               sprintf(q_vector->name, "%s-TxRx-%u", netdev->name,

Furthermore, we do not need NUL-padding as netdev is already
zero-allocated:
|       netdev = alloc_etherdev_mq(sizeof(struct igc_adapter),
|                                  IGC_MAX_TX_QUEUES);
...
alloc_etherdev() -> alloc_etherdev_mq() -> alloc_etherdev_mqs() ->
alloc_netdev_mqs() ...
|       p = kvzalloc(alloc_size, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);

Considering the above, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231017190411.2199743-10-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoigbvf: replace deprecated strncpy with strscpy
Justin Stitt [Tue, 17 Oct 2023 19:04:10 +0000 (12:04 -0700)]
igbvf: replace deprecated strncpy with strscpy

`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

We expect netdev->name to be NUL-terminated based on its usage with
`strlen` and format strings:
|       if (strlen(netdev->name) < (IFNAMSIZ - 5)) {
|               sprintf(adapter->tx_ring->name, "%s-tx-0", netdev->name);

Moreover, we do not need NUL-padding as netdev is already
zero-allocated:
|       netdev = alloc_etherdev(sizeof(struct igbvf_adapter));
...
alloc_etherdev() -> alloc_etherdev_mq() -> alloc_etherdev_mqs() ->
alloc_netdev_mqs() ...
|       p = kvzalloc(alloc_size, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);

Considering the above, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231017190411.2199743-9-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoigb: replace deprecated strncpy with strscpy
Justin Stitt [Tue, 17 Oct 2023 19:04:09 +0000 (12:04 -0700)]
igb: replace deprecated strncpy with strscpy

`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

We see that netdev->name is expected to be NUL-terminated based on its
usage with format strings:
|       sprintf(q_vector->name, "%s-TxRx-%u", netdev->name,
|               q_vector->rx.ring->queue_index);

Furthermore, NUL-padding is not required as netdev is already
zero-allocated:
|       netdev = alloc_etherdev_mq(sizeof(struct igb_adapter),
|                                  IGB_MAX_TX_QUEUES);
...
alloc_etherdev_mq() -> alloc_etherdev_mqs() -> alloc_netdev_mqs() ...
|       p = kvzalloc(alloc_size, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);

Considering the above, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231017190411.2199743-8-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoi40e: use scnprintf over strncpy+strncat
Justin Stitt [Tue, 17 Oct 2023 19:04:08 +0000 (12:04 -0700)]
i40e: use scnprintf over strncpy+strncat

`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

Moreover, `strncat` shouldn't really be used either as per
fortify-string.h:
 * Do not use this function. While FORTIFY_SOURCE tries to avoid
 * read and write overflows, this is only possible when the sizes
 * of @p and @q are known to the compiler. Prefer building the
 * string with formatting, via scnprintf() or similar.

Instead, use `scnprintf` with "%s%s" format string. This code is now
more readable and robust.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231017190411.2199743-7-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agofm10k: replace deprecated strncpy with strscpy
Justin Stitt [Tue, 17 Oct 2023 19:04:07 +0000 (12:04 -0700)]
fm10k: replace deprecated strncpy with strscpy

`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

A suitable replacement is `strscpy` [2] due to the fact that it
guarantees NUL-termination on the destination buffer without
unnecessarily NUL-padding.

Other implementations of .*get_drvinfo also use strscpy so this patch
brings fm10k_get_drvinfo in line as well:

igb/igb_ethtool.c +851
static void igb_get_drvinfo(struct net_device *netdev,

igbvf/ethtool.c
167:static void igbvf_get_drvinfo(struct net_device *netdev,

i40e/i40e_ethtool.c
1999:static void i40e_get_drvinfo(struct net_device *netdev,

e1000/e1000_ethtool.c
529:static void e1000_get_drvinfo(struct net_device *netdev,

ixgbevf/ethtool.c
211:static void ixgbevf_get_drvinfo(struct net_device *netdev,

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231017190411.2199743-6-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoe1000: replace deprecated strncpy with strscpy
Justin Stitt [Tue, 17 Oct 2023 19:04:06 +0000 (12:04 -0700)]
e1000: replace deprecated strncpy with strscpy

`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

We can see that netdev->name is expected to be NUL-terminated based on
it's usage with format strings:
|       pr_info("%s NIC Link is Down\n",
|               netdev->name);

A suitable replacement is `strscpy` [2] due to the fact that it
guarantees NUL-termination on the destination buffer without
unnecessarily NUL-padding.

This is in line with other uses of strscpy on netdev->name:
$ rg "strscpy\(netdev\->name.*pci.*"

drivers/net/ethernet/intel/e1000e/netdev.c
7455:   strscpy(netdev->name, pci_name(pdev), sizeof(netdev->name));

drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
10839:  strscpy(netdev->name, pci_name(pdev), sizeof(netdev->name));

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231017190411.2199743-5-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoe100: replace deprecated strncpy with strscpy
Justin Stitt [Tue, 17 Oct 2023 19:04:05 +0000 (12:04 -0700)]
e100: replace deprecated strncpy with strscpy

`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

The "...-1" pattern makes it evident that netdev->name is expected to be
NUL-terminated.

Meanwhile, it seems NUL-padding is not required due to alloc_etherdev
zero-allocating the buffer.

Considering the above, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.

This is in line with other uses of strscpy on netdev->name:
$ rg "strscpy\(netdev\->name.*pci.*"

drivers/net/ethernet/intel/e1000e/netdev.c
7455:   strscpy(netdev->name, pci_name(pdev), sizeof(netdev->name));

drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
10839:  strscpy(netdev->name, pci_name(pdev), sizeof(netdev->name));

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231017190411.2199743-4-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agointel: fix format warnings
Jesse Brandeburg [Tue, 17 Oct 2023 19:04:04 +0000 (12:04 -0700)]
intel: fix format warnings

Get ahead of the game and fix all the -Wformat=2 noted warnings in the
intel drivers directory.

There are one set of i40e and iavf warnings I couldn't figure out how to
fix because the driver is already using vsnprintf without an explicit
"const char *" format string.

Tested with both gcc-12 and clang-15. I found gcc-12 runs clean after
this series but clang-15 is a little worried about the vsnprintf lines.

summary of warnings:

drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c:148:34: warning: format string is not a string literal [-Wformat-nonliteral]
drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1416:24: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1416:24: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1421:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1421:6: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/igc/igc_ethtool.c:776:24: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/igc/igc_ethtool.c:776:24: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/igc/igc_ethtool.c:779:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/igc/igc_ethtool.c:779:6: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/iavf/iavf_ethtool.c:199:34: warning: format string is not a string literal [-Wformat-nonliteral]
drivers/net/ethernet/intel/igb/igb_ethtool.c:2360:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/igb/igb_ethtool.c:2360:6: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/igb/igb_ethtool.c:2363:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/igb/igb_ethtool.c:2363:6: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/i40e/i40e_ethtool.c:208:34: warning: format string is not a string literal [-Wformat-nonliteral]
drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2515:23: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2515:23: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2519:23: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2519:23: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/ice/ice_ethtool.c:1064:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/ice/ice_ethtool.c:1064:6: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/ice/ice_ethtool.c:1084:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/ice/ice_ethtool.c:1084:6: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/ice/ice_ethtool.c:1100:24: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/ice/ice_ethtool.c:1100:24: note: treat the string as an argument to avoid this

Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231017190411.2199743-3-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agointel: fix string truncation warnings
Jesse Brandeburg [Tue, 17 Oct 2023 19:04:03 +0000 (12:04 -0700)]
intel: fix string truncation warnings

Fix -Wformat-truncated warnings to complete the intel directories' W=1
clean efforts. The W=1 recently got enhanced with a few new flags and
this brought up some new warnings.

Switch to using kasprintf() when possible so we always allocate the
right length strings.

summary of warnings:
drivers/net/ethernet/intel/iavf/iavf_virtchnl.c:1425:60: warning: ‘%s’ directive output may be truncated writing 4 bytes into a region of size between 1 and 11 [-Wformat-truncation=]
drivers/net/ethernet/intel/iavf/iavf_virtchnl.c:1425:17: note: ‘snprintf’ output between 7 and 17 bytes into a destination of size 13
drivers/net/ethernet/intel/ice/ice_ptp.c:43:27: warning: ‘%s’ directive output may be truncated writing up to 479 bytes into a region of size 64 [-Wformat-truncation=]
drivers/net/ethernet/intel/ice/ice_ptp.c:42:17: note: ‘snprintf’ output between 1 and 480 bytes into a destination of size 64
drivers/net/ethernet/intel/igb/igb_main.c:3092:53: warning: ‘%d’ directive output may be truncated writing between 1 and 5 bytes into a region of size between 1 and 13 [-Wformat-truncation=]
drivers/net/ethernet/intel/igb/igb_main.c:3092:34: note: directive argument in the range [0, 65535]
drivers/net/ethernet/intel/igb/igb_main.c:3092:34: note: directive argument in the range [0, 65535]
drivers/net/ethernet/intel/igb/igb_main.c:3090:25: note: ‘snprintf’ output between 23 and 43 bytes into a destination of size 32

Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231017190411.2199743-2-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agotcp_bpf: properly release resources on error paths
Paolo Abeni [Tue, 17 Oct 2023 15:49:51 +0000 (17:49 +0200)]
tcp_bpf: properly release resources on error paths

In the blamed commit below, I completely forgot to release the acquired
resources before erroring out in the TCP BPF code, as reported by Dan.

Address the issues by replacing the bogus return with a jump to the
relevant cleanup code.

Fixes: 419ce133ab92 ("tcp: allow again tcp_disconnect() when threads are waiting")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/8f99194c698bcef12666f0a9a999c58f8b1cb52c.1697557782.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve
Pedro Tammela [Tue, 17 Oct 2023 14:36:02 +0000 (11:36 -0300)]
net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve

Christian Theune says:
   I upgraded from 6.1.38 to 6.1.55 this morning and it broke my traffic shaping script,
   leaving me with a non-functional uplink on a remote router.

A 'rt' curve cannot be used as a inner curve (parent class), but we were
allowing such configurations since the qdisc was introduced. Such
configurations would trigger a UAF as Budimir explains:
   The parent will have vttree_insert() called on it in init_vf(),
   but will not have vttree_remove() called on it in update_vf()
   because it does not have the HFSC_FSC flag set.

The qdisc always assumes that inner classes have the HFSC_FSC flag set.
This is by design as it doesn't make sense 'qdisc wise' for an 'rt'
curve to be an inner curve.

Budimir's original patch disallows users to add classes with a 'rt'
parent, but this is too strict as it breaks users that have been using
'rt' as a inner class. Another approach, taken by this patch, is to
upgrade the inner 'rt' into a 'sc', warning the user in the process.
It avoids the UAF reported by Budimir while also being more permissive
to bad scripts/users/code using 'rt' as a inner class.

Users checking the `tc class ls [...]` or `tc class get [...]` dumps would
observe the curve change and are potentially breaking with this change.

v1->v2: https://lore.kernel.org/all/20231013151057.2611860-1-pctammela@mojatatu.com/
- Correct 'Fixes' tag and merge with revert (Jakub)

Cc: Christian Theune <ct@flyingcircus.io>
Cc: Budimir Markovic <markovicbudimir@gmail.com>
Fixes: b3d26c5702c7 ("net/sched: sch_hfsc: Ensure inner classes have fsc curve")
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231017143602.3191556-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: mdio-mux: fix C45 access returning -EIO after API change
Vladimir Oltean [Tue, 17 Oct 2023 14:31:44 +0000 (17:31 +0300)]
net: mdio-mux: fix C45 access returning -EIO after API change

The mii_bus API conversion to read_c45() and write_c45() did not cover
the mdio-mux driver before read() and write() were made C22-only.

This broke arch/arm64/boot/dts/freescale/fsl-ls1028a-qds-13bb.dtso.
The -EOPNOTSUPP from mdiobus_c45_read() is transformed by
get_phy_c45_devs_in_pkg() into -EIO, is further propagated to
of_mdiobus_register() and this makes the mdio-mux driver fail to probe
the entire child buses, not just the PHYs that cause access errors.

Fix the regression by introducing special c45 read and write accessors
to mdio-mux which forward the operation to the parent MDIO bus.

Fixes: db1a63aed89c ("net: phy: Remove fallback to old C45 method")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20231017143144.3212657-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge branch 'selftests-tc-testing-fixes-for-kselftest'
Jakub Kicinski [Thu, 19 Oct 2023 01:07:53 +0000 (18:07 -0700)]
Merge branch 'selftests-tc-testing-fixes-for-kselftest'

Pedro Tammela says:

====================
selftests: tc-testing: fixes for kselftest

While playing around with TuxSuite, we noticed a couple of things were
broken for strict CI/automated builds. We had a script that didn't make into
the kselftest tarball and a couple of missing Kconfig knobs in our
minimal config.
====================

Link: https://lore.kernel.org/r/20231017152309.3196320-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: tc-testing: move auxiliary scripts to a dedicated folder
Pedro Tammela [Tue, 17 Oct 2023 15:23:09 +0000 (12:23 -0300)]
selftests: tc-testing: move auxiliary scripts to a dedicated folder

Some taprio tests need auxiliary scripts to wait for workqueue events to
process. Move them to a dedicated folder in order to package them for
the kselftests tarball.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20231017152309.3196320-3-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoselftests: tc-testing: add missing Kconfig options to 'config'
Pedro Tammela [Tue, 17 Oct 2023 15:23:08 +0000 (12:23 -0300)]
selftests: tc-testing: add missing Kconfig options to 'config'

Make sure CI builds using just tc-testing/config can run all tdc tests.
Some tests were broken because of missing knobs.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20231017152309.3196320-2-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agotcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb
Eric Dumazet [Tue, 17 Oct 2023 12:45:26 +0000 (12:45 +0000)]
tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb

In commit 75eefc6c59fd ("tcp: tsq: add a shortcut in tcp_small_queue_check()")
we allowed to send an skb regardless of TSQ limits being hit if rtx queue
was empty or had a single skb, in order to better fill the pipe
when/if TX completions were slow.

Then later, commit 75c119afe14f ("tcp: implement rb-tree based
retransmit queue") accidentally removed the special case for
one skb in rtx queue.

Stefan Wahren reported a regression in single TCP flow throughput
using a 100Mbit fec link, starting from commit 65466904b015 ("tcp: adjust
TSO packet sizes based on min_rtt"). This last commit only made the
regression more visible, because it locked the TCP flow on a particular
behavior where TSQ prevented two skbs being pushed downstream,
adding silences on the wire between each TSO packet.

Many thanks to Stefan for his invaluable help !

Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
Link: https://lore.kernel.org/netdev/7f31ddc8-9971-495e-a1f6-819df542e0af@gmx.net/
Reported-by: Stefan Wahren <wahrenst@gmx.net>
Tested-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20231017124526.4060202-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoocteon_ep: update BQL sent bytes before ringing doorbell
Shinas Rasheed [Tue, 17 Oct 2023 10:50:30 +0000 (03:50 -0700)]
octeon_ep: update BQL sent bytes before ringing doorbell

Sometimes Tx is completed immediately after doorbell is updated, which
causes Tx completion routing to update completion bytes before the
same packet bytes are updated in sent bytes in transmit function, hence
hitting BUG_ON() in dql_completed(). To avoid this, update BQL
sent bytes before ringing doorbell.

Fixes: 37d79d059606 ("octeon_ep: add Tx/Rx processing and interrupt support")
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://lore.kernel.org/r/20231017105030.2310966-1-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: wangxun: remove redundant kernel log
Jiawen Wu [Tue, 17 Oct 2023 10:06:35 +0000 (18:06 +0800)]
net: wangxun: remove redundant kernel log

Since PBA info can be read from lspci, delete txgbe_read_pba_string()
and the prints. In addition, delete the redundant MAC address printing.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20231017100635.154967-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoperf/benchmark: fix seccomp_unotify benchmark for 32-bit
Jiri Slaby (SUSE) [Tue, 17 Oct 2023 08:30:19 +0000 (10:30 +0200)]
perf/benchmark: fix seccomp_unotify benchmark for 32-bit

Commit 7d5cb68af638 (perf/benchmark: add a new benchmark for
seccom_unotify) added a reference to __NR_seccomp into perf. This is
fine as it added also a definition of __NR_seccomp for 64-bit. But it
failed to do so for 32-bit as instead of ifndef, ifdef was used.

Fix this typo (so fix the build of perf on 32-bit).

Fixes: 7d5cb68af638 (perf/benchmark: add a new benchmark for seccom_unotify)
Cc: Andrei Vagin <avagin@google.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20231017083019.31733-1-jirislaby@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
18 months agoMerge branch 'net-fec-fix-device_get_match_data-usage'
Jakub Kicinski [Thu, 19 Oct 2023 00:47:12 +0000 (17:47 -0700)]
Merge branch 'net-fec-fix-device_get_match_data-usage'

Alexander Stein says:

====================
net: fec: Fix device_get_match_data usage

this is v2 adressing the regression introduced by commit b0377116decd
("net: ethernet: Use device_get_match_data()").

You could also remove the (!dev_info) case for Coldfire as this platform
has no quirks. But IMHO this should be kept as long as Coldfire platform
data is supported.
====================

Link: https://lore.kernel.org/r/20231017063419.925266-1-alexander.stein@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: fec: Remove non-Coldfire platform IDs
Alexander Stein [Tue, 17 Oct 2023 06:34:19 +0000 (08:34 +0200)]
net: fec: Remove non-Coldfire platform IDs

All i.MX platforms (non-Coldfire) use DT nowadays, so their platform ID
entries can be removed.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20231017063419.925266-3-alexander.stein@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: fec: Fix device_get_match_data usage
Alexander Stein [Tue, 17 Oct 2023 06:34:18 +0000 (08:34 +0200)]
net: fec: Fix device_get_match_data usage

device_get_match_data() expects that of_device_id->data points to actual
fec_devinfo data, not a platform_device_id entry.
Fix this by adjusting OF device data pointers to their corresponding
structs.
enum imx_fec_type is now unused and can be removed.

Fixes: b0377116decd ("net: ethernet: Use device_get_match_data()")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20231017063419.925266-2-alexander.stein@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodrivers: net: wwan: iosm: Fixed multiple typos in multiple files
Muhammad Muzammil [Sat, 14 Oct 2023 12:14:07 +0000 (17:14 +0500)]
drivers: net: wwan: iosm: Fixed multiple typos in multiple files

iosm_ipc_chnl_cfg.h: Fixed typo
iosm_ipc_imem_ops.h: Fixed typo
iosm_ipc_mux.h: Fixed typo
iosm_ipc_pm.h: Fixed typo
iosm_ipc_port.h: Fixed typo
iosm_ipc_trace.h: Fixed typo

Signed-off-by: Muhammad Muzammil <m.muzzammilashraf@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231014121407.10012-1-m.muzzammilashraf@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge tag 'spi-fix-v6-6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Linus Torvalds [Wed, 18 Oct 2023 16:37:36 +0000 (09:37 -0700)]
Merge tag 'spi-fix-v6-6-rc4' of git://git./linux/kernel/git/broonie/spi

Pull spi fix from Mark Brown:
 "A fix for the npcm-fiu driver in cases where there are no dummy bytes
  during reads"

* tag 'spi-fix-v6-6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: npcm-fiu: Fix UMA reads when dummy.nbytes == 0

18 months agoMerge tag 'regmap-fix-v6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 18 Oct 2023 16:30:03 +0000 (09:30 -0700)]
Merge tag 'regmap-fix-v6.6-rc6' of git://git./linux/kernel/git/broonie/regmap

Pull regmap fix from Mark Brown:
 "A straightforward fix from Johan for a long standing bug in cases
  where we both have regmaps without devices and something is using
  dev_get_regmap()"

* tag 'regmap-fix-v6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: fix NULL deref on lookup

18 months agonetfilter: nf_tables: revert do not remove elements if set backend implements .abort
Pablo Neira Ayuso [Wed, 18 Oct 2023 11:18:39 +0000 (13:18 +0200)]
netfilter: nf_tables: revert do not remove elements if set backend implements .abort

nf_tables_abort_release() path calls nft_set_elem_destroy() for
NFT_MSG_NEWSETELEM which releases the element, however, a reference to
the element still remains in the working copy.

Fixes: ebd032fa8818 ("netfilter: nf_tables: do not remove elements if set backend implements .abort")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
18 months agonetfilter: nft_set_rbtree: .deactivate fails if element has expired
Pablo Neira Ayuso [Tue, 17 Oct 2023 10:28:27 +0000 (12:28 +0200)]
netfilter: nft_set_rbtree: .deactivate fails if element has expired

This allows to remove an expired element which is not possible in other
existing set backends, this is more noticeable if gc-interval is high so
expired elements remain in the tree. On-demand gc also does not help in
this case, because this is delete element path. Return NULL if element
has expired.

Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
18 months agoselftests: netfilter: Run nft_audit.sh in its own netns
Phil Sutter [Fri, 13 Oct 2023 20:02:24 +0000 (22:02 +0200)]
selftests: netfilter: Run nft_audit.sh in its own netns

Don't mess with the host's firewall ruleset. Since audit logging is not
per-netns, add an initial delay of a second so other selftests' netns
cleanups have a chance to finish.

Fixes: e8dbde59ca3f ("selftests: netfilter: Test nf_tables audit logging")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
18 months agonetfilter: nf_tables: audit log object reset once per table
Phil Sutter [Wed, 11 Oct 2023 15:06:59 +0000 (17:06 +0200)]
netfilter: nf_tables: audit log object reset once per table

When resetting multiple objects at once (via dump request), emit a log
message per table (or filled skb) and resurrect the 'entries' parameter
to contain the number of objects being logged for.

To test the skb exhaustion path, perform some bulk counter and quota
adds in the kselftest.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com> (Audit)
Signed-off-by: Florian Westphal <fw@strlen.de>
18 months agoneighbor: tracing: Move pin6 inside CONFIG_IPV6=y section
Geert Uytterhoeven [Mon, 16 Oct 2023 12:49:04 +0000 (14:49 +0200)]
neighbor: tracing: Move pin6 inside CONFIG_IPV6=y section

When CONFIG_IPV6=n, and building with W=1:

    In file included from include/trace/define_trace.h:102,
     from include/trace/events/neigh.h:255,
     from net/core/net-traces.c:51:
    include/trace/events/neigh.h: In function ‘trace_event_raw_event_neigh_create’:
    include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not used [-Werror=unused-but-set-variable]
       42 |                 struct in6_addr *pin6;
  |                                  ^~~~
    include/trace/trace_events.h:402:11: note: in definition of macro ‘DECLARE_EVENT_CLASS’
      402 |         { assign; }                                                     \
  |           ^~~~~~
    include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’
       44 |                              PARAMS(assign),                   \
  |                              ^~~~~~
    include/trace/events/neigh.h:23:1: note: in expansion of macro ‘TRACE_EVENT’
       23 | TRACE_EVENT(neigh_create,
  | ^~~~~~~~~~~
    include/trace/events/neigh.h:41:9: note: in expansion of macro ‘TP_fast_assign’
       41 |         TP_fast_assign(
  |         ^~~~~~~~~~~~~~
    In file included from include/trace/define_trace.h:103,
     from include/trace/events/neigh.h:255,
     from net/core/net-traces.c:51:
    include/trace/events/neigh.h: In function ‘perf_trace_neigh_create’:
    include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not used [-Werror=unused-but-set-variable]
       42 |                 struct in6_addr *pin6;
  |                                  ^~~~
    include/trace/perf.h:51:11: note: in definition of macro ‘DECLARE_EVENT_CLASS’
       51 |         { assign; }                                                     \
  |           ^~~~~~
    include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’
       44 |                              PARAMS(assign),                   \
  |                              ^~~~~~
    include/trace/events/neigh.h:23:1: note: in expansion of macro ‘TRACE_EVENT’
       23 | TRACE_EVENT(neigh_create,
  | ^~~~~~~~~~~
    include/trace/events/neigh.h:41:9: note: in expansion of macro ‘TP_fast_assign’
       41 |         TP_fast_assign(
  |         ^~~~~~~~~~~~~~

Indeed, the variable pin6 is declared and initialized unconditionally,
while it is only used and needlessly re-initialized when support for
IPv6 is enabled.

Fix this by dropping the unused variable initialization, and moving the
variable declaration inside the existing section protected by a check
for CONFIG_IPV6.

Fixes: fc651001d2c5ca4f ("neighbor: Add tracepoint to __neigh_create")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>