linux.git
3 years agonet: lantiq_etop: Fix return type for implementation of ndo_start_xmit
GUO Zihua [Fri, 2 Sep 2022 08:15:21 +0000 (16:15 +0800)]
net: lantiq_etop: Fix return type for implementation of ndo_start_xmit

Since Linux now supports CFI, it will be a good idea to fix mismatched
return type for implementation of hooks. Otherwise this might get
cought out by CFI and cause a panic.

ltq_etop_tx() would return either NETDEV_TX_BUSY or NETDEV_TX_OK, so
change the return type to netdev_tx_t directly.

Signed-off-by: GUO Zihua <guozihua@huawei.com>
Link: https://lore.kernel.org/r/20220902081521.59867-1-guozihua@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: sunplus: Fix return type for implementation of ndo_start_xmit
GUO Zihua [Fri, 2 Sep 2022 08:15:50 +0000 (16:15 +0800)]
net: sunplus: Fix return type for implementation of ndo_start_xmit

Since Linux now supports CFI, it will be a good idea to fix mismatched
return type for implementation of hooks. Otherwise this might get
cought out by CFI and cause a panic.

spl2sw_ethernet_start_xmit() would return either NETDEV_TX_BUSY or
NETDEV_TX_OK, so change the return type to netdev_tx_t directly.

Signed-off-by: GUO Zihua <guozihua@huawei.com>
Link: https://lore.kernel.org/r/20220902081550.60095-1-guozihua@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: xscale: Fix return type for implementation of ndo_start_xmit
GUO Zihua [Fri, 2 Sep 2022 08:16:12 +0000 (16:16 +0800)]
net: xscale: Fix return type for implementation of ndo_start_xmit

Since Linux now supports CFI, it will be a good idea to fix mismatched
return type for implementation of hooks. Otherwise this might get
cought out by CFI and cause a panic.

eth_xmit() would return either NETDEV_TX_BUSY or NETDEV_TX_OK, so
change the return type to netdev_tx_t directly.

Signed-off-by: GUO Zihua <guozihua@huawei.com>
Link: https://lore.kernel.org/r/20220902081612.60405-1-guozihua@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: broadcom: Fix return type for implementation of
GUO Zihua [Fri, 2 Sep 2022 07:54:07 +0000 (15:54 +0800)]
net: broadcom: Fix return type for implementation of

Since Linux now supports CFI, it will be a good idea to fix mismatched
return type for implementation of hooks. Otherwise this might get
cought out by CFI and cause a panic.

bcm4908_enet_start_xmit() would return either NETDEV_TX_BUSY or
NETDEV_TX_OK, so change the return type to netdev_tx_t directly.

Signed-off-by: GUO Zihua <guozihua@huawei.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220902075407.52358-1-guozihua@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ieee802154: Fix compilation error when CONFIG_IEEE802154_NL802154_EXPERIMENTAL...
Gal Pressman [Fri, 2 Sep 2022 03:06:20 +0000 (20:06 -0700)]
net: ieee802154: Fix compilation error when CONFIG_IEEE802154_NL802154_EXPERIMENTAL is disabled

When CONFIG_IEEE802154_NL802154_EXPERIMENTAL is disabled,
NL802154_CMD_DEL_SEC_LEVEL is undefined and results in a compilation
error:
net/ieee802154/nl802154.c:2503:19: error: 'NL802154_CMD_DEL_SEC_LEVEL' undeclared here (not in a function); did you mean 'NL802154_CMD_SET_CCA_ED_LEVEL'?
 2503 |  .resv_start_op = NL802154_CMD_DEL_SEC_LEVEL + 1,
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~
      |                   NL802154_CMD_SET_CCA_ED_LEVEL

Unhide the experimental commands, having them defined in an enum
makes no difference.

Fixes: 9c5d03d36251 ("genetlink: start to validate reserved header bytes")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Link: https://lore.kernel.org/r/20220902030620.2737091-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: remove netif_tx_napi_add()
Jakub Kicinski [Thu, 1 Sep 2022 00:00:58 +0000 (17:00 -0700)]
net: remove netif_tx_napi_add()

All callers are now gone.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: bql: add more documentation
Eric Dumazet [Wed, 31 Aug 2022 18:44:27 +0000 (18:44 +0000)]
net: bql: add more documentation

Add some documentation for netdev_tx_sent_queue() and
netdev_tx_completed_queue()

Stating that netdev_tx_completed_queue() must be called once
per TX completion round is apparently not obvious for everybody.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'net-ipa-transaction-state-IDs'
David S. Miller [Fri, 2 Sep 2022 11:08:44 +0000 (12:08 +0100)]
Merge branch 'net-ipa-transaction-state-IDs'

Alex Elder says:

====================
net: ipa: use IDs to track transaction state

This series is the first of three groups of changes that simplify
the way the IPA driver tracks the state of its transactions.

Each GSI channel has a fixed number of transactions allocated at
initialization time.  The number allocated matches the number of
TREs in the transfer ring associated with the channel.  This is
because the transfer ring limits the number of transfers that can
ever be underway, and in the worst case, each transaction represents
a single TRE.

Transactions go through various states during their lifetime.
Currently a set of lists keeps track of which transactions are in
each state.  Initially, all transactions are free.  An allocated
transaction is placed on the allocated list.  Once an allocated
transaction is committed, it is moved from the allocated to the
committed list.  When a committed transaction is sent to hardware
(via a doorbell) it is moved to the pending list.  When hardware
signals that some work has completed, transactions are moved to the
completed list.  Finally, when a completed transaction is polled
it's moved to the polled list before being removed when it becomes
free.

Changing a transaction's state thus normally involves manipulating
two lists, and to prevent corruption a spinlock is held while the
lists are updated.

Transactions move through their states in a well-defined sequence
though, and they do so strictly in order.  So transaction 0 is
always allocated before transaction 1; transaction 0 is always
committed before transaction 1; and so on, through completion,
polling, and becoming free.  Because of this, it's sufficient to
just keep track of which transaction is the first in each state.
The rest of the transactions in a given state can be derived from
the first transaction in an "adjacent" state.  As a result, we can
track the state of all transactions with a set of indexes, and can
update these without the need for a spinlock.

This first group of patches just defines the set of indexes that
will be used for this new way of tracking transaction state.  Two
more groups of patches will follow.  I've broken the 17 patches into
these three groups to facilitate review.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipa: track polled transactions with an ID
Alex Elder [Wed, 31 Aug 2022 22:40:17 +0000 (17:40 -0500)]
net: ipa: track polled transactions with an ID

Add a transaction ID to track the first element in the transaction
array that has been polled.  Advance the ID when we are releasing a
transaction.

Temporarily add warnings that verify that the first polled
transaction tracked by the ID matches the first element on the
polled list, both when polling and freeing.

Remove the temporary warnings added by the previous commit.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipa: track completed transactions with an ID
Alex Elder [Wed, 31 Aug 2022 22:40:16 +0000 (17:40 -0500)]
net: ipa: track completed transactions with an ID

Add a transaction ID field to track the first element in the
transaction array that has completed but has not yet been polled.

Advance the ID when we are processing a transaction in the NAPI
polling loop (where completed transactions become polled).

Temporarily add warnings that verify that the first completed
transaction tracked by the ID matches the first element on the
completed list, both when pending and completing.

Remove the temporary warnings added by the previous commit.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipa: track pending transactions with an ID
Alex Elder [Wed, 31 Aug 2022 22:40:15 +0000 (17:40 -0500)]
net: ipa: track pending transactions with an ID

Add a transaction ID field to track the first element in the
transaction array that is pending (sent to hardware) but not yet
complete.  Advance the ID when a completion event for a channel
indicates that transactions have completed.

Temporarily add warnings that verify that the first pending
transaction tracked by the ID matches the first element on the
pending list, both when pending and completing, as well as when
resetting the channel.

Remove the temporary warnings added by the previous commit.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipa: track committed transactions with an ID
Alex Elder [Wed, 31 Aug 2022 22:40:14 +0000 (17:40 -0500)]
net: ipa: track committed transactions with an ID

Add a transaction ID field to track the first element in a channel's
transaction array that has been committed, but not yet passed to the
hardware.  Advance the ID when the hardware is notified via doorbell
that TREs from a transaction are ready for consumption.

Temporarily add warnings that verify that the first committed
transaction tracked by the ID matches the first element on the
committed list, both when committing and pending (at doorbell).

Remove the temporary warnings added by the previous commit.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipa: track allocated transactions with an ID
Alex Elder [Wed, 31 Aug 2022 22:40:13 +0000 (17:40 -0500)]
net: ipa: track allocated transactions with an ID

Transactions for a channel are now managed in an array, with a free
transaction ID indicating which is the next one free.

Add another transaction ID field to track the first element in the
array that has been allocated.  Advance it when a transaction is
committed (because that is when that transaction leaves allocated
state).

Temporarily add warnings that verify that the first allocated
transaction tracked by the ID matches the first element on the
allocated list, both when allocating and committing a transaction.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipa: use an array for transactions
Alex Elder [Wed, 31 Aug 2022 22:40:12 +0000 (17:40 -0500)]
net: ipa: use an array for transactions

Transactions are always allocated one at a time.  The maximum number
of them we could ever need occurs if each TRE is assigned to a
transaction.  So a channel requires no more transactions than the
number of TREs in its transfer ring.  That number is known to be a
power-of-2 less than 65536.

The transaction pool abstraction is used for other things, but for
transactions we can use a simple array of transaction structures,
and use a free index to indicate which entry in the array is the
next one free for allocation.

By having the number of elements in the array be a power-of-2, we
can use an ever-incrementing 16-bit free index, and use it modulo
the array size.  Distinguish a "trans_id" (whose value can exceed
the number of entries in the transaction array) from a "trans_index"
(which is less than the number of entries).

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'lan966x-make-reset-optional'
David S. Miller [Fri, 2 Sep 2022 10:37:27 +0000 (11:37 +0100)]
Merge branch 'lan966x-make-reset-optional'

Michael Walle says:

====================
net: lan966x: make reset optional

This is the remaining part of the reset rework on the LAN966x targetting
the netdev tree.

The former series can be found at:
https://lore.kernel.org/lkml/20220826115607.1148489-1-michael@walle.cc/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: lan966x: make reset optional
Michael Walle [Wed, 31 Aug 2022 11:18:55 +0000 (13:18 +0200)]
net: lan966x: make reset optional

There is no dedicated reset for just the switch core. The reset which
is used up until now, is more of a global reset, resetting almost the
whole SoC and cause spurious errors by doing so. Make it possible to
handle the reset elsewhere and make the reset optional.

Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodt-bindings: net: sparx5: don't require a reset line
Michael Walle [Wed, 31 Aug 2022 11:18:54 +0000 (13:18 +0200)]
dt-bindings: net: sparx5: don't require a reset line

Make the reset line optional. It turns out, there is no dedicated reset
for the switch. Instead, the reset which was used up until now, was kind
of a global reset. This is now handled elsewhere, thus don't require a
reset.

Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv6: tcp: send consistent autoflowlabel in SYN_RECV state
Eric Dumazet [Wed, 31 Aug 2022 20:37:29 +0000 (13:37 -0700)]
ipv6: tcp: send consistent autoflowlabel in SYN_RECV state

This is a followup of commit c67b85558ff2 ("ipv6: tcp: send consistent
autoflowlabel in TIME_WAIT state"), but for SYN_RECV state.

In some cases, TCP sends a challenge ACK on behalf of a SYN_RECV request.
WHen this happens, we want to use the flow label that was used when
the prior SYNACK packet was sent, instead of another one.

After his patch, following packetdrill passes:

    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

  +.2 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
   +0 > (flowlabel 0x11) S. 0:0(0) ack 1 <...>
// Test if a challenge ack is properly sent (same flowlabel than prior SYNACK)
   +.01 < . 4000000000:4000000000(0) ack 1 win 320
   +0  > (flowlabel 0x11) . 1:1(0) ack 1

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220831203729.458000-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoselftests: net: dsa: symlink the tc_actions.sh test
Vladimir Oltean [Wed, 31 Aug 2022 17:08:39 +0000 (20:08 +0300)]
selftests: net: dsa: symlink the tc_actions.sh test

This has been validated on the Ocelot/Felix switch family (NXP LS1028A)
and should be relevant to any switch driver that offloads the tc-flower
and/or tc-matchall actions trap, drop, accept, mirred, for which DSA has
operations.

TEST: gact drop and ok (skip_hw)                                    [ OK ]
TEST: mirred egress flower redirect (skip_hw)                       [ OK ]
TEST: mirred egress flower mirror (skip_hw)                         [ OK ]
TEST: mirred egress matchall mirror (skip_hw)                       [ OK ]
TEST: mirred_egress_to_ingress (skip_hw)                            [ OK ]
TEST: gact drop and ok (skip_sw)                                    [ OK ]
TEST: mirred egress flower redirect (skip_sw)                       [ OK ]
TEST: mirred egress flower mirror (skip_sw)                         [ OK ]
TEST: mirred egress matchall mirror (skip_sw)                       [ OK ]
TEST: trap (skip_sw)                                                [ OK ]
TEST: mirred_egress_to_ingress (skip_sw)                            [ OK ]

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220831170839.931184-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonetdevsim: remove redundant variable ret
Jinpeng Cui [Wed, 31 Aug 2022 15:43:29 +0000 (15:43 +0000)]
netdevsim: remove redundant variable ret

Return value directly from nsim_dev_reload_create()
instead of getting value from redundant variable ret.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Jinpeng Cui <cui.jinpeng2@zte.com.cn>
Link: https://lore.kernel.org/r/20220831154329.305372-1-cui.jinpeng2@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: rtnetlink: use netif_oper_up instead of open code
Juhee Kang [Wed, 31 Aug 2022 12:58:45 +0000 (21:58 +0900)]
net: rtnetlink: use netif_oper_up instead of open code

The open code is defined as a new helper function(netif_oper_up) on netdev.h,
the code is dev->operstate == IF_OPER_UP || dev->operstate == IF_OPER_UNKNOWN.
Thus, replace the open code to netif_oper_up. This patch doesn't change logic.

Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Link: https://lore.kernel.org/r/20220831125845.1333-1-claudiajkang@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: sched: etf: remove true check in etf_enable_offload()
Zhengchao Shao [Wed, 31 Aug 2022 09:29:19 +0000 (17:29 +0800)]
net: sched: etf: remove true check in etf_enable_offload()

etf_enable_offload() is only called when q->offload is false in
etf_init(). So remove true check in etf_enable_offload().

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://lore.kernel.org/r/20220831092919.146149-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 1 Sep 2022 19:58:02 +0000 (12:58 -0700)]
Merge git://git./linux/kernel/git/netdev/net

tools/testing/selftests/net/.gitignore
  sort the net-next version and use it

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge tag 'net-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 1 Sep 2022 16:20:42 +0000 (09:20 -0700)]
Merge tag 'net-6.0-rc4' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bluetooth, bpf and wireless.

  Current release - regressions:

   - bpf:
      - fix wrong last sg check in sk_msg_recvmsg()
      - fix kernel BUG in purge_effective_progs()

   - mac80211:
      - fix possible leak in ieee80211_tx_control_port()
      - potential NULL dereference in ieee80211_tx_control_port()

  Current release - new code bugs:

   - nfp: fix the access to management firmware hanging

  Previous releases - regressions:

   - ip: fix triggering of 'icmp redirect'

   - sched: tbf: don't call qdisc_put() while holding tree lock

   - bpf: fix corrupted packets for XDP_SHARED_UMEM

   - bluetooth: hci_sync: fix suspend performance regression

   - micrel: fix probe failure

  Previous releases - always broken:

   - tcp: make global challenge ack rate limitation per net-ns and
     default disabled

   - tg3: fix potential hang-up on system reboot

   - mac802154: fix reception for no-daddr packets

  Misc:

   - r8152: add PID for the lenovo onelink+ dock"

* tag 'net-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits)
  net/smc: Remove redundant refcount increase
  Revert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb"
  tcp: make global challenge ack rate limitation per net-ns and default disabled
  tcp: annotate data-race around challenge_timestamp
  net: dsa: hellcreek: Print warning only once
  ip: fix triggering of 'icmp redirect'
  sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb
  selftests: net: sort .gitignore file
  Documentation: networking: correct possessive "its"
  kcm: fix strp_init() order and cleanup
  mlxbf_gige: compute MDIO period based on i1clk
  ethernet: rocker: fix sleep in atomic context bug in neigh_timer_handler
  net: lan966x: improve error handle in lan966x_fdma_rx_get_frame()
  nfp: fix the access to management firmware hanging
  net: phy: micrel: Make the GPIO to be non-exclusive
  net: virtio_net: fix notification coalescing comments
  net/sched: fix netdevice reference leaks in attach_default_qdiscs()
  net: sched: tbf: don't call qdisc_put() while holding tree lock
  net: Use u64_stats_fetch_begin_irq() for stats fetch.
  net: dsa: xrs700x: Use irqsave variant for u64 stats update
  ...

3 years agoMerge tag 'slab-for-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka...
Linus Torvalds [Thu, 1 Sep 2022 16:14:56 +0000 (09:14 -0700)]
Merge tag 'slab-for-6.0-rc4' of git://git./linux/kernel/git/vbabka/slab

Pull slab fix from Vlastimil Babka:

 - A fix from Waiman Long to avoid a theoretical deadlock reported by
   lockdep.

* tag 'slab-for-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock

3 years agoMerge tag 'sound-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Thu, 1 Sep 2022 16:05:25 +0000 (09:05 -0700)]
Merge tag 'sound-6.0-rc4' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Just handful changes at this time. The only major change is the
  regression fix about the x86 WC-page buffer allocation.

  The rest are trivial data-race fixes for ALSA sequencer core, the
  possible out-of-bounds access fixes in the new ALSA control hash code,
  and a few device-specific workarounds and fixes"

* tag 'sound-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usb-audio: Add quirk for LH Labs Geek Out HD Audio 1V5
  ALSA: hda/realtek: Add speaker AMP init for Samsung laptops with ALC298
  ALSA: control: Re-order bounds checking in get_ctl_id_hash()
  ALSA: control: Fix an out-of-bounds bug in get_ctl_id_hash()
  ALSA: hda: intel-nhlt: Correct the handling of fmt_config flexible array
  ALSA: seq: Fix data-race at module auto-loading
  ALSA: seq: oss: Fix data-race for max_midi_devs access
  ALSA: memalloc: Revive x86-specific WC page allocations again

3 years agonet: sched: gred: remove NULL check before free table->tab in gred_destroy()
Zhengchao Shao [Wed, 31 Aug 2022 04:14:52 +0000 (12:14 +0800)]
net: sched: gred: remove NULL check before free table->tab in gred_destroy()

The kfree invoked by gred_destroy_vq checks whether the input parameter
is empty. Therefore, gred_destroy() doesn't need to check table->tab.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20220831041452.33026-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agomm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex...
Waiman Long [Fri, 12 Aug 2022 18:30:33 +0000 (14:30 -0400)]
mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock

A circular locking problem is reported by lockdep due to the following
circular locking dependency.

  +--> cpu_hotplug_lock --> slab_mutex --> kn->active --+
  |                                                     |
  +-----------------------------------------------------+

The forward cpu_hotplug_lock ==> slab_mutex ==> kn->active dependency
happens in

  kmem_cache_destroy(): cpus_read_lock(); mutex_lock(&slab_mutex);
  ==> sysfs_slab_unlink()
      ==> kobject_del()
          ==> kernfs_remove()
      ==> __kernfs_remove()
          ==> kernfs_drain(): rwsem_acquire(&kn->dep_map, ...);

The backward kn->active ==> cpu_hotplug_lock dependency happens in

  kernfs_fop_write_iter(): kernfs_get_active();
  ==> slab_attr_store()
      ==> cpu_partial_store()
          ==> flush_all(): cpus_read_lock()

One way to break this circular locking chain is to avoid holding
cpu_hotplug_lock and slab_mutex while deleting the kobject in
sysfs_slab_unlink() which should be equivalent to doing a write_lock
and write_unlock pair of the kn->active virtual lock.

Since the kobject structures are not protected by slab_mutex or the
cpu_hotplug_lock, we can certainly release those locks before doing
the delete operation.

Move sysfs_slab_unlink() and sysfs_slab_release() to the newly
created kmem_cache_release() and call it outside the slab_mutex &
cpu_hotplug_lock critical sections. There will be a slight delay
in the deletion of sysfs files if kmem_cache_release() is called
indirectly from a work function.

Fixes: 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: David Rientjes <rientjes@google.com>
Link: https://lore.kernel.org/all/YwOImVd+nRUsSAga@hyeyoo/
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
3 years agoMerge branch 'rk3588-ethernet-support'
Paolo Abeni [Thu, 1 Sep 2022 09:53:17 +0000 (11:53 +0200)]
Merge branch 'rk3588-ethernet-support'

Sebastian Reichel says:

====================
RK3588 Ethernet Support

This adds ethernet support for the RK3588(s) SoCs.

Changes since PATCHv2:
 * Rebased to v6.0-rc1
 * Wrap DT bindings additions at 80 characters
 * Add Acked-by from Krzysztof

Changes since PATCHv1:
 * Drop first patch (not required) and rebase second patch accordingly
 * Rename DT property rockchip,php_grf -> rockchip,php-grf
====================

Link: https://lore.kernel.org/r/20220830154559.61506-1-sebastian.reichel@collabora.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agodt-bindings: net: rockchip-dwmac: add rk3588 gmac compatible
Sebastian Reichel [Tue, 30 Aug 2022 15:45:59 +0000 (17:45 +0200)]
dt-bindings: net: rockchip-dwmac: add rk3588 gmac compatible

Add compatible string for RK3588 gmac, which is similar to the RK3568
one, but needs another syscon device for clock selection.

Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet: ethernet: stmmac: dwmac-rk: Add gmac support for rk3588
David Wu [Tue, 30 Aug 2022 15:45:58 +0000 (17:45 +0200)]
net: ethernet: stmmac: dwmac-rk: Add gmac support for rk3588

Add constants and callback functions for the dwmac on RK3588 soc.
As can be seen, the base structure is the same, only registers
and the bits in them moved slightly.

Signed-off-by: David Wu <david.wu@rock-chips.com>
[rebase, squash fixes]
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet/smc: Remove redundant refcount increase
Yacan Liu [Tue, 30 Aug 2022 15:23:14 +0000 (23:23 +0800)]
net/smc: Remove redundant refcount increase

For passive connections, the refcount increment has been done in
smc_clcsock_accept()-->smc_sock_alloc().

Fixes: 3b2dec2603d5 ("net/smc: restructure client and server code in af_smc")
Signed-off-by: Yacan Liu <liuyacan@corp.netease.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220830152314.838736-1-liuyacan@corp.netease.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agoocteontx2-pf: Add egress PFC support
Suman Ghosh [Tue, 30 Aug 2022 12:03:04 +0000 (17:33 +0530)]
octeontx2-pf: Add egress PFC support

As of now all transmit queues transmit packets out of same scheduler
queue hierarchy. Due to this PFC frames sent by peer are not handled
properly, either all transmit queues are backpressured or none.
To fix this when user enables PFC for a given priority map relavant
transmit queue to a different scheduler queue hierarcy, so that
backpressure is applied only to the traffic egressing out of that TXQ.

Signed-off-by: Suman Ghosh <sumang@marvell.com>
Link: https://lore.kernel.org/r/20220830120304.158060-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agonet: sched: remove redundant NULL check in change hook function
Zhengchao Shao [Mon, 29 Aug 2022 07:12:19 +0000 (15:12 +0800)]
net: sched: remove redundant NULL check in change hook function

Currently, the change function can be called by two ways. The one way is
that qdisc_change() will call it. Before calling change function,
qdisc_change() ensures tca[TCA_OPTIONS] is not empty. The other way is
that .init() will call it. The opt parameter is also checked before
calling change function in .init(). Therefore, it's no need to check the
input parameter opt in change function.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://lore.kernel.org/r/20220829071219.208646-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agoRevert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb"
Jakub Kicinski [Thu, 1 Sep 2022 03:01:32 +0000 (20:01 -0700)]
Revert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb"

This reverts commit 90fabae8a2c225c4e4936723c38857887edde5cc.

Patch was applied hastily, revert and let the v2 be reviewed.

Fixes: 90fabae8a2c2 ("sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb")
Link: https://lore.kernel.org/all/87wnao2ha3.fsf@toke.dk/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'tcp-tcp-challenge-ack-fixes'
Jakub Kicinski [Thu, 1 Sep 2022 02:56:53 +0000 (19:56 -0700)]
Merge branch 'tcp-tcp-challenge-ack-fixes'

Eric Dumazet says:

====================
tcp: tcp challenge ack fixes

syzbot found a typical data-race addressed in the first patch.

While we are at it, second patch makes the global rate limit
per net-ns and disabled by default.
====================

Link: https://lore.kernel.org/r/20220830185656.268523-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agotcp: make global challenge ack rate limitation per net-ns and default disabled
Eric Dumazet [Tue, 30 Aug 2022 18:56:56 +0000 (11:56 -0700)]
tcp: make global challenge ack rate limitation per net-ns and default disabled

Because per host rate limiting has been proven problematic (side channel
attacks can be based on it), per host rate limiting of challenge acks ideally
should be per netns and turned off by default.

This is a long due followup of following commits:

083ae308280d ("tcp: enable per-socket rate limiting of all 'challenge acks'")
f2b2c582e824 ("tcp: mitigate ACK loops for connections as tcp_sock")
75ff39ccc1bd ("tcp: make challenge acks less predictable")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agotcp: annotate data-race around challenge_timestamp
Eric Dumazet [Tue, 30 Aug 2022 18:56:55 +0000 (11:56 -0700)]
tcp: annotate data-race around challenge_timestamp

challenge_timestamp can be read an written by concurrent threads.

This was expected, but we need to annotate the race to avoid potential issues.

Following patch moves challenge_timestamp and challenge_count
to per-netns storage to provide better isolation.

Fixes: 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: dsa: hellcreek: Print warning only once
Kurt Kanzenbach [Tue, 30 Aug 2022 16:34:48 +0000 (18:34 +0200)]
net: dsa: hellcreek: Print warning only once

In case the source port cannot be decoded, print the warning only once. This
still brings attention to the user and does not spam the logs at the same time.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220830163448.8921-1-kurt@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet-next: Fix IP_UNICAST_IF option behavior for connected sockets
Richard Gobert [Mon, 29 Aug 2022 11:18:51 +0000 (13:18 +0200)]
net-next: Fix IP_UNICAST_IF option behavior for connected sockets

The IP_UNICAST_IF socket option is used to set the outgoing interface
for outbound packets.

The IP_UNICAST_IF socket option was added as it was needed by the
Wine project, since no other existing option (SO_BINDTODEVICE socket
option, IP_PKTINFO socket option or the bind function) provided the
needed characteristics needed by the IP_UNICAST_IF socket option. [1]
The IP_UNICAST_IF socket option works well for unconnected sockets,
that is, the interface specified by the IP_UNICAST_IF socket option
is taken into consideration in the route lookup process when a packet
is being sent. However, for connected sockets, the outbound interface
is chosen when connecting the socket, and in the route lookup process
which is done when a packet is being sent, the interface specified by
the IP_UNICAST_IF socket option is being ignored.

This inconsistent behavior was reported and discussed in an issue
opened on systemd's GitHub project [2]. Also, a bug report was
submitted in the kernel's bugzilla [3].

To understand the problem in more detail, we can look at what happens
for UDP packets over IPv4 (The same analysis was done separately in
the referenced systemd issue).
When a UDP packet is sent the udp_sendmsg function gets called and
the following happens:

1. The oif member of the struct ipcm_cookie ipc (which stores the
output interface of the packet) is initialized by the ipcm_init_sk
function to inet->sk.sk_bound_dev_if (the device set by the
SO_BINDTODEVICE socket option).

2. If the IP_PKTINFO socket option was set, the oif member gets
overridden by the call to the ip_cmsg_send function.

3. If no output interface was selected yet, the interface specified
by the IP_UNICAST_IF socket option is used.

4. If the socket is connected and no destination address is
specified in the send function, the struct ipcm_cookie ipc is not
taken into consideration and the cached route, that was calculated in
the connect function is being used.

Thus, for a connected socket, the IP_UNICAST_IF sockopt isn't taken
into consideration.

This patch corrects the behavior of the IP_UNICAST_IF socket option
for connect()ed sockets by taking into consideration the
IP_UNICAST_IF sockopt when connecting the socket.

In order to avoid reconnecting the socket, this option is still
ignored when applied on an already connected socket until connect()
is called again by the Richard Gobert.

Change the __ip4_datagram_connect function, which is called during
socket connection, to take into consideration the interface set by
the IP_UNICAST_IF socket option, in a similar way to what is done in
the udp_sendmsg function.

[1] https://lore.kernel.org/netdev/1328685717.4736.4.camel@edumazet-laptop/T/
[2] https://github.com/systemd/systemd/issues/11935#issuecomment-618691018
[3] https://bugzilla.kernel.org/show_bug.cgi?id=210255

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220829111554.GA1771@debian
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoip: fix triggering of 'icmp redirect'
Nicolas Dichtel [Mon, 29 Aug 2022 10:01:21 +0000 (12:01 +0200)]
ip: fix triggering of 'icmp redirect'

__mkroute_input() uses fib_validate_source() to trigger an icmp redirect.
My understanding is that fib_validate_source() is used to know if the src
address and the gateway address are on the same link. For that,
fib_validate_source() returns 1 (same link) or 0 (not the same network).
__mkroute_input() is the only user of these positive values, all other
callers only look if the returned value is negative.

Since the below patch, fib_validate_source() didn't return anymore 1 when
both addresses are on the same network, because the route lookup returns
RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right.
Let's adapat the test to return 1 again when both addresses are on the same
link.

CC: stable@vger.kernel.org
Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop")
Reported-by: kernel test robot <yujie.liu@intel.com>
Reported-by: Heng Qi <hengqi@linux.alibaba.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220829100121.3821-1-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'net-sched-remove-unused-variables'
Jakub Kicinski [Thu, 1 Sep 2022 02:39:56 +0000 (19:39 -0700)]
Merge branch 'net-sched-remove-unused-variables'

Zhengchao Shao says:

====================
net: sched: remove unused variables

The variable "other" is unused, remove it.
====================

Link: https://lore.kernel.org/r/20220830092255.281330-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: sched: gred/red: remove unused variables in struct red_stats
Zhengchao Shao [Tue, 30 Aug 2022 09:22:55 +0000 (17:22 +0800)]
net: sched: gred/red: remove unused variables in struct red_stats

The variable "other" in the struct red_stats is not used. Remove it.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: sched: choke: remove unused variables in struct choke_sched_data
Zhengchao Shao [Tue, 30 Aug 2022 09:22:54 +0000 (17:22 +0800)]
net: sched: choke: remove unused variables in struct choke_sched_data

The variable "other" in the struct choke_sched_data is not used. Remove it.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: axienet: Switch to 64-bit RX/TX statistics
Robert Hancock [Mon, 29 Aug 2022 23:39:01 +0000 (17:39 -0600)]
net: axienet: Switch to 64-bit RX/TX statistics

The RX and TX byte/packet statistics in this driver could be overflowed
relatively quickly on a 32-bit platform. Switch these stats to use the
u64_stats infrastructure to avoid this.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Link: https://lore.kernel.org/r/20220829233901.3429419-1-robert.hancock@calian.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet/rds: Pass a pointer to virt_to_page()
Linus Walleij [Mon, 29 Aug 2022 13:20:01 +0000 (15:20 +0200)]
net/rds: Pass a pointer to virt_to_page()

Functions that work on a pointer to virtual memory such as
virt_to_pfn() and users of that function such as
virt_to_page() are supposed to pass a pointer to virtual
memory, ideally a (void *) or other pointer. However since
many architectures implement virt_to_pfn() as a macro,
this function becomes polymorphic and accepts both a
(unsigned long) and a (void *).

If we instead implement a proper virt_to_pfn(void *addr)
function the following happens (occurred on arch/arm):

net/rds/message.c:357:56: warning: passing argument 1
  of 'virt_to_pfn' makes pointer from integer without a
  cast [-Wint-conversion]

Fix this with an explicit cast.

Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: rds-devel@oss.oracle.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20220829132001.114858-1-linus.walleij@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse
Jann Horn [Wed, 31 Aug 2022 17:06:00 +0000 (19:06 +0200)]
mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse

anon_vma->degree tracks the combined number of child anon_vmas and VMAs
that use the anon_vma as their ->anon_vma.

anon_vma_clone() then assumes that for any anon_vma attached to
src->anon_vma_chain other than src->anon_vma, it is impossible for it to
be a leaf node of the VMA tree, meaning that for such VMAs ->degree is
elevated by 1 because of a child anon_vma, meaning that if ->degree
equals 1 there are no VMAs that use the anon_vma as their ->anon_vma.

This assumption is wrong because the ->degree optimization leads to leaf
nodes being abandoned on anon_vma_clone() - an existing anon_vma is
reused and no new parent-child relationship is created.  So it is
possible to reuse an anon_vma for one VMA while it is still tied to
another VMA.

This is an issue because is_mergeable_anon_vma() and its callers assume
that if two VMAs have the same ->anon_vma, the list of anon_vmas
attached to the VMAs is guaranteed to be the same.  When this assumption
is violated, vma_merge() can merge pages into a VMA that is not attached
to the corresponding anon_vma, leading to dangling page->mapping
pointers that will be dereferenced during rmap walks.

Fix it by separately tracking the number of child anon_vmas and the
number of VMAs using the anon_vma as their ->anon_vma.

Fixes: 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy")
Cc: stable@kernel.org
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agor8152: allow userland to disable multicast
Sven van Ashbrook [Tue, 30 Aug 2022 04:59:39 +0000 (04:59 +0000)]
r8152: allow userland to disable multicast

The rtl8152 driver does not disable multicasting when userspace asks
it to. For example:
 $ ifconfig eth0 -multicast -allmulti
 $ tcpdump -p -i eth0  # will still capture multicast frames

Fix by clearing the device multicast filter table when multicast and
allmulti are both unset.

Tested as follows:
- Set multicast on eth0 network interface
- verify that multicast packets are coming in:
  $ tcpdump -p -i eth0
- Clear multicast and allmulti on eth0 network interface
- verify that no more multicast packets are coming in:
  $ tcpdump -p -i eth0

Signed-off-by: Sven van Ashbrook <svenva@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20220830045923.net-next.v1.1.I4fee0ac057083d4f848caf0fa3a9fd466fc374a0@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agosch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb
Toke Høiland-Jørgensen [Wed, 31 Aug 2022 09:21:03 +0000 (11:21 +0200)]
sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb

When the GSO splitting feature of sch_cake is enabled, GSO superpackets
will be broken up and the resulting segments enqueued in place of the
original skb. In this case, CAKE calls consume_skb() on the original skb,
but still returns NET_XMIT_SUCCESS. This can confuse parent qdiscs into
assuming the original skb still exists, when it really has been freed. Fix
this by adding the __NET_XMIT_STOLEN flag to the return value in this case.

Fixes: 0c850344d388 ("sch_cake: Conditionally split GSO segments")
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18231
Link: https://lore.kernel.org/r/20220831092103.442868-1-toke@toke.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: ethernet: move from strlcpy with unused retval to strscpy
Wolfram Sang [Tue, 30 Aug 2022 20:14:54 +0000 (22:14 +0200)]
net: ethernet: move from strlcpy with unused retval to strscpy

Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Petr Machata <petrm@nvidia.com> # For drivers/net/ethernet/mellanox/mlxsw
Acked-by: Geoff Levand <geoff@infradead.org> # For ps3_gelic_net and spider_net_ethtool
Acked-by: Tom Lendacky <thomas.lendacky@amd.com> # For drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
Acked-by: Marcin Wojtas <mw@semihalf.com> # For drivers/net/ethernet/marvell/mvpp2
Reviewed-by: Leon Romanovsky <leonro@nvidia.com> # For drivers/net/ethernet/mellanox/mlx{4|5}
Reviewed-by: Shay Agroskin <shayagr@amazon.com> # For drivers/net/ethernet/amazon/ena
Acked-by: Krzysztof Hałasa <khalasa@piap.pl> # For IXP4xx Ethernet
Link: https://lore.kernel.org/r/20220830201457.7984-3-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: move from strlcpy with unused retval to strscpy
Wolfram Sang [Tue, 30 Aug 2022 20:14:52 +0000 (22:14 +0200)]
net: move from strlcpy with unused retval to strscpy

Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN
Link: https://lore.kernel.org/r/20220830201457.7984-1-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoselftests: net: sort .gitignore file
Axel Rasmussen [Mon, 29 Aug 2022 18:47:48 +0000 (11:47 -0700)]
selftests: net: sort .gitignore file

This is the result of `sort tools/testing/selftests/net/.gitignore`, but
preserving the comment at the top.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Link: https://lore.kernel.org/r/20220829184748.1535580-1-axelrasmussen@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoDocumentation: networking: correct possessive "its"
Randy Dunlap [Mon, 29 Aug 2022 23:54:14 +0000 (16:54 -0700)]
Documentation: networking: correct possessive "its"

Change occurrences of "it's" that are possessive to "its"
so that they don't read as "it is".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220829235414.17110-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: phy: smsc: use device-managed clock API
Heiner Kallweit [Sun, 28 Aug 2022 17:26:55 +0000 (19:26 +0200)]
net: phy: smsc: use device-managed clock API

Simplify the code by using the device-managed clock API.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/b222be68-ba7e-999d-0a07-eca0ecedf74e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agokcm: fix strp_init() order and cleanup
Cong Wang [Sat, 27 Aug 2022 18:13:14 +0000 (11:13 -0700)]
kcm: fix strp_init() order and cleanup

strp_init() is called just a few lines above this csk->sk_user_data
check, it also initializes strp->work etc., therefore, it is
unnecessary to call strp_done() to cancel the freshly initialized
work.

And if sk_user_data is already used by KCM, psock->strp should not be
touched, particularly strp->work state, so we need to move strp_init()
after the csk->sk_user_data check.

This also makes a lockdep warning reported by syzbot go away.

Reported-and-tested-by: syzbot+9fc084a4348493ef65d2@syzkaller.appspotmail.com
Reported-by: syzbot+e696806ef96cdd2d87cd@syzkaller.appspotmail.com
Fixes: e5571240236c ("kcm: Check if sk_user_data already set in kcm_attach")
Fixes: dff8baa26117 ("kcm: Call strp_stop before strp_done in kcm_attach")
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/20220827181314.193710-1-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxbf_gige: compute MDIO period based on i1clk
David Thompson [Fri, 26 Aug 2022 15:59:16 +0000 (11:59 -0400)]
mlxbf_gige: compute MDIO period based on i1clk

This patch adds logic to compute the MDIO period based on
the i1clk, and thereafter write the MDIO period into the YU
MDIO config register. The i1clk resource from the ACPI table
is used to provide addressing to YU bootrecord PLL registers.
The values in these registers are used to compute MDIO period.
If the i1clk resource is not present in the ACPI table, then
the current default hardcorded value of 430Mhz is used.
The i1clk clock value of 430MHz is only accurate for boards
with BF2 mid bin and main bin SoCs. The BF2 high bin SoCs
have i1clk = 500MHz, but can support a slower MDIO period.

Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
Signed-off-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/20220826155916.12491-1-davthompson@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge tag 'fscache-fixes-20220831' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 31 Aug 2022 17:13:34 +0000 (10:13 -0700)]
Merge tag 'fscache-fixes-20220831' of git://git./linux/kernel/git/dhowells/linux-fs

Pull fscache/cachefiles fixes from David Howells:

 - Fix kdoc on fscache_use/unuse_cookie().

 - Fix the error returned by cachefiles_ondemand_copen() from an upcall
   result.

 - Fix the distribution of requests in on-demand mode in cachefiles to
   be fairer by cycling through them rather than picking the one with
   the lowest ID each time (IDs being reused).

* tag 'fscache-fixes-20220831' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  cachefiles: make on-demand request distribution fairer
  cachefiles: fix error return code in cachefiles_ondemand_copen()
  fscache: fix misdocumented parameter

3 years agoMerge tag 'for-linus-2022083101' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 31 Aug 2022 16:54:14 +0000 (09:54 -0700)]
Merge tag 'for-linus-2022083101' of git://git./linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

 - NULL pointer dereference fix for Steam driver (Lee Jones)

 - memory leak fix for hidraw (Karthik Alapati)

 - regression fix for functionality of some UCLogic tables (Benjamin
   Tissoires)

 - a few new device IDs and device-specific quirks

* tag 'for-linus-2022083101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: nintendo: fix rumble worker null pointer deref
  HID: intel-ish-hid: ipc: Add Meteor Lake PCI device ID
  HID: input: fix uclogic tablets
  HID: Add Apple Touchbar on T2 Macs in hid_have_special_driver list
  HID: add Lenovo Yoga C630 battery quirk
  HID: AMD_SFH: Add a DMI quirk entry for Chromebooks
  HID: thrustmaster: Add sparco wheel and fix array length
  hid: intel-ish-hid: ishtp: Fix ishtp client sending disordered message
  HID: ishtp-hid-clientHID: ishtp-hid-client: Fix comment typo
  HID: asus: ROG NKey: Ignore portion of 0x5a report
  HID: hidraw: fix memory leak in hidraw_release()
  HID: steam: Prevent NULL pointer dereference in steam_{recv,send}_report

3 years agoMerge tag 'v6.0-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Wed, 31 Aug 2022 16:47:06 +0000 (09:47 -0700)]
Merge tag 'v6.0-p2' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "Fix a boot performance regression due to an unnecessary dependency on
  XOR_BLOCKS"

* tag 'v6.0-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: lib - remove unneeded selection of XOR_BLOCKS

3 years agoMerge tag 'lsm-pr-20220829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Linus Torvalds [Wed, 31 Aug 2022 16:23:16 +0000 (09:23 -0700)]
Merge tag 'lsm-pr-20220829' of git://git./linux/kernel/git/pcmoore/lsm

Pull LSM support for IORING_OP_URING_CMD from Paul Moore:
 "Add SELinux and Smack controls to the io_uring IORING_OP_URING_CMD.

  These are necessary as without them the IORING_OP_URING_CMD remains
  outside the purview of the LSMs (Luis' LSM patch, Casey's Smack patch,
  and my SELinux patch). They have been discussed at length with the
  io_uring folks, and Jens has given his thumbs-up on the relevant
  patches (see the commit descriptions).

  There is one patch that is not strictly necessary, but it makes
  testing much easier and is very trivial: the /dev/null
  IORING_OP_URING_CMD patch."

* tag 'lsm-pr-20220829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  Smack: Provide read control for io_uring_cmd
  /dev/null: add IORING_OP_URING_CMD support
  selinux: implement the security_uring_cmd() LSM hook
  lsm,io_uring: add LSM hooks for the new uring_cmd file op

3 years agocachefiles: make on-demand request distribution fairer
Xin Yin [Thu, 25 Aug 2022 02:09:45 +0000 (10:09 +0800)]
cachefiles: make on-demand request distribution fairer

For now, enqueuing and dequeuing on-demand requests all start from
idx 0, this makes request distribution unfair. In the weighty
concurrent I/O scenario, the request stored in higher idx will starve.

Searching requests cyclically in cachefiles_ondemand_daemon_read,
makes distribution fairer.

Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Reported-by: Yongqing Li <liyongqing@bytedance.com>
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220817065200.11543-1-yinxin.x@bytedance.com/
Link: https://lore.kernel.org/r/20220825020945.2293-1-yinxin.x@bytedance.com/
3 years agocachefiles: fix error return code in cachefiles_ondemand_copen()
Sun Ke [Fri, 26 Aug 2022 02:35:15 +0000 (10:35 +0800)]
cachefiles: fix error return code in cachefiles_ondemand_copen()

The cache_size field of copen is specified by the user daemon.
If cache_size < 0, then the OPEN request is expected to fail,
while copen itself shall succeed. However, returning 0 is indeed
unexpected when cache_size is an invalid error code.

Fix this by returning error when cache_size is an invalid error code.

Changes
=======
v4: update the code suggested by Dan
v3: update the commit log suggested by Jingbo.

Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Sun Ke <sunke32@huawei.com>
Suggested-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20220818111935.1683062-1-sunke32@huawei.com/
Link: https://lore.kernel.org/r/20220818125038.2247720-1-sunke32@huawei.com/
Link: https://lore.kernel.org/r/20220826023515.3437469-1-sunke32@huawei.com/
3 years agofscache: fix misdocumented parameter
Khalid Masum [Thu, 18 Aug 2022 04:07:38 +0000 (10:07 +0600)]
fscache: fix misdocumented parameter

This patch fixes two warnings generated by make docs. The functions
fscache_use_cookie and fscache_unuse_cookie, both have a parameter
named cookie. But they are documented with the name "object" with
unclear description. Which generates the warning when creating docs.

This commit will replace the currently misdocumented parameter names
with the correct ones while adding proper descriptions.

CC: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Khalid Masum <khalid.masum.92@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20220521142446.4746-1-khalid.masum.92@gmail.com/
Link: https://lore.kernel.org/r/20220818040738.12036-1-khalid.masum.92@gmail.com/
Link: https://lore.kernel.org/r/880d7d25753fb326ee17ac08005952112fcf9bdb.1657360984.git.mchehab@kernel.org/
3 years agoMerge branch 'hns3-next'
David S. Miller [Wed, 31 Aug 2022 13:10:50 +0000 (14:10 +0100)]
Merge branch 'hns3-next'

Guangbin Huang says:

====================
net: hns3: updates for -next

This series includes some updates for the HNS3 ethernet driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: net: hns3: add querying and setting fec off mode from firmware
Guangbin Huang [Tue, 30 Aug 2022 11:11:17 +0000 (19:11 +0800)]
net: hns3: net: hns3: add querying and setting fec off mode from firmware

For some new devices, the FEC mode can not be set to OFF in speed 200G.
In order to flexibly adapt to all types of devices, driver queries
fec ability from firmware to decide whether OFF mode can be supported.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: add querying and setting fec llrs mode from firmware
Hao Lan [Tue, 30 Aug 2022 11:11:16 +0000 (19:11 +0800)]
net: hns3: add querying and setting fec llrs mode from firmware

This patch supports llrs fec mode in speed 200G for some new devices, and
suppoprts querying llrs fec ability from firmware.

Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: add querying fec ability from firmware
Guangbin Huang [Tue, 30 Aug 2022 11:11:15 +0000 (19:11 +0800)]
net: hns3: add querying fec ability from firmware

For some new devices, driver can queries fec ability from firmware to
decide which FEC mode can be supported.

If devices of old version which not support querying fec ability, driver
sets fixed ability according to current speed.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: add getting capabilities of gro offload and fd from firmware
Guangbin Huang [Tue, 30 Aug 2022 11:11:14 +0000 (19:11 +0800)]
net: hns3: add getting capabilities of gro offload and fd from firmware

As some new devices may not support GRO offload and flow table director,
to support these devices, driver needs to querying capabilities of GRO
offload and flow table director from firmware. Whether the driver
supports these two features depends on capabilities.

For old device of version HNAE3_DEVICE_VERSION_V2, driver sets their
capabilities of these two features to fixed value.

Setting default features of netdev and debugfs also need to identify
whether support these two features.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'thunderbolt-end-to-end-flow-control'
David S. Miller [Wed, 31 Aug 2022 13:05:12 +0000 (14:05 +0100)]
Merge branch 'thunderbolt-end-to-end-flow-control'

Mika Westerberg says:

====================
thunderbolt: net: Enable full end-to-end flow control

Thunderbolt/USB4 host controllers support full end-to-end flow control
that prevents dropping packets if there are not enough hardware receive
buffers. So far it has not been enabled for the networking driver yet
but this series changes that. There is one snag though: the second
generation (Intel Falcon Ridge) had a bug that needs special quirk to
get it working. We had that in the early stages of the Thunderbolt/USB4
driver but it got dropped because it was not needed at the time. Now we
add it back as a quirk for the host controller (NHI).

The first patch of this series is a bugfix that I'm planning to push for
v6.0-rc. Rest are v6.1 material. This also includes a patch that shows
the XDomain link type in sysfs the same way we do for USB4 routers and
updates the networking driver module description.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: thunderbolt: Update module description with mention of USB4
Mika Westerberg [Tue, 30 Aug 2022 15:32:50 +0000 (18:32 +0300)]
net: thunderbolt: Update module description with mention of USB4

It is Thunderbolt/USB4 now so reflect that in the module description too
to avoid any confusion. No functional changes.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: thunderbolt: Enable full end-to-end flow control
Mika Westerberg [Tue, 30 Aug 2022 15:32:49 +0000 (18:32 +0300)]
net: thunderbolt: Enable full end-to-end flow control

USB4NET protocol allows the networking drivers to take advantage of
end-to-end flow control supported by the USB4 host interface. This
should prevent the receiving side from dropping network packets.

In adddition add a module parameter that can be used to turn this off
just in case it causes problems.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agothunderbolt: Add back Intel Falcon Ridge end-to-end flow control workaround
Mika Westerberg [Tue, 30 Aug 2022 15:32:48 +0000 (18:32 +0300)]
thunderbolt: Add back Intel Falcon Ridge end-to-end flow control workaround

As we are now enabling full end-to-end flow control to the Thunderbolt
networking driver, in order for it to work properly on second generation
Thunderbolt hardware (Falcon Ridge), we need to add back the workaround
that was removed with commit 53f13319d131 ("thunderbolt: Get rid of E2E
workaround"). However, this time we only apply it for Falcon Ridge
controllers as a form of an additional quirk. For non-Falcon Ridge this
does nothing.

While there fix a typo 'reqister' -> 'register' in the comment.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agothunderbolt: Show link type for XDomain connections too
Mika Westerberg [Tue, 30 Aug 2022 15:32:47 +0000 (18:32 +0300)]
thunderbolt: Show link type for XDomain connections too

Following what we do for routers already, extend this to XDomain
connections as well. This will show in sysfs whether the link is in USB4
or Thunderbolt mode.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: thunderbolt: Enable DMA paths only after rings are enabled
Mika Westerberg [Tue, 30 Aug 2022 15:32:46 +0000 (18:32 +0300)]
net: thunderbolt: Enable DMA paths only after rings are enabled

If the other host starts sending packets early on it is possible that we
are still in the middle of populating the initial Rx ring packets to the
ring. This causes the tbnet_poll() to mess over the queue and causes
list corruption. This happens specifically when connected with macOS as
it seems start sending various IP discovery packets as soon as its side
of the paths are configured.

To prevent this we move the DMA path enabling to happen after we have
primed the Rx ring. This makes sure no incoming packets can arrive
before we are ready to handle them.

Fixes: e69b6c02b4c3 ("net: Add support for networking over Thunderbolt cable")
Cc: stable@vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoethernet: rocker: fix sleep in atomic context bug in neigh_timer_handler
Duoming Zhou [Sat, 27 Aug 2022 15:38:15 +0000 (23:38 +0800)]
ethernet: rocker: fix sleep in atomic context bug in neigh_timer_handler

The function neigh_timer_handler() is a timer handler that runs in an
atomic context. When used by rocker, neigh_timer_handler() calls
"kzalloc(.., GFP_KERNEL)" that may sleep. As a result, the sleep in
atomic context bug will happen. One of the processes is shown below:

ofdpa_fib4_add()
 ...
 neigh_add_timer()

(wait a timer)

neigh_timer_handler()
 neigh_release()
  neigh_destroy()
   rocker_port_neigh_destroy()
    rocker_world_port_neigh_destroy()
     ofdpa_port_neigh_destroy()
      ofdpa_port_ipv4_neigh()
       kzalloc(sizeof(.., GFP_KERNEL) //may sleep

This patch changes the gfp_t parameter of kzalloc() from GFP_KERNEL to
GFP_ATOMIC in order to mitigate the bug.

Fixes: 00fc0c51e35b ("rocker: Change world_ops API and implementation to be switchdev independant")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agophy: lan966x: add support for QUSGMII
Maxime Chevallier [Fri, 26 Aug 2022 14:17:22 +0000 (16:17 +0200)]
phy: lan966x: add support for QUSGMII

Makes so that the serdes driver also takes QUSGMII in consideration.
It's configured exactly as QSGMII as far as the serdes driver is
concerned.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'net-dsa-microchip-error-hndling-reg-access-validation'
David S. Miller [Wed, 31 Aug 2022 08:41:49 +0000 (09:41 +0100)]
Merge branch 'net-dsa-microchip-error-hndling-reg-access-validation'

Oleksij Rempel says:

====================
net: dsa: microchip: add error handling and register access validation

changes v4:
- add Reviewed-by: Vladimir Oltean <olteanv@gmail.com> to all patches
- fix checkpatch warnings.

changes v3:
- fix build error in the middle of the patch stack.

changes v2:
- add regmap_ranges for KSZ9477
- drop output clock devicetree in driver validation patches. DTs need
  some more refactoring and can be done in a separate patch set.
- remove some unused variables.

This patch series adds error handling for the PHY read/write path and optional
register access validation.
After adding regmap_ranges for KSZ8563 some bugs was detected, so
critical bug fixes are sorted before ragmap_range patch.

Potentially this bug fixes can be ported to stable kernels, but need to be
reworked.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: microchip: remove IS_9893 flag
Oleksij Rempel [Fri, 26 Aug 2022 10:56:34 +0000 (12:56 +0200)]
net: dsa: microchip: remove IS_9893 flag

Use chip_id as other places of this code do it

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: microchip: remove unused sgmii variable
Oleksij Rempel [Fri, 26 Aug 2022 10:56:33 +0000 (12:56 +0200)]
net: dsa: microchip: remove unused sgmii variable

This variable is not used. So, remove it.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: microchip: ksz9477: remove unused "on" variable
Oleksij Rempel [Fri, 26 Aug 2022 10:56:32 +0000 (12:56 +0200)]
net: dsa: microchip: ksz9477: remove unused "on" variable

This variable is not used on ksz9477 side. Remove it.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: microchip: remove unused port phy variable
Oleksij Rempel [Fri, 26 Aug 2022 10:56:31 +0000 (12:56 +0200)]
net: dsa: microchip: remove unused port phy variable

This variable is unused. So, drop it.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: microchip: ksz9477: use internal_phy instead of phy_port_cnt
Oleksij Rempel [Fri, 26 Aug 2022 10:56:30 +0000 (12:56 +0200)]
net: dsa: microchip: ksz9477: use internal_phy instead of phy_port_cnt

With code refactoring was introduced new variable internal_phy. Let's
use it.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: microchip: add regmap_range for KSZ9477 chip
Oleksij Rempel [Fri, 26 Aug 2022 10:56:29 +0000 (12:56 +0200)]
net: dsa: microchip: add regmap_range for KSZ9477 chip

Add register validation for KSZ9477

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: microchip: ksz9477: remove MII_CTRL1000 check from ksz9477_w_phy()
Oleksij Rempel [Fri, 26 Aug 2022 10:56:28 +0000 (12:56 +0200)]
net: dsa: microchip: ksz9477: remove MII_CTRL1000 check from ksz9477_w_phy()

The reason why PHYlib may access MII_CTRL1000 on the chip without GBit
support is only if chip provides wrong information about extended caps
register. This issue is now handled by ksz9477_r_phy_quirks()

With proper regmap_ranges provided for all chips we will be able to
catch this kind of bugs any way. So, remove this sanity check.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: microchip: add regmap_range for KSZ8563 chip
Oleksij Rempel [Fri, 26 Aug 2022 10:56:27 +0000 (12:56 +0200)]
net: dsa: microchip: add regmap_range for KSZ8563 chip

Add register validation for KSZ8563.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: microchip: add support for regmap_access_tables
Oleksij Rempel [Fri, 26 Aug 2022 10:56:26 +0000 (12:56 +0200)]
net: dsa: microchip: add support for regmap_access_tables

This is complex driver with support for different chips with different
layouts. To detect at least some bugs earlier, we should validate register
accesses by using regmap_access_table support.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: microchip: KSZ9893: do not write to not supported Output Clock Control...
Oleksij Rempel [Fri, 26 Aug 2022 10:56:25 +0000 (12:56 +0200)]
net: dsa: microchip: KSZ9893: do not write to not supported Output Clock Control Register

This issue was detected after adding regmap register access validation.
KSZ9893 compatible chips do not have "Output Clock Control Register
0x0103". So, avoid writing to it.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: microchip: ksz8795: add error handling to ksz8_r/w_phy
Oleksij Rempel [Fri, 26 Aug 2022 10:56:24 +0000 (12:56 +0200)]
net: dsa: microchip: ksz8795: add error handling to ksz8_r/w_phy

Now ksz_pread/ksz_pwrite can return error value. So, make use of it.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: microchip: ksz9477: add error handling to ksz9477_r/w_phy
Oleksij Rempel [Fri, 26 Aug 2022 10:56:23 +0000 (12:56 +0200)]
net: dsa: microchip: ksz9477: add error handling to ksz9477_r/w_phy

Now ksz_pread/ksz_pwrite can return error value. So, make use of it.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: microchip: forward error value on all ksz_pread/ksz_pwrite functions
Oleksij Rempel [Fri, 26 Aug 2022 10:56:22 +0000 (12:56 +0200)]
net: dsa: microchip: forward error value on all ksz_pread/ksz_pwrite functions

ksz_read*/ksz_write* are able to return errors, so forward it.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: microchip: allow to pass return values for PHY read/write accesses
Oleksij Rempel [Fri, 26 Aug 2022 10:56:21 +0000 (12:56 +0200)]
net: dsa: microchip: allow to pass return values for PHY read/write accesses

PHY access may end with errors on different levels. So, allow to forward
return values where possible.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: microchip: don't announce extended register support on non Gbit chips
Oleksij Rempel [Fri, 26 Aug 2022 10:56:20 +0000 (12:56 +0200)]
net: dsa: microchip: don't announce extended register support on non Gbit chips

This issue was detected after adding support of regmap_ranges for KSZ8563R
chip. This chip is reporting extended registers support without having
actual extended registers. This made PHYlib request not existing
registers.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: microchip: do per-port Gbit detection instead of per-chip
Oleksij Rempel [Fri, 26 Aug 2022 10:56:19 +0000 (12:56 +0200)]
net: dsa: microchip: do per-port Gbit detection instead of per-chip

KSZ8563 has two 100Mbit PHYs and CPU port with RGMII support. Since
1000Mbit configuration for the RGMII capable MAC is present, we should
use per port validation.

As main part of migration to per-port validation we need to rework
ksz9477_switch_init() function. Which is using undocumented
REG_GLOBAL_OPTIONS register to detect per-chip Gbit support. So, it is
related to some sort of risk for regressions.

To reduce this risk I compared the code with publicly available
documentations. This function will executed on following currently
supported chips:
struct ksz_chip_data            OF compatible
KSZ9477 KSZ9477
KSZ9897 KSZ9897
KSZ9893 KSZ9893, KSZ9563
KSZ8563 KSZ8563
KSZ9567 KSZ9567

Only KSZ9893, KSZ9563, KSZ8563 document existence of 0xf ==
REG_GLOBAL_OPTIONS register with bit field description "SKU ID":
KSZ9893 0x0C
KSZ9563 0x1C
KSZ8563 0x3C

The existence of hidden flags is not documented.

KSZ9477, KSZ9897, KSZ9567 do not document this register at all.

Only KSZ8563 is documented as non Gbit chip: 100Mbit PHYs and RGMII CPU
port. So, this change should not introduce a regression for
configurations with properly used OF compatibles.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: microchip: add separate struct ksz_chip_data for KSZ8563 chip
Oleksij Rempel [Fri, 26 Aug 2022 10:56:18 +0000 (12:56 +0200)]
net: dsa: microchip: add separate struct ksz_chip_data for KSZ8563 chip

Add separate entry for the KSZ8563 chip. According to the documentation
it can support Gbit only on RGMII port. So, we will need to be able to
describe in the followup patch.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agocore: Variable type completion
Xin Gao [Wed, 17 Aug 2022 01:35:29 +0000 (09:35 +0800)]
core: Variable type completion

'unsigned int' is better than 'unsigned'.

Signed-off-by: Xin Gao <gaoxin@cdjrlc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoRevert "net: devlink: add RNLT lock assertion to devlink_compat_switch_id_get()"
Vlad Buslov [Mon, 29 Aug 2022 12:13:24 +0000 (14:13 +0200)]
Revert "net: devlink: add RNLT lock assertion to devlink_compat_switch_id_get()"

This reverts commit 6005a8aecee8afeba826295321a612ab485c230e.

The assertion was intentionally removed in commit 043b8413e8c0 ("net:
devlink: remove redundant rtnl lock assert") and, contrary what is
described in the commit message, the comment reflects that: "Caller must
hold RTNL mutex or reference to dev...".

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20220829121324.3980376-1-vladbu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'mlxsw-configure-max-lag-id-for-spectrum-4'
Jakub Kicinski [Wed, 31 Aug 2022 06:20:55 +0000 (23:20 -0700)]
Merge branch 'mlxsw-configure-max-lag-id-for-spectrum-4'

Petr Machata says:

====================
mlxsw: Configure max LAG ID for Spectrum-4

Amit Cohen writes:

In the device, LAG identifiers are stored in the port group table (PGT).
During initialization, firmware reserves a certain amount of entries at
the beginning of this table for LAG identifiers.

In Spectrum-4, the size of the PGT table did not increase, but the
maximum number of LAG identifiers was doubled, leaving less room for
others entries (e.g., flood entries) that also reside in the PGT.

Therefore, in order to avoid a regression and as long as there is no
explicit requirement to support 256 LAGs, configure the firmware to
allocate the same amount of LAG entries (128) as in Spectrum-{2,3}.

This can be done via the 'max_lag' field in CONFIG_PROFILE command.

Patch set overview:
Patch #1 edits the comment of the existing 'max_lag' field.
Patch #2 adds support for configuring 'max_lag' field via CONFIG_PROFILE
command.
Patch #3 adds an helper function to get the actual 'max_lag' in the
device.
Patch #4 adjusts Spectrum-4 to configure 'max_lag' field.
====================

Link: https://lore.kernel.org/r/cover.1661527928.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: spectrum: Add a copy of 'struct mlxsw_config_profile' for Spectrum-4
Amit Cohen [Fri, 26 Aug 2022 16:06:52 +0000 (18:06 +0200)]
mlxsw: spectrum: Add a copy of 'struct mlxsw_config_profile' for Spectrum-4

Starting from Spectrum-4, the maximum number of LAG IDs can be configured
by software via CONFIG_PROFILE command during driver initialization.

Add a dedicated instance of 'struct mlxsw_config_profile' for Spectrum-4
and set the 'max_lag' field to 128, which is the same amount of LAG entries
as in Spectrum-{2,3}. Without this configuration, firmware reserves 256
(the value of 'cap_max_lag' resource) entries at beginning of PGT table for
LAG identifiers, which means that less entries in PGT will be available.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: Add a helper function for getting maximum LAG ID
Amit Cohen [Fri, 26 Aug 2022 16:06:51 +0000 (18:06 +0200)]
mlxsw: Add a helper function for getting maximum LAG ID

Currently the driver queries the maximum supported LAG ID from firmware.
This will not be accurate anymore once the driver will configure 'max_lag'
via CONFIG_PROFILE command.

For resource query, firmware returns the maximum LAG ID which is supported
by hardware. Software can configure firmware to do not allocate entries for
all the supported LAGs, and to limit LAG IDs. In this case, the resource
query will not return the actual maximum LAG ID.

Add a helper function for getting this value. In case that 'max_lag' field
was set during initialization, return the value which was used, otherwise,
query firmware for the maximum supported ID.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomlxsw: Support configuring 'max_lag' via CONFIG_PROFILE
Amit Cohen [Fri, 26 Aug 2022 16:06:50 +0000 (18:06 +0200)]
mlxsw: Support configuring 'max_lag' via CONFIG_PROFILE

In the device, LAG identifiers are stored in the port group table (PGT).
During initialization, firmware reserves a certain amount of entries at
the beginning of this table for LAG identifiers.

In Spectrum-4, the size of the PGT table did not increase, but the maximum
number of LAG identifiers was doubled, leaving less room for others entries
(e.g., flood entries) that also reside in the PGT.

Therefore, in order to avoid a regression and as long as there is no
explicit requirement to support 256 LAGs, mlxsw driver will configure the
firmware to allocate the same amount of LAG entries (128) as in
Spectrum-{2,3}. This configuration is done using 'max_lag' field in
CONFIG_PROFILE command. Extend 'struct mlxsw_config_profile' to support
'max_lag' field and configure firmware accordingly.

A next patch will adjust Spectrum-4 to configure 'max_lag' field.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>