Marek Behún [Tue, 21 Nov 2023 17:20:24 +0000 (18:20 +0100)]
net: sfp: rework the RollBall PHY waiting code
RollBall SFP modules allow the access to PHY registers only after a
certain time has passed. Until then, the registers read 0xffff.
Currently we have quirks for modules where we need to wait either 25
seconds or 4 seconds, but recently I got hands on another module where
the wait is even shorter.
Instead of hardcoding different wait times, lets rework the code:
- increase the PHY retry count to 25
- when RollBall module is detected, increase the PHY retry time from
50ms to 1s
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 25 Nov 2023 02:09:19 +0000 (18:09 -0800)]
Merge branch 'firmware_loader'
Kory says:
====================
This patch was initially submitted as part of a net patch series.
Conor expressed interest in using it in a different subsystem.
https://lore.kernel.org/netdev/
20231116-feature_poe-v1-7-
be48044bf249@bootlin.com/
Consequently, I extracted it from the series and submitted it separately.
I first tried to send it to driver-core but it seems also not the best
choice:
https://lore.kernel.org/lkml/
2023111720-slicer-exes-7d9f@gregkh/
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kory Maincent [Wed, 22 Nov 2023 13:52:43 +0000 (14:52 +0100)]
firmware_loader: Expand Firmware upload error codes with firmware invalid error
No error code are available to signal an invalid firmware content.
Drivers that can check the firmware content validity can not return this
specific failure to the user-space
Expand the firmware error code with an additional code:
- "firmware invalid" code which can be used when the provided firmware
is invalid
Sync lib/test_firmware.c file accordingly.
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231122-feature_firmware_error_code-v3-1-04ec753afb71@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Thu, 23 Nov 2023 09:53:26 +0000 (10:53 +0100)]
r8169: remove not needed check in rtl_fw_write_firmware
This check can never be true for a firmware file with a correct format.
Existing checks in rtl_fw_data_ok() are sufficient, no problems with
invalid firmware files are known.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 23 Nov 2023 03:10:50 +0000 (19:10 -0800)]
tools: ynl-gen: use enum name from the spec
The enum name used for id-to-str table does not handle
the enum-name override in the spec correctly.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 23 Nov 2023 03:08:44 +0000 (19:08 -0800)]
tools: ynl-get: use family c-name
If a new family is ever added with a dash in the name
the C codegen will break. Make sure we use the "safe"
form of the name consistently.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhengchao Shao [Thu, 23 Nov 2023 01:55:15 +0000 (09:55 +0800)]
bonding: remove print in bond_verify_device_path
As suggested by Paolo in link[1], if the memory allocation fails, the mm
layer will emit a lot warning comprising the backtrace, so remove the
print.
[1] https://lore.kernel.org/all/
20231118081653.
1481260-1-shaozhengchao@huawei.com/
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Thu, 23 Nov 2023 01:45:37 +0000 (09:45 +0800)]
net/smc: remove unneeded atomic operations in smc_tx_sndbuf_nonempty
The commit
dcd2cf5f2fc0 ("net/smc: add autocorking support") adds an
atomic variable tx_pushing in smc_connection to make sure only one can
send to let it cork more and save CDC slot. since smc_tx_pending can be
called in the soft IRQ without checking sock_owned_by_user() at that
time, which would cause a race condition because bh_lock_sock() did
not honor sock_lock()
After commit
6b88af839d20 ("net/smc: don't send in the BH context if
sock_owned_by_user"), the transmission is deferred to when sock_lock()
is held by the user. Therefore, we no longer need tx_pending to hold
message.
So remove atomic variable tx_pushing and its operation, and
smc_tx_sndbuf_nonempty becomes a wrapper of __smc_tx_sndbuf_nonempty,
so rename __smc_tx_sndbuf_nonempty back to smc_tx_sndbuf_nonempty
Suggested-by: Alexandra Winter <wintera@linux.ibm.com>
Co-developed-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
diff v4: remove atomic variable tx_pushing
diff v3: improvements in the commit body and comments
diff v2: fix a typo in commit body and add net-next subject-prefix
net/smc/smc.h | 1 -
net/smc/smc_tx.c | 30 +-----------------------------
2 files changed, 1 insertion(+), 30 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 22 Nov 2023 17:33:23 +0000 (09:33 -0800)]
tools: ynl-gen: always append ULL/LL to range types
32bit builds generate the following warning when we use a u32-max
in range validation:
warning: decimal constant
4294967295 is between LONG_MAX and ULONG_MAX. For C99 that means long long, C90 compilers are very likely to produce unsigned long (and a warning) here
The range values are u64, slap ULL/LL on all of them just
to avoid such noise.
There's currently no code using full range validation, but
it will matter in the upcoming page-pool introspection.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 24 Nov 2023 12:18:55 +0000 (12:18 +0000)]
Merge branch 'net-ipa-v5.5'
Alex Elder says:
====================
net: ipa: add IPA v5.5 support
This series adds IPA support for the Qualcomm SM8550 SoC, which uses
IPA v5.5.
The first patch adds a new compatible string for the SM8550. The
second cleans up "ipa_reg.h" a bit for consistency. The third patch
adds definitions and some minor code changes related to IPA v5.5.
The fourth defines IPA register offsets and fields used for IPA
v5.0; most--but not all--register definitions are the same as used
in IPA v5.0. The final patch adds configuration data used for IPA
v5.5 (here again this mostly duplicates IPA v5.0 definitions).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Wed, 22 Nov 2023 23:09:09 +0000 (17:09 -0600)]
net: ipa: add IPA v5.5 configuration data
Add the configuration data required for IPA v5.5, which is used in
the Qualcomm SM8550 SoC. With that, the driver supports IPA v5.5.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Wed, 22 Nov 2023 23:09:08 +0000 (17:09 -0600)]
net: ipa: add IPA v5.5 register definitions
GSI register definitions for IPA v5.5 are the same as those used for
IPA v5.0.
Update ipa_reg_id_valid() to reflect that IPA v5.0+ supports source
and destination resource groups 4 through 7.
Add the definitions of IPA register offsets and fields for IPA v5.5.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Wed, 22 Nov 2023 23:09:07 +0000 (17:09 -0600)]
net: ipa: prepare for IPA v5.5
For IPA v5.5+, the QTIME_TIMESTAMP_CFG register no longer defines
two fields in the DPL timestamp. Make the code referencing those
fields in ipa_qtime_config() conditional based on IPA version.
IPA v5.0+ supports the IPA_MEM_AP_V4_FILTER and IPA_MEM_AP_V6_FILTER
memory regions. Update ipa_mem_id_valid() to reflect that.
IPA v5.5 no longer supports a few register fields, adds some others,
and removes support for a few IPA interrupt types. Update
"ipa_reg.h" to include information about IPA v5.5.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Wed, 22 Nov 2023 23:09:06 +0000 (17:09 -0600)]
net: ipa: update IPA version comments in "ipa_reg.h"
Some definitions in "ipa_reg.h" are only valid for certain versions
of IPA. In such cases a comment indicates a version or range of
versions where the definition is (or is not) valid. Almost all such
cases look like "IPA vX.Y", but a few don't include the "IPA" tag.
Update these so they all consistently include "IPA". And replace
a few lines that talk about "the next bit" in the definition of the
ipa_irq_id enumerated type with a more concise comment using the
"IPA vX.Y" convention.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Wed, 22 Nov 2023 23:09:05 +0000 (17:09 -0600)]
dt-bindings: net: qcom,ipa: add SM8550 compatible
Add support for SM8550, which uses IPA v5.5.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 24 Nov 2023 12:16:18 +0000 (12:16 +0000)]
Merge branch 'octeon_ep-max-rx'
Shinas Rasheed says:
====================
Get max rx packet length and solve
Patchsets which resolve observed style issues in control net
source files, and also implements get mtu control net api
to fetch max mtu value from firmware
Changes:
V2:
- Introduced a patch to resolve style issues as mentioned in V1
- Removed OCTEP_MAX_MTU macro, as it is redundant.
V1: https://lore.kernel.org/all/
20231121191224.
2489474-1-srasheed@marvell.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shinas Rasheed [Wed, 22 Nov 2023 18:34:35 +0000 (10:34 -0800)]
octeon_ep: get max rx packet length from firmware
Max receive packet length can vary across SoCs, so
this needs to be queried from respective firmware and
filled by driver. A control net get mtu api should be
implemented to do the same.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shinas Rasheed [Wed, 22 Nov 2023 18:34:34 +0000 (10:34 -0800)]
octeon_ep: Solve style issues in control net files
Solve observed style issues for kernel documentation
and structure declarations in control net API definitions.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 24 Nov 2023 12:13:15 +0000 (12:13 +0000)]
Merge branch 'smc-sysctl'
Guangguan Wang says:
===================
add two sysctl for SMC-R v2.1
This patch set add two sysctl for SMC-R v2.1:
net.smc.smcr_max_links_per_lgr is used to control the max links
per lgr.
net.smc.smcr_max_conns_per_lgr is used to control the max connections
per lgr.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guangguan Wang [Wed, 22 Nov 2023 13:52:58 +0000 (21:52 +0800)]
net/smc: add sysctl for max conns per lgr for SMC-R v2.1
Add a new sysctl: net.smc.smcr_max_conns_per_lgr, which is
used to control the preferred max connections per lgr for
SMC-R v2.1. The default value of this sysctl is 255, and
the acceptable value ranges from 16 to 255.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guangguan Wang [Wed, 22 Nov 2023 13:52:57 +0000 (21:52 +0800)]
net/smc: add sysctl for max links per lgr for SMC-R v2.1
Add a new sysctl: net.smc.smcr_max_links_per_lgr, which is
used to control the preferred max links per lgr for SMC-R
v2.1. The default value of this sysctl is 2, and the acceptable
value ranges from 1 to 2.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Wed, 22 Nov 2023 11:41:42 +0000 (17:11 +0530)]
octeontx2-pf: TC flower offload support for ICMP type and code
Adds tc offload support for matching on ICMP type and code.
Example usage:
To enable adding tc ingress rules
tc qdisc add dev eth0 ingress
TC rule drop the ICMP echo reply:
tc filter add dev eth0 protocol ip parent ffff: \
flower ip_proto icmp type 8 code 0 skip_sw action drop
TC rule to drop ICMPv6 echo reply:
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
indev eth0 ip_proto icmpv6 type 128 code 0 action drop
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Marangi [Tue, 21 Nov 2023 13:53:32 +0000 (14:53 +0100)]
net: phy: correctly check soft_reset ret ONLY if defined for PHY
Introduced by commit
6e2d85ec0559 ("net: phy: Stop with excessive soft
reset").
soft_reset call for phy_init_hw had multiple revision across the years
and the implementation goes back to 2014. Originally was a simple call
to write the generic PHY reset BIT, it was then moved to a dedicated
function. It was then added the option for PHY driver to define their
own special way to reset the PHY. Till this change, checking for ret was
correct as it was always filled by either the generic reset or the
custom implementation. This changed tho with commit
6e2d85ec0559 ("net:
phy: Stop with excessive soft reset"), as the generic reset call to PHY
was dropped but the ret check was never made entirely optional and
dependent whether soft_reset was defined for the PHY driver or not.
Luckly nothing was ever added before the soft_reset call so the ret
check (in the case where a PHY didn't had soft_reset defined) although
wrong, never caused problems as ret was init 0 at the start of
phy_init_hw.
To prevent any kind of problem and to make the function cleaner and more
robust, correctly move the ret check if the soft_reset section making it
optional and needed only with the function defined.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Breno Leitao [Tue, 21 Nov 2023 11:48:31 +0000 (03:48 -0800)]
Documentation: Document each netlink family
This is a simple script that parses the Netlink YAML spec files
(Documentation/netlink/specs/), and generates RST files to be rendered
in the Network -> Netlink Specification documentation page.
Create a python script that is invoked during 'make htmldocs', reads the
YAML specs input file and generate the correspondent RST file.
Create a new Documentation/networking/netlink_spec index page, and
reference each Netlink RST file that was processed above in this main
index.rst file.
In case of any exception during the parsing, dump the error and skip
the file.
Do not regenerate the RST files if the input files (YAML) were not
changed in-between invocations.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
----
Changelog:
V3:
* Do not regenerate the RST files if the input files were not
changed. In order to do it, a few things changed:
- Rely on Makefile more to find what changed, and trigger
individual file processing
- The script parses file by file now (instead of batches)
- Create a new option to generate the index file
V2:
* Moved the logic from a sphinx extension to a external script
* Adjust some formatting as suggested by Donald Hunter and Jakub
* Auto generating all the rsts instead of having stubs
* Handling error gracefully
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 23 Nov 2023 20:19:49 +0000 (12:19 -0800)]
Merge git://git./linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/intel/ice/ice_main.c
c9663f79cd82 ("ice: adjust switchdev rebuild path")
7758017911a4 ("ice: restore timestamp configuration after device reset")
https://lore.kernel.org/all/
20231121211259.
3348630-1-anthony.l.nguyen@intel.com/
Adjacent changes:
kernel/bpf/verifier.c
bb124da69c47 ("bpf: keep track of max number of bpf_loop callback iterations")
5f99f312bd3b ("bpf: add register bounds sanity checks and sanitization")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 23 Nov 2023 18:40:13 +0000 (10:40 -0800)]
Merge tag 'net-6.7-rc3' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf.
Current release - regressions:
- Revert "net: r8169: Disable multicast filter for RTL8168H and
RTL8107E"
- kselftest: rtnetlink: fix ip route command typo
Current release - new code bugs:
- s390/ism: make sure ism driver implies smc protocol in kconfig
- two build fixes for tools/net
Previous releases - regressions:
- rxrpc: couple of ACK/PING/RTT handling fixes
Previous releases - always broken:
- bpf: verify bpf_loop() callbacks as if they are called unknown
number of times
- improve stability of auto-bonding with Hyper-V
- account BPF-neigh-redirected traffic in interface statistics
Misc:
- net: fill in some more MODULE_DESCRIPTION()s"
* tag 'net-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits)
tools: ynl: fix duplicate op name in devlink
tools: ynl: fix header path for nfsd
net: ipa: fix one GSI register field width
tls: fix NULL deref on tls_sw_splice_eof() with empty record
net: axienet: Fix check for partial TX checksum
vsock/test: fix SEQPACKET message bounds test
i40e: Fix adding unsupported cloud filters
ice: restore timestamp configuration after device reset
ice: unify logic for programming PFINT_TSYN_MSK
ice: remove ptp_tx ring parameter flag
amd-xgbe: propagate the correct speed and duplex status
amd-xgbe: handle the corner-case during tx completion
amd-xgbe: handle corner-case during sfp hotplug
net: veth: fix ethtool stats reporting
octeontx2-pf: Fix ntuple rule creation to direct packet to VF with higher Rx queue than its PF
net: usb: qmi_wwan: claim interface 4 for ZTE MF290
Revert "net: r8169: Disable multicast filter for RTL8168H and RTL8107E"
net/smc: avoid data corruption caused by decline
nfc: virtual_ncidev: Add variable to check if ndev is running
dpll: Fix potential msg memleak when genlmsg_put_reply failed
...
Jakub Kicinski [Thu, 23 Nov 2023 03:05:58 +0000 (19:05 -0800)]
tools: ynl: fix duplicate op name in devlink
We don't support CRUD-inspired message types in YNL too well.
One aspect that currently trips us up is the fact that single
message ID can be used in multiple commands (as the response).
This leads to duplicate entries in the id-to-string tables:
devlink-user.c:19:34: warning: initialized field overwritten [-Woverride-init]
19 | [DEVLINK_CMD_PORT_NEW] = "port-new",
| ^~~~~~~~~~
devlink-user.c:19:34: note: (near initialization for ‘devlink_op_strmap[7]’)
Fixes tag points at where the code was generated, the "real" problem
is that the code generator does not support CRUD.
Fixes: f2f9dd164db0 ("netlink: specs: devlink: add the remaining command to generate complete split_ops")
Link: https://lore.kernel.org/r/20231123030558.1611831-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 23 Nov 2023 03:06:24 +0000 (19:06 -0800)]
tools: ynl: fix header path for nfsd
The makefile dependency is trying to include the wrong header:
<command-line>: fatal error: ../../../../include/uapi//linux/nfsd.h: No such file or directory
The guard also looks wrong.
Fixes: f14122b2c2ac ("tools: ynl: Add source files for nfsd netlink protocol")
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/20231123030624.1611925-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 22 Nov 2023 23:17:08 +0000 (17:17 -0600)]
net: ipa: fix one GSI register field width
The width of the R_LENGTH field of the EV_CH_E_CNTXT_1 GSI register
is 24 bits (not 20 bits) starting with IPA v5.0. Fix this.
Fixes: faf0678ec8a0 ("net: ipa: add IPA v5.0 GSI register definitions")
Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20231122231708.896632-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jann Horn [Wed, 22 Nov 2023 21:44:47 +0000 (22:44 +0100)]
tls: fix NULL deref on tls_sw_splice_eof() with empty record
syzkaller discovered that if tls_sw_splice_eof() is executed as part of
sendfile() when the plaintext/ciphertext sk_msg are empty, the send path
gets confused because the empty ciphertext buffer does not have enough
space for the encryption overhead. This causes tls_push_record() to go on
the `split = true` path (which is only supposed to be used when interacting
with an attached BPF program), and then get further confused and hit the
tls_merge_open_record() path, which then assumes that there must be at
least one populated buffer element, leading to a NULL deref.
It is possible to have empty plaintext/ciphertext buffers if we previously
bailed from tls_sw_sendmsg_locked() via the tls_trim_both_msgs() path.
tls_sw_push_pending_record() already handles this case correctly; let's do
the same check in tls_sw_splice_eof().
Fixes: df720d288dbb ("tls/sw: Use splice_eof() to flush")
Cc: stable@vger.kernel.org
Reported-by: syzbot+40d43509a099ea756317@syzkaller.appspotmail.com
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20231122214447.675768-1-jannh@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Samuel Holland [Wed, 22 Nov 2023 00:42:17 +0000 (16:42 -0800)]
net: axienet: Fix check for partial TX checksum
Due to a typo, the code checked the RX checksum feature in the TX path.
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://lore.kernel.org/r/20231122004219.3504219-1-samuel.holland@sifive.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arseniy Krasnov [Tue, 21 Nov 2023 21:16:42 +0000 (00:16 +0300)]
vsock/test: fix SEQPACKET message bounds test
Tune message length calculation to make this test work on machines
where 'getpagesize()' returns >32KB. Now maximum message length is not
hardcoded (on machines above it was smaller than 'getpagesize()' return
value, thus we get negative value and test fails), but calculated at
runtime and always bigger than 'getpagesize()' result. Reproduced on
aarch64 with 64KB page size.
Fixes: 5c338112e48a ("test/vsock: rework message bounds test")
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reported-by: Bogdan Marcynkov <bmarcynk@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20231121211642.163474-1-avkrasnov@salutedevices.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Tue, 21 Nov 2023 21:13:36 +0000 (13:13 -0800)]
i40e: Fix adding unsupported cloud filters
If a VF tries to add unsupported cloud filter through virtchnl
then i40e_add_del_cloud_filter(_big_buf) returns -ENOTSUPP but
this error code is stored in 'ret' instead of 'aq_ret' that
is used as error code sent back to VF. In this scenario where
one of the mentioned functions fails the value of 'aq_ret'
is zero so the VF will incorrectly receive a 'success'.
Use 'aq_ret' to store return value and remove 'ret' local
variable. Additionally fix the issue when filter allocation
fails, in this case no notification is sent back to the VF.
Fixes: e284fc280473 ("i40e: Add and delete cloud filter")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20231121211338.3348677-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Thu, 23 Nov 2023 14:27:35 +0000 (15:27 +0100)]
Merge branch 'ice-restore-timestamp-config-after-reset'
Tony Nguyen says:
====================
ice: restore timestamp config after reset
Jake Keller says:
We recently discovered during internal validation that the ice driver has
not been properly restoring Tx timestamp configuration after a device reset,
which resulted in application failures after a device reset.
After some digging, it turned out this problem is two-fold. Since the
introduction of the PTP support the driver has been clobbering the storage
of the current timestamp configuration during reset. Thus after a reset, the
driver will no longer perform Tx or Rx timestamps, and will report
timestamp configuration as disabled if SIOCGHWTSTAMP ioctl is issued.
In addition, the recently merged auxiliary bus support code missed that
PFINT_TSYN_MSK must be reprogrammed on the clock owner for E822 devices.
Failure to restore this register configuration results in the driver no
longer responding to interrupts from other ports. Depending on the traffic
pattern, this can either result in increased latency responding to
timestamps on the non-owner ports, or it can result in the driver never
reporting any timestamps. The configuration of PFINT_TSYN_MSK was only done
during initialization. Due to this, the Tx timestamp issue persists even if
userspace reconfigures timestamping.
This series fixes both issues, as well as removes a redundant Tx ring field
since we can rely on the skb flag as the primary detector for a Tx timestamp
request.
Note that I don't think this series will directly apply to older stable
releases (even v6.6) as we recently refactored a lot of the PTP code to
support auxiliary bus. Patch 2/3 only matters for the post-auxiliary bus
implementation. The principle of patch 1/3 and 3/3 could apply as far back
as the initial PTP support, but I don't think it will apply cleanly as-is.
====================
Link: https://lore.kernel.org/r/20231121211259.3348630-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jacob Keller [Tue, 21 Nov 2023 21:12:57 +0000 (13:12 -0800)]
ice: restore timestamp configuration after device reset
The driver calls ice_ptp_cfg_timestamp() during ice_ptp_prepare_for_reset()
to disable timestamping while the device is resetting. This operation
destroys the user requested configuration. While the driver does call
ice_ptp_cfg_timestamp in ice_rebuild() to restore some hardware settings
after a reset, it unconditionally passes true or false, resulting in
failure to restore previous user space configuration.
This results in a device reset forcibly disabling timestamp configuration
regardless of current user settings.
This was not detected previously due to a quirk of the LinuxPTP ptp4l
application. If ptp4l detects a missing timestamp, it enters a fault state
and performs recovery logic which includes executing SIOCSHWTSTAMP again,
restoring the now accidentally cleared configuration.
Not every application does this, and for these applications, timestamps
will mysteriously stop after a PF reset, without being restored until an
application restart.
Fix this by replacing ice_ptp_cfg_timestamp() with two new functions:
1) ice_ptp_disable_timestamp_mode() which unconditionally disables the
timestamping logic in ice_ptp_prepare_for_reset() and ice_ptp_release()
2) ice_ptp_restore_timestamp_mode() which calls
ice_ptp_restore_tx_interrupt() to restore Tx timestamping configuration,
calls ice_set_rx_tstamp() to restore Rx timestamping configuration, and
issues an immediate TSYN_TX interrupt to ensure that timestamps which
may have occurred during the device reset get processed.
Modify the ice_ptp_set_timestamp_mode to directly save the user
configuration and then call ice_ptp_restore_timestamp_mode. This way, reset
no longer destroys the saved user configuration.
This obsoletes the ice_set_tx_tstamp() function which can now be safely
removed.
With this change, all devices should now restore Tx and Rx timestamping
functionality correctly after a PF reset without application intervention.
Fixes: 77a781155a65 ("ice: enable receive hardware timestamping")
Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jacob Keller [Tue, 21 Nov 2023 21:12:56 +0000 (13:12 -0800)]
ice: unify logic for programming PFINT_TSYN_MSK
Commit
d938a8cca88a ("ice: Auxbus devices & driver for E822 TS") modified
how Tx timestamps are handled for E822 devices. On these devices, only the
clock owner handles reading the Tx timestamp data from firmware. To do
this, the PFINT_TSYN_MSK register is modified from the default value to one
which enables reacting to a Tx timestamp on all PHY ports.
The driver currently programs PFINT_TSYN_MSK in different places depending
on whether the port is the clock owner or not. For the clock owner, the
PFINT_TSYN_MSK value is programmed during ice_ptp_init_owner just before
calling ice_ptp_tx_ena_intr to program the PHY ports.
For the non-clock owner ports, the PFINT_TSYN_MSK is programmed during
ice_ptp_init_port.
If a large enough device reset occurs, the PFINT_TSYN_MSK register will be
reset to the default value in which only the PHY associated directly with
the PF will cause the Tx timestamp interrupt to trigger.
The driver lacks logic to reprogram the PFINT_TSYN_MSK register after a
device reset. For the E822 device, this results in the PF no longer
responding to interrupts for other ports. This results in failure to
deliver Tx timestamps to user space applications.
Rename ice_ptp_configure_tx_tstamp to ice_ptp_cfg_tx_interrupt, and unify
the logic for programming PFINT_TSYN_MSK and PFINT_OICR_ENA into one place.
This function will program both registers according to the combination of
user configuration and device requirements.
This ensures that PFINT_TSYN_MSK is always restored when we configure the
Tx timestamp interrupt.
Fixes: d938a8cca88a ("ice: Auxbus devices & driver for E822 TS")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jacob Keller [Tue, 21 Nov 2023 21:12:55 +0000 (13:12 -0800)]
ice: remove ptp_tx ring parameter flag
Before performing a Tx timestamp in ice_stamp(), the driver checks a ptp_tx
ring variable to see if timestamping is enabled on that ring. This value is
set for all rings whenever userspace configures Tx timestamping.
Ostensibly this was done to avoid wasting cycles checking other fields when
timestamping has not been enabled. However, for Tx timestamps we already
get an individual per-SKB flag indicating whether userspace wants to
request a timestamp on that packet. We do not gain much by also having
a separate flag to check for whether timestamping was enabled.
In fact, the driver currently fails to restore the field after a PF reset.
Because of this, if a PF reset occurs, timestamps will be disabled.
Since this flag doesn't add value in the hotpath, remove it and always
provide a timestamp if the SKB flag has been set.
A following change will fix the reset path to properly restore user
timestamping configuration completely.
This went unnoticed for some time because one of the most common
applications using Tx timestamps, ptp4l, will reconfigure the socket as
part of its fault recovery logic.
Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 23 Nov 2023 12:47:25 +0000 (13:47 +0100)]
Merge branch 'amd-xgbe-fixes-to-handle-corner-cases'
Raju Rangoju says:
====================
amd-xgbe: fixes to handle corner-cases
This series include bug fixes to amd-xgbe driver.
====================
Link: https://lore.kernel.org/r/20231121191435.4049995-1-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Raju Rangoju [Tue, 21 Nov 2023 19:14:35 +0000 (00:44 +0530)]
amd-xgbe: propagate the correct speed and duplex status
xgbe_get_link_ksettings() does not propagate correct speed and duplex
information to ethtool during cable unplug. Due to which ethtool reports
incorrect values for speed and duplex.
Address this by propagating correct information.
Fixes: 7c12aa08779c ("amd-xgbe: Move the PHY support into amd-xgbe")
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Raju Rangoju [Tue, 21 Nov 2023 19:14:34 +0000 (00:44 +0530)]
amd-xgbe: handle the corner-case during tx completion
The existing implementation uses software logic to accumulate tx
completions until the specified time (1ms) is met and then poll them.
However, there exists a tiny gap which leads to a race between
resetting and checking the tx_activate flag. Due to this the tx
completions are not reported to upper layer and tx queue timeout
kicks-in restarting the device.
To address this, introduce a tx cleanup mechanism as part of the
periodic maintenance process.
Fixes: c5aa9e3b8156 ("amd-xgbe: Initial AMD 10GbE platform driver")
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Raju Rangoju [Tue, 21 Nov 2023 19:14:33 +0000 (00:44 +0530)]
amd-xgbe: handle corner-case during sfp hotplug
Force the mode change for SFI in Fixed PHY configurations. Fixed PHY
configurations needs PLL to be enabled while doing mode set. When the
SFP module isn't connected during boot, driver assumes AN is ON and
attempts auto-negotiation. However, if the connected SFP comes up in
Fixed PHY configuration the link will not come up as PLL isn't enabled
while the initial mode set command is issued. So, force the mode change
for SFI in Fixed PHY configuration to fix link issues.
Fixes: e57f7a3feaef ("amd-xgbe: Prepare for working with more than one type of phy")
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Tue, 21 Nov 2023 19:08:44 +0000 (20:08 +0100)]
net: veth: fix ethtool stats reporting
Fix a possible misalignment between page_pool stats and tx xdp_stats
reported in veth_get_ethtool_stats routine.
The issue can be reproduced configuring the veth pair with the
following tx/rx queues:
$ip link add v0 numtxqueues 2 numrxqueues 4 type veth peer name v1 \
numtxqueues 1 numrxqueues 1
and loading a simple XDP program on v0 that just returns XDP_PASS.
In this case on v0 the page_pool stats overwrites tx xdp_stats for queue 1.
Fix the issue incrementing pp_idx of dev->real_num_tx_queues * VETH_TQ_STATS_LEN
since we always report xdp_stats for all tx queues in ethtool.
Fixes: 4fc418053ec7 ("net: veth: add page_pool stats")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/c5b5d0485016836448453f12846c7c4ab75b094a.1700593593.git.lorenzo@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Niklas Söderlund [Tue, 21 Nov 2023 18:37:38 +0000 (19:37 +0100)]
dt-bindings: net: renesas,ethertsn: Add Ethernet TSN
Add bindings for Renesas R-Car Ethernet TSN End-station IP. The RTSN
device provides Ethernet network.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20231121183738.656192-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suman Ghosh [Tue, 21 Nov 2023 16:56:24 +0000 (22:26 +0530)]
octeontx2-pf: Fix ntuple rule creation to direct packet to VF with higher Rx queue than its PF
It is possible to add a ntuple rule which would like to direct packet to
a VF whose number of queues are greater/less than its PF's queue numbers.
For example a PF can have 2 Rx queues but a VF created on that PF can have
8 Rx queues. As of today, ntuple rule will reject rule because it is
checking the requested queue number against PF's number of Rx queues.
As a part of this fix if the action of a ntuple rule is to move a packet
to a VF's queue then the check is removed. Also, a debug information is
printed to aware user that it is user's responsibility to cross check if
the requested queue number on that VF is a valid one.
Fixes: f0a1913f8a6f ("octeontx2-pf: Add support for ethtool ntuple filters")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231121165624.3664182-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 23 Nov 2023 11:02:52 +0000 (12:02 +0100)]
Merge branch 'net-ethernet-renesas-rcar_gen4_ptp-add-v4h-support'
Niklas Söderlund says:
====================
net: ethernet: renesas: rcar_gen4_ptp: Add V4H support
This small series prepares the rcar_gen4_ptp to be useable both on both
R-Car S4 and V4H. The only in-tree driver that make use of this is
rswtich on S4. A new Ethernet (R-Car Ethernet TSN) driver for V4H is on
it's way that also will make use of rcar_gen4_ptp functionality.
Patch 1-2 are small improvements to the existing driver. While patch 3-4
adds V4H support. Finally patch 5 turns rcar_gen4_ptp into a separate
module to allow the gPTP functionality to be shared between the two
users without having to duplicate the code in each.
See each patch for changelog.
====================
Link: https://lore.kernel.org/r/20231121155306.515446-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Niklas Söderlund [Tue, 21 Nov 2023 15:53:06 +0000 (16:53 +0100)]
net: ethernet: renesas: rcar_gen4_ptp: Break out to module
The Gen4 gPTP support will be shared between the existing Renesas
Ethernet Switch driver and the upcoming Renesas Ethernet-TSN driver. In
preparation for this break out the gPTP support to its own module.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Niklas Söderlund [Tue, 21 Nov 2023 15:53:05 +0000 (16:53 +0100)]
net: ethernet: renesas: rcar_gen4_ptp: Get clock increment from clock rate
Instead of using hard coded clock increment values for each SoC derive
the clock increment from the module clock. This is done in preparation
to support a second platform, R-Car V4H that uses a 200Mhz clock
compared with the 320Mhz clock used on R-Car S4.
Tested on both SoCs,
S4 reports a clock of 320000000Hz which gives a value of 0x19000000.
Documentation says a 320Mhz clock is used and the correct increment for
that clock is 0x19000000.
V4H reports a clock of 199999992Hz which gives a value of 0x2800001a.
Documentation says a 200Mhz clock is used and the correct increment for
that clock is 0x28000000.
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Niklas Söderlund [Tue, 21 Nov 2023 15:53:04 +0000 (16:53 +0100)]
net: ethernet: renesas: rcar_gen4_ptp: Prepare for shared register layout
All known R-Car Gen4 SoC share the same register layout, rename the
R-Car S4 specific identifiers so they can be shared with the upcoming
R-Car V4H support.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Niklas Söderlund [Tue, 21 Nov 2023 15:53:03 +0000 (16:53 +0100)]
net: ethernet: renesas: rcar_gen4_ptp: Fail on unknown register layout
Instead of printing a warning and proceeding with an unknown register
layout return an error. The only call site is already prepared to
propagate the error.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Niklas Söderlund [Tue, 21 Nov 2023 15:53:02 +0000 (16:53 +0100)]
net: ethernet: renesas: rcar_gen4_ptp: Remove incorrect comment
The comments intent was to indicates which function uses the enum. While
upstreaming rcar_gen4_ptp the function was renamed but this comment was
left with the old function name.
Instead of correcting the comment remove it, it adds little value.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lech Perczak [Fri, 17 Nov 2023 23:19:18 +0000 (00:19 +0100)]
net: usb: qmi_wwan: claim interface 4 for ZTE MF290
Interface 4 is used by for QMI interface in stock firmware of MF28D, the
router which uses MF290 modem. Rebind it to qmi_wwan after freeing it up
from option driver.
The proper configuration is:
Interface mapping is:
0: QCDM, 1: (unknown), 2: AT (PCUI), 2: AT (Modem), 4: QMI
T: Bus=01 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#= 4 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=19d2 ProdID=0189 Rev= 0.00
S: Manufacturer=ZTE, Incorporated
S: Product=ZTE LTE Technologies MSM
C:* #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=84(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E: Ad=86(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
Cc: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
Link: https://lore.kernel.org/r/20231117231918.100278-3-lech.perczak@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Linus Torvalds [Wed, 22 Nov 2023 18:20:17 +0000 (10:20 -0800)]
Merge tag 'loongarch-fixes-6.7-1' of git://git./linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Fix several build errors, a potential kernel panic, a cpu hotplug
issue and update links in documentations"
* tag 'loongarch-fixes-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
Docs/zh_CN/LoongArch: Update links in LoongArch introduction.rst
Docs/LoongArch: Update links in LoongArch introduction.rst
LoongArch: Implement constant timer shutdown interface
LoongArch: Mark {dmw,tlb}_virt_to_page() exports as non-GPL
LoongArch: Silence the boot warning about 'nokaslr'
LoongArch: Add __percpu annotation for __percpu_read()/__percpu_write()
LoongArch: Record pc instead of offset in la_abs relocation
LoongArch: Explicitly set -fdirect-access-external-data for vmlinux
LoongArch: Add dependency between vmlinuz.efi and vmlinux.efi
Linus Torvalds [Wed, 22 Nov 2023 17:56:26 +0000 (09:56 -0800)]
Merge tag 'hyperv-fixes-signed-
20231121' of git://git./linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- One fix for the KVP daemon (Ani Sinha)
- Fix for the detection of E820_TYPE_PRAM in a Gen2 VM (Saurabh Sengar)
- Micro-optimization for hv_nmi_unknown() (Uros Bizjak)
* tag 'hyperv-fixes-signed-
20231121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown()
x86/hyperv: Fix the detection of E820_TYPE_PRAM in a Gen2 VM
hv/hv_kvp_daemon: Some small fixes for handling NM keyfiles
Linus Torvalds [Fri, 10 Nov 2023 06:22:13 +0000 (22:22 -0800)]
asm-generic: qspinlock: fix queued_spin_value_unlocked() implementation
We really don't want to do atomic_read() or anything like that, since we
already have the value, not the lock. The whole point of this is that
we've loaded the lock from memory, and we want to check whether the
value we loaded was a locked one or not.
The main use of this is the lockref code, which loads both the lock and
the reference count in one atomic operation, and then works on that
combined value. With the atomic_read(), the compiler would pointlessly
spill the value to the stack, in order to then be able to read it back
"atomically".
This is the qspinlock version of commit
c6f4a9002252 ("asm-generic:
ticket-lock: Optimize arch_spin_value_unlocked()") which fixed this same
bug for ticket locks.
Cc: Guo Ren <guoren@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/all/CAHk-=whNRv0v6kQiV5QO6DJhjH4KEL36vWQ6Re8Csrnh4zbRkQ@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Heiner Kallweit [Tue, 21 Nov 2023 08:09:33 +0000 (09:09 +0100)]
Revert "net: r8169: Disable multicast filter for RTL8168H and RTL8107E"
This reverts commit
efa5f1311c4998e9e6317c52bc5ee93b3a0f36df.
I couldn't reproduce the reported issue. What I did, based on a pcap
packet log provided by the reporter:
- Used same chip version (RTL8168h)
- Set MAC address to the one used on the reporters system
- Replayed the EAPOL unicast packet that, according to the reporter,
was filtered out by the mc filter.
The packet was properly received.
Therefore the root cause of the reported issue seems to be somewhere
else. Disabling mc filtering completely for the most common chip
version is a quite big hammer. Therefore revert the change and wait
for further analysis results from the reporter.
Cc: stable@vger.kernel.org
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
D. Wythe [Wed, 22 Nov 2023 02:37:05 +0000 (10:37 +0800)]
net/smc: avoid data corruption caused by decline
We found a data corruption issue during testing of SMC-R on Redis
applications.
The benchmark has a low probability of reporting a strange error as
shown below.
"Error: Protocol error, got "\xe2" as reply type byte"
Finally, we found that the retrieved error data was as follows:
0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2
It is quite obvious that this is a SMC DECLINE message, which means that
the applications received SMC protocol message.
We found that this was caused by the following situations:
client server
¦ clc proposal
------------->
¦ clc accept
<-------------
¦ clc confirm
------------->
wait llc confirm
send llc confirm
¦failed llc confirm
¦ x------
(after 2s)timeout
wait llc confirm rsp
wait decline
(after 1s) timeout
(after 2s) timeout
¦ decline
-------------->
¦ decline
<--------------
As a result, a decline message was sent in the implementation, and this
message was read from TCP by the already-fallback connection.
This patch double the client timeout as 2x of the server value,
With this simple change, the Decline messages should never cross or
collide (during Confirm link timeout).
This issue requires an immediate solution, since the protocol updates
involve a more long-term solution.
Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC flow")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nguyen Dinh Phi [Tue, 21 Nov 2023 07:53:57 +0000 (15:53 +0800)]
nfc: virtual_ncidev: Add variable to check if ndev is running
syzbot reported an memory leak that happens when an skb is add to
send_buff after virtual nci closed.
This patch adds a variable to track if the ndev is running before
handling new skb in send function.
Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Reported-by: syzbot+6eb09d75211863f15e3e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/00000000000075472b06007df4fb@google.com
Reviewed-by: Bongsu Jeon
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gan, Yi Fang [Tue, 21 Nov 2023 05:38:42 +0000 (13:38 +0800)]
net: stmmac: Add support for HW-accelerated VLAN stripping
Current implementation supports driver level VLAN tag stripping only.
The features is always on if CONFIG_VLAN_8021Q is enabled in kernel
config and is not user configurable.
This patch add support to MAC level VLAN tag stripping and can be
configured through ethtool. If the rx-vlan-offload is off, the VLAN tag
will be stripped by driver. If the rx-vlan-offload is on, the VLAN tag
will be stripped by MAC.
Command: ethtool -K <interface> rx-vlan-offload off | on
Signed-off-by: Lai Peter Jun Ann <jun.ann.lai@intel.com>
Signed-off-by: Gan, Yi Fang <yi.fang.gan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Murali Karicheri [Tue, 21 Nov 2023 05:37:53 +0000 (11:07 +0530)]
net: hsr: Add support for MC filtering at the slave device
When MC (multicast) list is updated by the networking layer due to a
user command and as well as when allmulti flag is set, it needs to be
passed to the enslaved Ethernet devices. This patch allows this
to happen by implementing ndo_change_rx_flags() and ndo_set_rx_mode()
API calls that in turns pass it to the slave devices using
existing API calls.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Uros Bizjak [Tue, 14 Nov 2023 16:59:28 +0000 (17:59 +0100)]
x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown()
Use atomic_try_cmpxchg() instead of atomic_cmpxchg(*ptr, old, new) == old
in hv_nmi_unknown(). On x86 the CMPXCHG instruction returns success in
the ZF flag, so this change saves a compare after CMPXCHG. The generated
asm code improves from:
3e: 65 8b 15 00 00 00 00 mov %gs:0x0(%rip),%edx
45: b8 ff ff ff ff mov $0xffffffff,%eax
4a: f0 0f b1 15 00 00 00 lock cmpxchg %edx,0x0(%rip)
51: 00
52: 83 f8 ff cmp $0xffffffff,%eax
55: 0f 95 c0 setne %al
to:
3e: 65 8b 15 00 00 00 00 mov %gs:0x0(%rip),%edx
45: b8 ff ff ff ff mov $0xffffffff,%eax
4a: f0 0f b1 15 00 00 00 lock cmpxchg %edx,0x0(%rip)
51: 00
52: 0f 95 c0 setne %al
No functional change intended.
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20231114170038.381634-1-ubizjak@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <
20231114170038.381634-1-ubizjak@gmail.com>
Jakub Kicinski [Wed, 22 Nov 2023 01:52:49 +0000 (17:52 -0800)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-11-21
We've added 85 non-merge commits during the last 12 day(s) which contain
a total of 63 files changed, 4464 insertions(+), 1484 deletions(-).
The main changes are:
1) Huge batch of verifier changes to improve BPF register bounds logic
and range support along with a large test suite, and verifier log
improvements, all from Andrii Nakryiko.
2) Add a new kfunc which acquires the associated cgroup of a task within
a specific cgroup v1 hierarchy where the latter is identified by its id,
from Yafang Shao.
3) Extend verifier to allow bpf_refcount_acquire() of a map value field
obtained via direct load which is a use-case needed in sched_ext,
from Dave Marchevsky.
4) Fix bpf_get_task_stack() helper to add the correct crosstask check
for the get_perf_callchain(), from Jordan Rome.
5) Fix BPF task_iter internals where lockless usage of next_thread()
was wrong. The rework also simplifies the code, from Oleg Nesterov.
6) Fix uninitialized tail padding via LIBBPF_OPTS_RESET, and another
fix for certain BPF UAPI structs to fix verifier failures seen
in bpf_dynptr usage, from Yonghong Song.
7) Add BPF selftest fixes for map_percpu_stats flakes due to per-CPU BPF
memory allocator not being able to allocate per-CPU pointer successfully,
from Hou Tao.
8) Add prep work around dynptr and string handling for kfuncs which
is later going to be used by file verification via BPF LSM and fsverity,
from Song Liu.
9) Improve BPF selftests to update multiple prog_tests to use ASSERT_*
macros, from Yuran Pereira.
10) Optimize LPM trie lookup to check prefixlen before walking the trie,
from Florian Lehner.
11) Consolidate virtio/9p configs from BPF selftests in config.vm file
given they are needed consistently across archs, from Manu Bretelle.
12) Small BPF verifier refactor to remove register_is_const(),
from Shung-Hsi Yu.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in vmlinux
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bpf_obj_id
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bind_perm
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_ca
selftests/bpf: reduce verboseness of reg_bounds selftest logs
bpf: bpf_iter_task_next: use next_task(kit->task) rather than next_task(kit->pos)
bpf: bpf_iter_task_next: use __next_thread() rather than next_thread()
bpf: task_group_seq_get_next: use __next_thread() rather than next_thread()
bpf: emit frameno for PTR_TO_STACK regs if it differs from current one
bpf: smarter verifier log number printing logic
bpf: omit default off=0 and imm=0 in register state log
bpf: emit map name in register state if applicable and available
bpf: print spilled register state in stack slot
bpf: extract register state printing
bpf: move verifier state printing code to kernel/bpf/log.c
bpf: move verbose_linfo() into kernel/bpf/log.c
bpf: rename BPF_F_TEST_SANITY_STRICT to BPF_F_TEST_REG_INVARIANTS
bpf: Remove test for MOVSX32 with offset=32
selftests/bpf: add iter test requiring range x range logic
veristat: add ability to set BPF_F_TEST_SANITY_STRICT flag with -r flag
...
====================
Link: https://lore.kernel.org/r/20231122000500.28126-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hao Ge [Tue, 21 Nov 2023 01:37:09 +0000 (09:37 +0800)]
dpll: Fix potential msg memleak when genlmsg_put_reply failed
We should clean the skb resource if genlmsg_put_reply failed.
Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base functions")
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20231121013709.73323-1-gehao@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 22 Nov 2023 01:32:51 +0000 (17:32 -0800)]
Merge branch 'bnxt_en-prepare-to-support-new-p7-chips'
Michael Chan says:
====================
bnxt_en: Prepare to support new P7 chips
This patchset is to prepare the driver to support the new P7 chips by
refactoring and modifying the code. The P7 chip is built on the P5
chip and many code paths can be modified to support both chips. The
whole patchset to have basic support for P7 chips is about 20 patches so
a follow-on patchset will complete the support and add the new PCI IDs.
The first 8 patches are changes to the backing store logic to support
both chips with mostly common code paths and datastructures. Both
chips require host backing store memory but the relevant firmware APIs
have been modified to make it easier to support new backing store
memory types.
The next 4 patches are changes to TX and RX ring indexing logic and NAPI
logic. The main changes are to increment the TX and RX producers
unbounded and to do any masking only when needed. These changes are
needed to support the P7 chips which require additional higher bits in
these producer indices. The NAPI logic is also slightly modifed.
The last patch is a rename of BNXT_FLAG_CHIP_P5 to BNXT_FLAG_P5_PLUS and
other related macro changes to make it clear that the P5_PLUS macro
applies to P5 and newer chips.
====================
Link: https://lore.kernel.org/r/20231120234405.194542-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Schacher [Mon, 20 Nov 2023 23:44:05 +0000 (15:44 -0800)]
bnxt_en: Rename some macros for the P5 chips
In preparation to support a new P7 chip which has a lot of similarities
with the P5 chip, rename the BNXT_FLAG_CHIP_P5 flag to
BNXT_FLAG_CHIP_P5_PLUS. This will make it clear that the flag is for
P5 and newer chips.
Also, since there are no additional P5 variants in production, rename
BNXT_FLAG_CHIP_P5_THOR() to BNXT_FLAG_CHIP_P5() to keep the naming
more simple.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Randy Schacher <stuart.schacher@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-14-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 20 Nov 2023 23:44:04 +0000 (15:44 -0800)]
bnxt_en: Modify the NAPI logic for the new P7 chips
Modify the NAPI logic for the new doorbell mechanism on P7 chips.
These changes are compatible with the current P5 chips.
In the current logic, bnxt_poll_p5() services 1 or more CQs for each
MSIX. Each MSIX has an associated NQ and each NQ has 1 or more
associated CQs. If any CQ reaches NAPI budget, we'll stay in polling
mode and will unconditionally check and service all CQs until we exit
polling. We always re-arm all CQs when we exit polling.
To be compatible with the new Toggle bit mechanism in P7 chips, we need
to modify the logic so that we service and re-arm the CQ only if we
receive an NQE notification for work for that CQ. We add a new
had_nqe_notify bit to the cp_ring_info structure and it gets set when we
see the NQE notification for that CQ anytime during polling. We'll
service and re-arm only the CQs with the had_nqe_notify bits set.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-13-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 20 Nov 2023 23:44:03 +0000 (15:44 -0800)]
bnxt_en: Modify RX ring indexing logic.
Modify the RX indexing logic for both RX ring and RX aggregation ring just
like the TX logic. Change it so that the index increments unbounded and
mask it only when needed.
Modify the existing RX macros so that the index is not masked. Add new
macros RING_RX()/RING_RX_AGG() to mask it only when needed to get the
index of rxr->rx_buf_ring[] and rxr->rx_agg_ring[].
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-12-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 20 Nov 2023 23:44:02 +0000 (15:44 -0800)]
bnxt_en: Modify TX ring indexing logic.
Change the TX ring logic so that the index increments unbounded and
mask it only when needed.
Modify the existing macros so that the index is not masked. Add a
new macro RING_TX() to mask it only when needed to get the index of
txr->tx_buf_ring[].
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-11-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 20 Nov 2023 23:44:01 +0000 (15:44 -0800)]
bnxt_en: Add db_ring_mask and related macro to bnxt_db_info struct.
This allows the doorbell related logic to mask the doorbell index
to the proper range before writing the doorbell.
The current code masks the doorbell index immediately to keep it in the
legal ranges for the most part. Subsequent patches will change the
logic so that the index increments unbounded and it only gets masked
before use. This is preparation work for the new chip that requires an
additional Epoch bit in the doorbell that needs to toggle when the index
has wrapped around.
This patch just adds the basic infrastructure and the logic is largely
unchanged. We now replace RING_CMP() with the new DB_RING_IDX() at
appropriate places where we mask the completion ring index before
writing the doorbell.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-10-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 20 Nov 2023 23:44:00 +0000 (15:44 -0800)]
bnxt_en: Add support for HWRM_FUNC_BACKING_STORE_CFG_V2 firmware calls
Newer chips starting with 57600 will use this new firmware HWRM call to
configure backing store memory. Add this new call if it is supported
by the firmware.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-9-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 20 Nov 2023 23:43:59 +0000 (15:43 -0800)]
bnxt_en: Add support for new backing store query firmware API
Use the new v2 firmware API if supported by the firmware. We now have the
infrastructure to support the v2 API.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-8-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 20 Nov 2023 23:43:58 +0000 (15:43 -0800)]
bnxt_en: Add bnxt_setup_ctxm_pg_tbls() helper function
In bnxt_alloc_ctx_mem(), the logic to set up the context memory entries
and to allocate the context memory tables is done repetitively. Add
a helper function to simplify the code.
The setup of the Fast Path TQM entries relies on some information from
the Slow Path TQM entries. Copy the SP_TQM entries to the FP_TQM
entries to simplify the logic.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-7-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 20 Nov 2023 23:43:57 +0000 (15:43 -0800)]
bnxt_en: Use the pg_info field in bnxt_ctx_mem_type struct
Use the newly added pg_info field in bnxt_ctx_mem_type struct and
remove the standalone page info structures in bnxt_ctx_mem_info.
This now completes the reorganization of the context memory
structures to work better with the new and more flexible firmware
interface for newer chips.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 20 Nov 2023 23:43:56 +0000 (15:43 -0800)]
bnxt_en: Add page info to struct bnxt_ctx_mem_type
This will further improve the organization of the bnxt_ctx_mem_info
structure by moving the standalone page info structures into the
bnxt_ctx_mem_type array. Add the allocation and free logic first and
the next patch will migrate to use the new infrastructure.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 20 Nov 2023 23:43:55 +0000 (15:43 -0800)]
bnxt_en: Restructure context memory data structures
The current code uses a flat bnxt_ctx_mem_info structure to store 8
types of context memory for the NIC. All the context memory types
are very similar and have similar parameters. They can all share a
common structure to improve the organization. Also, new firmware
interface will provide a new API to retrieve each type of context
memory by calling the API repeatedly.
This patch reorganizes the bnxt_ctx_mem_info structure to fit better
with the new firmware interface. It will also work with the legacy
firmware interface. The flat fields in bnxt_ctx_mem_info are replaced
by the bnxt_ctx_mem_type array. The bnxt_mem_init array info will no
longer be needed.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 20 Nov 2023 23:43:54 +0000 (15:43 -0800)]
bnxt_en: Free bp->ctx inside bnxt_free_ctx_mem()
We always free bp->ctx right after calling bnxt_free_ctx_mem(), so just
free it at the end of that function to make things simpler.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 20 Nov 2023 23:43:53 +0000 (15:43 -0800)]
bnxt_en: The caller of bnxt_alloc_ctx_mem() should always free bp->ctx
bnxt_alloc_ctx_mem() calls bnxt_hwrm_func_backing_store_qcaps() to
allocate the memory for bp->ctx. Initialize bp->ctx with the allocated
memory and let the caller free it during unwind. The unwind logic is
already there, we just need to always set bp->ctx to the allocated
memory so the caller will always free it. This simplifies the logic
and makes it easier to expand on the backing store logic.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 22 Nov 2023 01:22:37 +0000 (17:22 -0800)]
Merge branch 'net-page_pool-add-netlink-based-introspection-part1'
Jakub Kicinski says:
====================
net: page_pool: plit the page_pool_params into fast and slow
Small refactoring in prep for adding more page pool params
which won't be needed on the fast path.
v1: https://lore.kernel.org/all/
20231024160220.
3973311-1-kuba@kernel.org/
RFC: https://lore.kernel.org/all/
20230816234303.
3786178-1-kuba@kernel.org/
====================
Link: https://lore.kernel.org/r/20231121000048.789613-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 21 Nov 2023 00:00:35 +0000 (16:00 -0800)]
net: page_pool: avoid touching slow on the fastpath
To fully benefit from previous commit add one byte of state
in the first cache line recording if we need to look at
the slow part.
The packing isn't all that impressive right now, we create
a 7B hole. I'm expecting Olek's rework will reshuffle this,
anyway.
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://lore.kernel.org/r/20231121000048.789613-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 21 Nov 2023 00:00:34 +0000 (16:00 -0800)]
net: page_pool: split the page_pool_params into fast and slow
struct page_pool is rather performance critical and we use
16B of the first cache line to store 2 pointers used only
by test code. Future patches will add more informational
(non-fast path) attributes.
It's convenient for the user of the API to not have to worry
which fields are fast and which are slow path. Use struct
groups to split the params into the two categories internally.
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/r/20231121000048.789613-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 21 Nov 2023 23:49:30 +0000 (15:49 -0800)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-11-21
We've added 19 non-merge commits during the last 4 day(s) which contain
a total of 18 files changed, 1043 insertions(+), 416 deletions(-).
The main changes are:
1) Fix BPF verifier to validate callbacks as if they are called an unknown
number of times in order to fix not detecting some unsafe programs,
from Eduard Zingerman.
2) Fix bpf_redirect_peer() handling which missed proper stats accounting
for veth and netkit and also generally fix missing stats for the latter,
from Peilin Ye, Daniel Borkmann et al.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: check if max number of bpf_loop iterations is tracked
bpf: keep track of max number of bpf_loop callback iterations
selftests/bpf: test widening for iterating callbacks
bpf: widening for callback iterators
selftests/bpf: tests for iterating callbacks
bpf: verify callbacks as if they are called unknown number of times
bpf: extract setup_func_entry() utility function
bpf: extract __check_reg_arg() utility function
selftests/bpf: fix bpf_loop_bench for new callback verification scheme
selftests/bpf: track string payload offset as scalar in strobemeta
selftests/bpf: track tcp payload offset as scalar in xdp_synproxy
selftests/bpf: Add netkit to tc_redirect selftest
selftests/bpf: De-veth-ize the tc_redirect test case
bpf, netkit: Add indirect call wrapper for fetching peer dev
bpf: Fix dev's rx stats for bpf_redirect_peer traffic
veth: Use tstats per-CPU traffic counters
netkit: Add tstats per-CPU traffic counters
net: Move {l,t,d}stats allocation to core and convert veth & vrf
net, vrf: Move dstats structure to core
====================
Link: https://lore.kernel.org/r/20231121193113.11796-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 21 Nov 2023 22:53:11 +0000 (14:53 -0800)]
Merge branch 'mlxsw-preparations-for-support-of-cff-flood-mode'
Petr Machata says:
====================
mlxsw: Preparations for support of CFF flood mode
PGT is an in-HW table that maps addresses to sets of ports. Then when some
HW process needs a set of ports as an argument, instead of embedding the
actual set in the dynamic configuration, what gets configured is the
address referencing the set. The HW then works with the appropriate PGT
entry.
Among other allocations, the PGT currently contains two large blocks for
bridge flooding: one for 802.1q and one for 802.1d. Within each of these
blocks are three tables, for unknown-unicast, multicast and broadcast
flooding:
. . . | 802.1q | 802.1d | . . .
| UC | MC | BC | UC | MC | BC |
\______ _____/ \_____ ______/
v v
FID flood vectors
Thus each FID (which corresponds to an 802.1d bridge or one VLAN in an
802.1q bridge) uses three flood vectors spread across a fairly large region
of PGT.
This way of organizing the flood table (called "controlled") is not very
flexible. E.g. to decrease a bridge scale and store more IP MC vectors, one
would need to completely rewrite the bridge PGT blocks, or resort to hacks
such as storing individual MC flood vectors into unused part of the bridge
table.
In order to address these shortcomings, Spectrum-2 and above support what
is called CFF flood mode, for Compressed FID Flooding. In CFF flood mode,
each FID has a little table of its own, with three entries adjacent to each
other, one for unknown-UC, one for MC, one for BC. This allows for a much
more fine-grained approach to PGT management, where bits of it are
allocated on demand.
. . . | FID | FID | FID | FID | FID | . . .
|U|M|B|U|M|B|U|M|B|U|M|B|U|M|B|
\_____________ _____________/
v
FID flood vectors
Besides the FID table organization, the CFF flood mode also impacts Router
Subport (RSP) table. This table contains flood vectors for rFIDs, which are
FIDs that reference front panel ports or LAGs. The RSP table contains two
entries per front panel port and LAG, one for unknown-UC traffic, and one
for everything else. Currently, the FW allocates and manages the table in
its own part of PGT. rFIDs are marked with flood_rsp bit and managed
specially. In CFF mode, rFIDs are managed as all other FIDs. The driver
therefore has to allocate and maintain the flood vectors. Like with bridge
FIDs, this is more work, but increases flexibility of the system.
The FW currently supports both the controlled and CFF flood modes. To shed
complexity, in the future it should only support CFF flood mode. Hence this
patchset, which is the first in series of two to add CFF flood mode support
to mlxsw.
There are FW versions out there that do not support CFF flood mode, and on
Spectrum-1 in particular, there is no plan to support it at all. mlxsw will
therefore have to support both controlled flood mode as well as CFF.
Another aspect is that at least on Spectrum-1, there are FW versions out
there that claim to support CFF flood mode, but then reject or ignore
configurations enabling the same. The driver thus has to have a say in
whether an attempt to configure CFF flood mode should even be made.
Much like with the LAG mode, the feature is therefore expressed in terms of
"does the driver prefer CFF flood mode?", and "what flood mode the PCI
module managed to configure the FW with". This gives to the driver a chance
to determine whether CFF flood mode configuration should be attempted.
In this patchset, we lay the ground with new definitions, registers and
their fields, and some minor code shaping. The next patchset will be more
focused on introducing necessary abstractions and implementation.
- Patches #1 and #2 add CFF-related items to the command interface.
- Patch #3 adds a new resource, for maximum number of flood profiles
supported. (A flood profile is a mapping between traffic type and offset
in the per-FID flood vector table.)
- Patches #4 to #8 adjust reg.h. The SFFP register is added, which is used
for configuring the abovementioned traffic-type-to-offset mapping. The
SFMR, register, which serves for FID configuration, is extended with
fields specific to CFF mode. And other minor adjustments.
- Patches #9 and #10 add the plumbing for CFF mode: a way to request that
CFF flood mode be configured, and a way to query the flood mode that was
actually configured.
- Patch #11 removes dead code.
- Patches #12 and #13 add helpers that the next patchset will make use of.
Patch #14 moves RIF setup ahead so that FID code can make use of it.
====================
Link: https://lore.kernel.org/r/cover.1700503643.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Mon, 20 Nov 2023 18:25:31 +0000 (19:25 +0100)]
mlxsw: spectrum_router: Call RIF setup before obtaining FID
For subport RIFs, the setup initializes, among other things, RIF port and
LAG numbers. Those are important to determine where in the PGT the RIF FID
will be stored. Therefore, call the RIF setup before fid_get.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/f24d8cad7e4748b8e8e0e16894ca6a20704dea32.1700503644.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Mon, 20 Nov 2023 18:25:30 +0000 (19:25 +0100)]
mlxsw: spectrum_router: Add a helper to get subport number from a RIF
In the CFF flood mode, responsibility for management of the PGT entries for
rFIDs is moved from FW to the driver. All rFIDs are based off either a
front panel port, or a LAG port. The flood vectors for port-based rFIDs
enable just the port itself, the ones for LAG-based rFIDs enable all member
ports of the LAG in question.
Since all rFIDs based off the same port have the same flood vector, and
similarly for LAG-based rFIDs, the flood entries are shared. The PGT
address of the flood vector is therefore determined based on the port (or
LAG) number of the RIF connected with the rFID.
Add a helper to determine subport number given a RIF, to be used in these
calculations.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/d7ab43cf5b021f785f363f236e4b6780d10eea93.1700503644.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Mon, 20 Nov 2023 18:25:29 +0000 (19:25 +0100)]
mlxsw: spectrum_fid: Extract SFMR packing into a helper
Both mlxsw_sp_fid_op() and mlxsw_sp_fid_edit_op() pack the core of SFMR the
same way. Extract the common code into a helper and call that. Extract out
of that a wrapper that just calls mlxsw_reg_sfmr_pack(), because it will
be useful for the dummy family later on.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/31f32b4d767183f6cb197148d0792feab2efadba.1700503644.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Mon, 20 Nov 2023 18:25:28 +0000 (19:25 +0100)]
mlxsw: spectrum_fid: Drop unnecessary conditions
The caller already only calls mlxsw_sp_fid_flood_tables_init() and
mlxsw_sp_fid_flood_tables_fini() if (fid_family->flood_tables). There
is no configuration where the pointer is non-NULL, but the number of
tables is zero. So drop the conditions.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/897c6841bc756ac632b797bf67ac83c6a66ba359.1700503644.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Mon, 20 Nov 2023 18:25:27 +0000 (19:25 +0100)]
mlxsw: pci: Permit enabling CFF mode
There are FW versions out there that do not support CFF flood mode, and on
Spectrum-1 in particular, there is no plan to support it at all. mlxsw will
therefore have to support both controlled flood mode as well as CFF. There
are also FW versions out there that claim to support CFF flood mode, but
then reject or ignore configurations enabling the same. The driver thus has
to have a say in whether an attempt to configure CFF flood mode should even
be made, and what to use as a fallback.
Hence express the feature in terms of "does the driver prefer CFF flood
mode?", and "what flood mode the PCI module managed to configure the FW
with". This gives to the driver a chance to determine whether CFF flood
mode configuration should be attempted.
The latter bit was added in previous patches. In this patch, add the bit
that allows the driver to determine whether CFF enablement should be
attempted, and the enablement code itself.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/41640a0ee58e0a9538f820f7b601a0e35f6449e4.1700503644.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Mon, 20 Nov 2023 18:25:26 +0000 (19:25 +0100)]
mlxsw: core, pci: Add plumbing related to CFF mode
CFF mode, for Compressed FID Flooding, is a way of organizing flood vectors
in the PGT table. The bus module determines whether CFF is supported, can
configure flood mode to CFF if it is, and knows what flood mode has been
configured. Therefore add a bus callback to determine the configured flood
mode. Also add to core an API to query it.
Since after this patch, we rely on mlxsw_pci->flood_mode being set, it
becomes a coding error if a driver invokes this function with a set of
fields that misses the initialization. Warn and bail out in that case.
The CFF mode is not used as of this patch. The code to actually use it will
be added later.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/889d58759dd40f5037f2206b9fc4a78a9240da80.1700503644.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Mon, 20 Nov 2023 18:25:25 +0000 (19:25 +0100)]
mlxsw: reg: Add to SFMR register the fields related to CFF flood mode
Add the field cff_mid_base, which specifies at which point in PGT the
per-FID flood table is stored. Add cff_prf_id, the profile ID, which
determines on which row of the flood table a flood vector can be found for
a given traffic type.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/3ad7ae38cf6534bedcd876f16090d109a814b3e3.1700503644.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Mon, 20 Nov 2023 18:25:24 +0000 (19:25 +0100)]
mlxsw: reg: Extract flood-mode specific part of mlxsw_reg_sfmr_pack()
In CFF mode, it is necessary to set a different set of SFMR fields. Leave
in mlxsw_reg_sfmr_pack() only the common bits, and move the parts relevant
to controlled flood mode directly to the call site.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/6f29639ebc3ca0722272e6c644ca910096469413.1700503644.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Mon, 20 Nov 2023 18:25:23 +0000 (19:25 +0100)]
mlxsw: reg: Drop unnecessary writes from mlxsw_reg_sfmr_pack()
The MLXSW_REG_ZERO at the beginning of the function wipes the whole
payload. There's no need to set vtfp and vv to false explicitly.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/04a51ea7cf31eea0ef7707311d8e864e2d9ef307.1700503644.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Mon, 20 Nov 2023 18:25:22 +0000 (19:25 +0100)]
mlxsw: reg: Mark SFGC & some SFMR fields as reserved in CFF mode
Some existing fields and the whole register of SFGC are reserved in CFF
mode. Backport the reservation note to these fields.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/e1d5977a8cb778227e4ea2fd1515529957ce5de7.1700503643.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Mon, 20 Nov 2023 18:25:21 +0000 (19:25 +0100)]
mlxsw: reg: Add Switch FID Flooding Profiles Register
The SFFP register populates the fid flooding profile tables used for the
NVE flooding and Compressed-FID Flooding (CFF).
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/ca42eb67763bd0c7cf035afc62ef73632f3f61a6.1700503643.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Mon, 20 Nov 2023 18:25:20 +0000 (19:25 +0100)]
mlxsw: resources: Add max_cap_nve_flood_prf
max_cap_nve_flood_prf describes maximum number of NVE flooding profiles.
The same value then applies for flooding profiles for flooding in CFF mode.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/064a2e013d879e5f5494167a6c120c4bb85a2204.1700503643.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Mon, 20 Nov 2023 18:25:19 +0000 (19:25 +0100)]
mlxsw: cmd: Add MLXSW_CMD_MBOX_CONFIG_PROFILE_FLOOD_MODE_CFF
PGT, a port-group table is an in-HW block of specialized memory that holds
sets of ports. Allocated within the PGT are series of flood tables that
describe to which ports traffic of various types (unknown UC, BC, MC)
should be flooded from which FID. The hitherto-used layout of these flood
tables is being replaced with a more flexible scheme, called compressed FID
flooding (CFF). CFF can be configured through CONFIG_PROFILE.flood_mode.
In this patch, add MLXSW_CMD_MBOX_CONFIG_PROFILE_FLOOD_MODE_CFF, the value
to use to enable the CFF mode.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/fc2e063742856492f8f22b0b87abf431ea6d53d0.1700503643.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Mon, 20 Nov 2023 18:25:18 +0000 (19:25 +0100)]
mlxsw: cmd: Add cmd_mbox.query_fw.cff_support
PGT, a port-group table is an in-HW block of specialized memory that holds
sets of ports. Allocated within the PGT are series of flood tables that
describe to which ports traffic of various types (unknown UC, BC, MC)
should be flooded from which FID. The hitherto-used layout of these flood
tables is being replaced with a more flexible scheme, called compressed FID
flooding (CFF). CFF can be configured through CONFIG_PROFILE.flood_mode.
cff_support determines whether CONFIG_PROFILE.flood_mode can be set to CFF.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/af727d0e1095e30fa45c7e60404637cdc491aeec.1700503643.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 20 Nov 2023 18:41:40 +0000 (10:41 -0800)]
net: do not send a MOVE event when netdev changes netns
Networking supports changing netdevice's netns and name
at the same time. This allows avoiding name conflicts
and having to rename the interface in multiple steps.
E.g. netns1={eth0, eth1}, netns2={eth1} - we want
to move netns1:eth1 to netns2 and call it eth0 there.
If we can't rename "in flight" we'd need to (1) rename
eth1 -> $tmp, (2) change netns, (3) rename $tmp -> eth0.
To rename the underlying struct device we have to call
device_rename(). The rename()'s MOVE event, however, doesn't
"belong" to either the old or the new namespace.
If there are conflicts on both sides it's actually impossible
to issue a real MOVE (old name -> new name) without confusing
user space. And Daniel reports that such confusions do in fact
happen for systemd, in real life.
Since we already issue explicit REMOVE and ADD events
manually - suppress the MOVE event completely. Move
the ADD after the rename, so that the REMOVE uses
the old name, and the ADD the new one.
If there is no rename this changes the picture as follows:
Before:
old ns | KERNEL[213.399289] remove /devices/virtual/net/eth0 (net)
new ns | KERNEL[213.401302] add /devices/virtual/net/eth0 (net)
new ns | KERNEL[213.401397] move /devices/virtual/net/eth0 (net)
After:
old ns | KERNEL[266.774257] remove /devices/virtual/net/eth0 (net)
new ns | KERNEL[266.774509] add /devices/virtual/net/eth0 (net)
If there is a rename and a conflict (using the exact eth0/eth1
example explained above) we get this:
Before:
old ns | KERNEL[224.316833] remove /devices/virtual/net/eth1 (net)
new ns | KERNEL[224.318551] add /devices/virtual/net/eth1 (net)
new ns | KERNEL[224.319662] move /devices/virtual/net/eth0 (net)
After:
old ns | KERNEL[333.033166] remove /devices/virtual/net/eth1 (net)
new ns | KERNEL[333.035098] add /devices/virtual/net/eth0 (net)
Note that "in flight" rename is only performed when needed.
If there is no conflict for old name in the target netns -
the rename will be performed separately by dev_change_name(),
as if the rename was a different command, and there will still
be a MOVE event for the rename:
Before:
old ns | KERNEL[194.416429] remove /devices/virtual/net/eth0 (net)
new ns | KERNEL[194.418809] add /devices/virtual/net/eth0 (net)
new ns | KERNEL[194.418869] move /devices/virtual/net/eth0 (net)
new ns | KERNEL[194.420866] move /devices/virtual/net/eth1 (net)
After:
old ns | KERNEL[71.917520] remove /devices/virtual/net/eth0 (net)
new ns | KERNEL[71.919155] add /devices/virtual/net/eth0 (net)
new ns | KERNEL[71.920729] move /devices/virtual/net/eth1 (net)
If deleting the MOVE event breaks some user space we should insert
an explicit kobject_uevent(MOVE) after the ADD, like this:
@@ -11192,6 +11192,12 @@ int __dev_change_net_namespace(struct net_device *dev, struct net *net,
kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
netdev_adjacent_add_links(dev);
+ /* User space wants an explicit MOVE event, issue one unless
+ * dev_change_name() will get called later and issue one.
+ */
+ if (!pat || new_name[0])
+ kobject_uevent(&dev->dev.kobj, KOBJ_MOVE);
+
/* Adapt owner in case owning user namespace of target network
* namespace is different from the original one.
*/
Reported-by: Daniel Gröber <dxld@darkboxed.org>
Link: https://lore.kernel.org/all/20231010121003.x3yi6fihecewjy4e@House.clients.dxld.at/
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/all/20231120184140.578375-1-kuba@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 20 Nov 2023 20:01:09 +0000 (12:01 -0800)]
docs: netdev: try to guide people on dealing with silence
There has been more than a few threads which went idle before
the merge window and now people came back to them and started
asking about next steps.
We currently tell people to be patient and not to repost too
often. Our "not too often", however, is still a few orders of
magnitude faster than other subsystems. Or so I feel after
hearing people talk about review rates at LPC.
Clarify in the doc that if the discussion went idle for a week
on netdev, 95% of the time there's no point waiting longer.
Link: https://lore.kernel.org/r/20231120200109.620392-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jose Ignacio Tornos Martinez [Mon, 20 Nov 2023 12:11:41 +0000 (13:11 +0100)]
net: usb: ax88179_178a: avoid two consecutive device resets
The device is always reset two consecutive times (ax88179_reset is called
twice), one from usbnet_probe during the device binding and the other from
usbnet_open.
Remove the non-necessary reset during the device binding and let the reset
operation from open to keep the normal behavior (tested with generic ASIX
Electronics Corp. AX88179 Gigabit Ethernet device).
Reported-by: Herb Wei <weihao.bj@ieisystem.com>
Tested-by: Herb Wei <weihao.bj@ieisystem.com>
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Link: https://lore.kernel.org/r/20231120121239.54504-1-jtornosm@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jose Ignacio Tornos Martinez [Mon, 20 Nov 2023 12:06:29 +0000 (13:06 +0100)]
net: usb: ax88179_178a: fix failed operations during ax88179_reset
Using generic ASIX Electronics Corp. AX88179 Gigabit Ethernet device,
the following test cycle has been implemented:
- power on
- check logs
- shutdown
- after detecting the system shutdown, disconnect power
- after approximately 60 seconds of sleep, power is restored
Running some cycles, sometimes error logs like this appear:
kernel: ax88179_178a 2-9:1.0 (unnamed net_device) (uninitialized): Failed to write reg index 0x0001: -19
kernel: ax88179_178a 2-9:1.0 (unnamed net_device) (uninitialized): Failed to read reg index 0x0001: -19
...
These failed operation are happening during ax88179_reset execution, so
the initialization could not be correct.
In order to avoid this, we need to increase the delay after reset and
clock initial operations. By using these larger values, many cycles
have been run and no failed operations appear.
It would be better to check some status register to verify when the
operation has finished, but I do not have found any available information
(neither in the public datasheets nor in the manufacturer's driver). The
only available information for the necessary delays is the maufacturer's
driver (original values) but the proposed values are not enough for the
tested devices.
Fixes: e2ca90c276e1f ("ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver")
Reported-by: Herb Wei <weihao.bj@ieisystem.com>
Tested-by: Herb Wei <weihao.bj@ieisystem.com>
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Link: https://lore.kernel.org/r/20231120120642.54334-1-jtornosm@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 21 Nov 2023 19:56:57 +0000 (11:56 -0800)]
Merge tag 'platform-drivers-x86-v6.7-2' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform drivers fixes from Ilpo Järvinen:
"Just a few fixes (one with two non-fix deps) plus tidying up
MAINTAINERS"
* tag 'platform-drivers-x86-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: intel_telemetry: Fix kernel doc descriptions
MAINTAINERS: Drop Mark Gross as maintainer for x86 platform drivers
platform/x86/amd/pmc: adjust getting DRAM size behavior
platform/x86: hp-bioscfg: Remove unused obj in hp_add_other_attributes()
platform/x86: hp-bioscfg: Fix error handling in hp_add_other_attributes()
platform/x86: hp-bioscfg: move mutex_lock() down in hp_add_other_attributes()
platform/x86: hp-bioscfg: Simplify return check in hp_add_other_attributes()
platform/x86: ideapad-laptop: Set max_brightness before using it
MAINTAINERS: Remove stale entry for SBL platform driver