Marc Kleine-Budde [Fri, 29 Sep 2023 08:18:02 +0000 (10:18 +0200)]
can: dev: can_restart(): move debug message and stats after successful restart
Move the debug message "restarted" and the CAN restart stats_after_
the successful restart of the CAN device, because the restart may
fail.
While there update the error message from printing the error number to
printing symbolic error names.
Link: https://lore.kernel.org/all/20231005-can-dev-fix-can-restart-v2-4-91b5c1fd922c@pengutronix.de
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
[mkl: mention stats in subject and description, too]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Fri, 29 Sep 2023 07:47:38 +0000 (09:47 +0200)]
can: dev: can_restart(): reverse logic to remove need for goto
Reverse the logic in the if statement and eliminate the need for a
goto to simplify code readability.
Link: https://lore.kernel.org/all/20231005-can-dev-fix-can-restart-v2-3-91b5c1fd922c@pengutronix.de
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Fri, 29 Sep 2023 08:25:11 +0000 (10:25 +0200)]
can: dev: can_restart(): fix race condition between controller restart and netif_carrier_on()
This race condition was discovered while updating the at91_can driver
to use can_bus_off(). The following scenario describes how the
converted at91_can driver would behave.
When a CAN device goes into BUS-OFF state, the driver usually
stops/resets the CAN device and calls can_bus_off().
This function sets the netif carrier to off, and (if configured by
user space) schedules a delayed work that calls can_restart() to
restart the CAN device.
The can_restart() function first checks if the carrier is off and
triggers an error message if the carrier is OK.
Then it calls the driver's do_set_mode() function to restart the
device, then it sets the netif carrier to on. There is a race window
between these two calls.
The at91 CAN controller (observed on the sama5d3, a single core 32 bit
ARM CPU) has a hardware limitation. If the device goes into bus-off
while sending a CAN frame, there is no way to abort the sending of
this frame. After the controller is enabled again, another attempt is
made to send it.
If the bus is still faulty, the device immediately goes back to the
bus-off state. The driver calls can_bus_off(), the netif carrier is
switched off and another can_restart is scheduled. This occurs within
the race window before the original can_restart() handler marks the
netif carrier as OK. This would cause the 2nd can_restart() to be
called with an OK netif carrier, resulting in an error message.
The flow of the 1st can_restart() looks like this:
can_restart()
// bail out if netif_carrier is OK
netif_carrier_ok(dev)
priv->do_set_mode(dev, CAN_MODE_START)
// enable CAN controller
// sama5d3 restarts sending old message
// CAN devices goes into BUS_OFF, triggers IRQ
// IRQ handler start
at91_irq()
at91_irq_err_line()
can_bus_off()
netif_carrier_off()
schedule_delayed_work()
// IRQ handler end
netif_carrier_on()
The 2nd can_restart() will be called with an OK netif carrier and the
error message will be printed.
To close the race window, first set the netif carrier to on, then
restart the controller. In case the restart fails with an error code,
roll back the netif carrier to off.
Fixes: 39549eef3587 ("can: CAN Network device driver and Netlink interface")
Link: https://lore.kernel.org/all/20231005-can-dev-fix-can-restart-v2-2-91b5c1fd922c@pengutronix.de
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Thu, 28 Sep 2023 19:58:23 +0000 (21:58 +0200)]
can: dev: can_restart(): don't crash kernel if carrier is OK
During testing, I triggered a can_restart() with the netif carrier
being OK [1]. The BUG_ON, which checks if the carrier is OK, results
in a fatal kernel crash. This is neither helpful for debugging nor for
a production system.
[1] The root cause is a race condition in can_restart() which will be
fixed in the next patch.
Do not crash the kernel, issue an error message instead, and continue
restarting the CAN device anyway.
Fixes: 39549eef3587 ("can: CAN Network device driver and Netlink interface")
Link: https://lore.kernel.org/all/20231005-can-dev-fix-can-restart-v2-1-91b5c1fd922c@pengutronix.de
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Justin Stitt [Thu, 5 Oct 2023 00:05:35 +0000 (00:05 +0000)]
can: peak_pci: replace deprecated strncpy with strscpy
`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
NUL-padding is not required since card is already zero-initialized:
| card = kzalloc(sizeof(*card), GFP_KERNEL);
A suitable replacement is `strscpy` [2] due to the fact that it
guarantees NUL-termination on the destination buffer without
unnecessarily NUL-padding.
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20231005-strncpy-drivers-net-can-sja1000-peak_pci-c-v1-1-c36e1702cd56@google.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Jiapeng Chong [Fri, 25 Aug 2023 06:46:56 +0000 (14:46 +0800)]
can: raw: Remove NULL check before dev_{put, hold}
The call netdev_{put, hold} of dev_{put, hold} will check NULL, so there
is no need to check before using dev_{put, hold}, remove it to silence
the warning:
./net/can/raw.c:497:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6231
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reported-by: Simon Horman <horms@kernel.org>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20230825064656.87751-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Wed, 4 Oct 2023 09:46:27 +0000 (11:46 +0200)]
Merge patch series "can: etas_es58x: clean-up of new GCC W=1 and old checkpatch warnings"
Vincent Mailhol <mailhol.vincent@wanadoo.fr> says:
The kernel recently added new warnings, one of which triggers a known
false positive on the etas_es58x module. In an effort to keep
es58x_etas free of any W=12 (excluding those produced by foreign
headers), add a workaround to silence it.
While at it, this series also fix a checkpatch warning which I knew
existed for a long time but was too lazy to tackle.
v2 -> v3:
* if the parsing of one of the version/revision numbers fail,
es58x_parse_product_info() immediately returns. If this occurs early,
the other version/revision numbers would still be set to zero (which
is now considered a valid version number). Set the version and
revision to an invalid number before starting the parsing so that
everything is set even if an early return occurs.
v1 -> v2:
* v1 had two different check logics for the version numbers:
- check that none of the sub-version number are zero to make sure
the parsing succeeded
- check that all of the sub-version number fit the expected digit
range to please GCC.
v2 simplifies things by merging those two logics together.
Link: https://lore.kernel.org/all/20230924110914.183898-1-mailhol.vincent@wanadoo.fr
[mkl: fixed typos]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Vincent Mailhol [Sun, 24 Sep 2023 11:06:48 +0000 (20:06 +0900)]
can: etas_es58x: add missing a blank line after declaration
Fix below checkpatch warning:
WARNING: Missing a blank line after declarations
#2233: FILE: drivers/net/can/usb/etas_es58x/es58x_core.c:2233:
+ int ret = es58x_init_netdev(es58x_dev, ch_idx);
+ if (ret) {
Fixes: d8f26fd689dd ("can: etas_es58x: remove es58x_get_product_info()")
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20230924110914.183898-3-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Vincent Mailhol [Sun, 24 Sep 2023 11:06:47 +0000 (20:06 +0900)]
can: etas_es58x: rework the version check logic to silence -Wformat-truncation
Following [1], es58x_devlink.c now triggers the following
format-truncation GCC warnings:
drivers/net/can/usb/etas_es58x/es58x_devlink.c: In function ‘es58x_devlink_info_get’:
drivers/net/can/usb/etas_es58x/es58x_devlink.c:201:41: warning: ‘%02u’ directive output may be truncated writing between 2 and 3 bytes into a region of size between 1 and 3 [-Wformat-truncation=]
201 | snprintf(buf, sizeof(buf), "%02u.%02u.%02u",
| ^~~~
drivers/net/can/usb/etas_es58x/es58x_devlink.c:201:30: note: directive argument in the range [0, 255]
201 | snprintf(buf, sizeof(buf), "%02u.%02u.%02u",
| ^~~~~~~~~~~~~~~~
drivers/net/can/usb/etas_es58x/es58x_devlink.c:201:3: note: ‘snprintf’ output between 9 and 12 bytes into a destination of size 9
201 | snprintf(buf, sizeof(buf), "%02u.%02u.%02u",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
202 | fw_ver->major, fw_ver->minor, fw_ver->revision);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/can/usb/etas_es58x/es58x_devlink.c:211:41: warning: ‘%02u’ directive output may be truncated writing between 2 and 3 bytes into a region of size between 1 and 3 [-Wformat-truncation=]
211 | snprintf(buf, sizeof(buf), "%02u.%02u.%02u",
| ^~~~
drivers/net/can/usb/etas_es58x/es58x_devlink.c:211:30: note: directive argument in the range [0, 255]
211 | snprintf(buf, sizeof(buf), "%02u.%02u.%02u",
| ^~~~~~~~~~~~~~~~
drivers/net/can/usb/etas_es58x/es58x_devlink.c:211:3: note: ‘snprintf’ output between 9 and 12 bytes into a destination of size 9
211 | snprintf(buf, sizeof(buf), "%02u.%02u.%02u",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
212 | bl_ver->major, bl_ver->minor, bl_ver->revision);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/can/usb/etas_es58x/es58x_devlink.c:221:38: warning: ‘%03u’ directive output may be truncated writing between 3 and 5 bytes into a region of size between 2 and 4 [-Wformat-truncation=]
221 | snprintf(buf, sizeof(buf), "%c%03u/%03u",
| ^~~~
drivers/net/can/usb/etas_es58x/es58x_devlink.c:221:30: note: directive argument in the range [0, 65535]
221 | snprintf(buf, sizeof(buf), "%c%03u/%03u",
| ^~~~~~~~~~~~~
drivers/net/can/usb/etas_es58x/es58x_devlink.c:221:3: note: ‘snprintf’ output between 9 and 13 bytes into a destination of size 9
221 | snprintf(buf, sizeof(buf), "%c%03u/%03u",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
222 | hw_rev->letter, hw_rev->major, hw_rev->minor);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is not an actual bug because the sscanf() parsing makes sure that
the u8 are only two digits long and the u16 only three digits long.
Thus below declaration:
char buf[max(sizeof("xx.xx.xx"), sizeof("axxx/xxx"))];
allocates just what is needed to represent either of the versions.
This warning was known but ignored because, at the time of writing,
-Wformat-truncation was not present in the kernel, not even at W=3 [2].
One way to silence this warning is to check the range of all sub
version numbers are valid: [0, 99] for u8 and range [0, 999] for u16.
The module already has a logic which considers that when all the sub
version numbers are zero, the version number is not set. Note that not
having access to the device specification, this was an arbitrary
decision. This logic can thus be removed in favor of global check that
would cover both cases:
- the version number is not set (parsing failed)
- the version number is not valid (paranoiac check to please gcc)
Before starting to parse the product info string, set the version
sub-numbers to the maximum unsigned integer thus violating the
definitions of struct es58x_sw_version or struct es58x_hw_revision.
Then, rework the es58x_sw_version_is_set() and
es58x_hw_revision_is_set() functions: remove the check that the
sub-numbers are non zero and replace it by a check that they fit in
the expected number of digits. This done, rename the functions to
reflect the change and rewrite the documentation. While doing so, also
add a description of the return value.
Finally, the previous version only checked that
&es58x_hw_revision.letter was not the null character. Replace this
check by an alphanumeric character check to make sure that we never
return a special character or a non-printable one and update the
documentation of struct es58x_hw_revision accordingly.
All those extra checks are paranoid but have the merit to silence the
newly introduced W=1 format-truncation warning [1].
[1] commit
6d4ab2e97dcf ("extrawarn: enable format and stringop overflow warnings in W=1")
Link: https://git.kernel.org/torvalds/c/6d4ab2e97dcf
[2] https://lore.kernel.org/all/CAMZ6Rq+K+6gbaZ35SOJcR9qQaTJ7KR0jW=XoDKFkobjhj8CHhw@mail.gmail.com/
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Closes: https://lore.kernel.org/linux-can/20230914-carrousel-wrecker-720a08e173e9-mkl@pengutronix.de/
Fixes: 9f06631c3f1f ("can: etas_es58x: export product information through devlink_ops::info_get()")
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20230924110914.183898-2-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Miquel Raynal [Fri, 22 Sep 2023 15:51:30 +0000 (17:51 +0200)]
can: sja1000: Fix comment
There is likely a copy-paste error here, as the exact same comment
appears below in this function, one time calling set_reset_mode(), the
other set_normal_mode().
Fixes: 429da1cc841b ("can: Driver for the SJA1000 CAN controller")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/all/20230922155130.592187-1-miquel.raynal@bootlin.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Patrick Rohr [Mon, 25 Sep 2023 21:47:11 +0000 (14:47 -0700)]
net: add sysctl to disable rfc4862 5.5.3e lifetime handling
This change adds a sysctl to opt-out of RFC4862 section 5.5.3e's valid
lifetime derivation mechanism.
RFC4862 section 5.5.3e prescribes that the valid lifetime in a Router
Advertisement PIO shall be ignored if it less than 2 hours and to reset
the lifetime of the corresponding address to 2 hours. An in-progress
6man draft (see draft-ietf-6man-slaac-renum-07 section 4.2) is currently
looking to remove this mechanism. While this draft has not been moving
particularly quickly for other reasons, there is widespread consensus on
section 4.2 which updates RFC4862 section 5.5.3e.
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Jen Linkova <furry@google.com>
Signed-off-by: Patrick Rohr <prohr@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230925214711.959704-1-prohr@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 3 Oct 2023 22:02:19 +0000 (15:02 -0700)]
Merge branch 'documentation-fixes-for-dpll-subsystem'
Bagas Sanjaya says:
====================
Documentation fixes for dpll subsystem
Here is a mini docs fixes for dpll subsystem. The fixes are all code
block-related.
This series is triggered because I was emailed by kernel test robot,
alerting htmldocs warnings (see patch [1/2]).
[1]: https://lore.kernel.org/all/
20230918093240.29824-1-bagasdotme@gmail.com/
====================
Link: https://lore.kernel.org/r/20230928052708.44820-1-bagasdotme@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bagas Sanjaya [Thu, 28 Sep 2023 05:27:08 +0000 (12:27 +0700)]
Documentation: dpll: wrap DPLL_CMD_PIN_GET output in a code block
DPLL_CMD_PIN_GET netlink command output for mux-type pins looks ugly
with normal paragraph formatting. Format it as a code block instead.
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20230928052708.44820-3-bagasdotme@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bagas Sanjaya [Thu, 28 Sep 2023 05:27:07 +0000 (12:27 +0700)]
Documentation: dpll: Fix code blocks
kernel test robot and Stephen Rothwell report htmldocs warnings:
Documentation/driver-api/dpll.rst:427: WARNING: Error in "code-block" directive:
maximum 1 argument(s) allowed, 18 supplied.
.. code-block:: c
<snipped>...
Documentation/driver-api/dpll.rst:444: WARNING: Error in "code-block" directive:
maximum 1 argument(s) allowed, 21 supplied.
.. code-block:: c
<snipped>...
Documentation/driver-api/dpll.rst:474: WARNING: Error in "code-block" directive:
maximum 1 argument(s) allowed, 12 supplied.
.. code-block:: c
<snipped>...
Fix these above by adding missing blank line separator after code-block
directive.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202309180456.lOhxy9gS-lkp@intel.com/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20230918131521.155e9e63@canb.auug.org.au/
Fixes: dbb291f19393b6 ("dpll: documentation on DPLL subsystem interface")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20230928052708.44820-2-bagasdotme@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 3 Oct 2023 19:17:13 +0000 (12:17 -0700)]
Merge branch 'introduce-define_flex-macro'
Przemek Kitszel says:
====================
introduce DEFINE_FLEX() macro
Add DEFINE_FLEX() macro, that helps on-stack allocation of structures
with trailing flex array member.
Expose __struct_size() macro which reads size of data allocated
by DEFINE_FLEX().
Accompany new macros introduction with actual usage,
in the ice driver - hence targeting for netdev tree.
Obvious benefits include simpler resulting code, less heap usage,
less error checking. Less obvious is the fact that compiler has
more room to optimize, and as a whole, even with more stuff on the stack,
we end up with overall better (smaller) report from bloat-o-meter:
add/remove: 8/6 grow/shrink: 7/18 up/down: 2211/-2270 (-59)
(individual results in each patch).
====================
Link: https://lore.kernel.org/r/20230912115937.1645707-1-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Przemek Kitszel [Tue, 12 Sep 2023 11:59:37 +0000 (07:59 -0400)]
ice: make use of DEFINE_FLEX() in ice_switch.c
Use DEFINE_FLEX() macro for 1-elem flex array members of ice_switch.c
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20230912115937.1645707-8-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Przemek Kitszel [Tue, 12 Sep 2023 11:59:36 +0000 (07:59 -0400)]
ice: make use of DEFINE_FLEX() for struct ice_aqc_dis_txq_item
Use DEFINE_FLEX() macro for 1-elem flex array use case
of struct ice_aqc_dis_txq_item.
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20230912115937.1645707-7-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Przemek Kitszel [Tue, 12 Sep 2023 11:59:35 +0000 (07:59 -0400)]
ice: make use of DEFINE_FLEX() for struct ice_aqc_add_tx_qgrp
Use DEFINE_FLEX() macro for 1-elem flex array use case
of struct ice_aqc_add_tx_qgrp.
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20230912115937.1645707-6-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Przemek Kitszel [Tue, 12 Sep 2023 11:59:34 +0000 (07:59 -0400)]
ice: make use of DEFINE_FLEX() in ice_ddp.c
Use DEFINE_FLEX() macro for constant-num-of-elems (4)
flex array members of ice_ddp.c
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20230912115937.1645707-5-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Przemek Kitszel [Tue, 12 Sep 2023 11:59:33 +0000 (07:59 -0400)]
ice: drop two params of ice_aq_move_sched_elems()
Remove two arguments of ice_aq_move_sched_elems().
Last of them was always NULL, and @grps_req was always 1.
Assuming @grps_req to be one, allows us to use DEFINE_FLEX() macro,
what removes some need for heap allocations.
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20230912115937.1645707-4-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Przemek Kitszel [Tue, 12 Sep 2023 11:59:32 +0000 (07:59 -0400)]
ice: ice_sched_remove_elems: replace 1 elem array param by u32
Replace array+size params of ice_sched_remove_elems:() by just single u32,
as all callers are using it with "1".
This enables moving from heap-based, to stack-based allocation, what is also
more elegant thanks to DEFINE_FLEX() macro.
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20230912115937.1645707-3-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Przemek Kitszel [Tue, 12 Sep 2023 11:59:31 +0000 (07:59 -0400)]
overflow: add DEFINE_FLEX() for on-stack allocs
Add DEFINE_FLEX() macro for on-stack allocations of structs with
flexible array member.
Expose __struct_size() macro outside of fortify-string.h, as it could be
used to read size of structs allocated by DEFINE_FLEX().
Move __member_size() alongside it.
-Kees
Using underlying array for on-stack storage lets us to declare
known-at-compile-time structures without kzalloc().
Actual usage for ice driver is in following patches of the series.
Missing __has_builtin() workaround is moved up to serve also assembly
compilation with m68k-linux-gcc, see [1].
Error was (note the .S file extension):
In file included from ../include/linux/linkage.h:5,
from ../arch/m68k/fpsp040/skeleton.S:40:
../include/linux/compiler_types.h:331:5: warning: "__has_builtin" is not defined, evaluates to 0 [-Wundef]
331 | #if __has_builtin(__builtin_dynamic_object_size)
| ^~~~~~~~~~~~~
../include/linux/compiler_types.h:331:18: error: missing binary operator before token "("
331 | #if __has_builtin(__builtin_dynamic_object_size)
| ^
[1] https://lore.kernel.org/netdev/
202308112122.OuF0YZqL-lkp@intel.com/
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20230912115937.1645707-2-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 3 Oct 2023 14:34:53 +0000 (07:34 -0700)]
Merge branch 'bpf-remove-xdp_do_flush_map'
Sebastian Andrzej Siewior says:
====================
bpf: Remove xdp_do_flush_map().
I had #1 split in several patches per vendor and then decided to merge
it. I can repost it with one patch per vendor if this preferred.
====================
Link: https://lore.kernel.org/r/20230908143215.869913-1-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sebastian Andrzej Siewior [Fri, 8 Sep 2023 14:32:15 +0000 (16:32 +0200)]
bpf: Remove xdp_do_flush_map().
xdp_do_flush_map() can be removed because there is no more user in tree.
Remove xdp_do_flush_map().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230908143215.869913-3-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sebastian Andrzej Siewior [Fri, 8 Sep 2023 14:32:14 +0000 (16:32 +0200)]
net: Tree wide: Replace xdp_do_flush_map() with xdp_do_flush().
xdp_do_flush_map() is deprecated and new code should use xdp_do_flush()
instead.
Replace xdp_do_flush_map() with xdp_do_flush().
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Clark Wang <xiaoning.wang@nxp.com>
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: David Arinzon <darinzon@amazon.com>
Cc: Edward Cree <ecree.xilinx@gmail.com>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Jassi Brar <jaswinder.singh@linaro.org>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: John Crispin <john@phrozen.org>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Lorenzo Bianconi <lorenzo@kernel.org>
Cc: Louis Peens <louis.peens@corigine.com>
Cc: Marcin Wojtas <mw@semihalf.com>
Cc: Mark Lee <Mark-MC.Lee@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: Noam Dagan <ndagan@amazon.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Saeed Bishara <saeedb@amazon.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: Shay Agroskin <shayagr@amazon.com>
Cc: Shenwei Wang <shenwei.wang@nxp.com>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Cc: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Arthur Kiyanovski <akiyano@amazon.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230908143215.869913-2-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Carpenter [Fri, 8 Sep 2023 07:03:37 +0000 (10:03 +0300)]
net: microchip: sparx5: clean up error checking in vcap_show_admin()
The vcap_decode_rule() never returns NULL. There is no need to check
for that. This code assumes that if it did return NULL we should
end abruptly and return success. It is confusing. Fix the check to
just be if (IS_ERR()) instead of if (IS_ERR_OR_NULL()).
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202309070831.hTvj9ekP-lkp@intel.com/
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://lore.kernel.org/r/b88eba86-9488-4749-a896-7c7050132e7b@moroto.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 3 Oct 2023 11:51:06 +0000 (13:51 +0200)]
Merge branch 'net-dsa-hsr-enable-hsr-hw-offloading-for-ksz9477'
Lukasz Majewski says:
====================
net: dsa: hsr: Enable HSR HW offloading for KSZ9477
This patch series provides support for HSR HW offloading in KSZ9477
switch IC.
To test this feature:
ip link add name hsr0 type hsr slave1 lan1 slave2 lan2 supervision 45 version 1
ip link set dev lan1 up
ip link set dev lan2 up
ip a add 192.168.0.1/24 dev hsr0
ip link set dev hsr0 up
To remove HSR network device:
ip link del hsr0
To test if one can adjust MAC address:
ip link set lan2 address 00:01:02:AA:BB:CC
It is also possible to create another HSR interface, but it will
only support HSR is software - e.g.
ip link add name hsr1 type hsr slave1 lan3 slave2 lan4 supervision 45 version 1
Test HW:
Two KSZ9477-EVB boards with HSR ports set to "Port1" and "Port2".
Performance SW used:
nuttcp -S --nofork
nuttcp -vv -T 60 -r 192.168.0.2
nuttcp -vv -T 60 -t 192.168.0.2
Code: v6.6.0-rc2+ Linux net-next repository
SHA1:
5a1b322cb0b7d0d33a2d13462294dc0f46911172
Tested HSR v0 and v1
Results:
With KSZ9477 offloading support added: RX: 100 Mbps TX: 98 Mbps
With no offloading RX: 63 Mbps TX: 63 Mbps
====================
Link: https://lore.kernel.org/r/20230922133108.2090612-1-lukma@denx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lukasz Majewski [Fri, 22 Sep 2023 13:31:08 +0000 (15:31 +0200)]
net: dsa: microchip: Enable HSR offloading for KSZ9477
This patch adds functions for providing in KSZ9477 switch HSR
(High-availability Seamless Redundancy) hardware offloading.
According to AN3474 application note following features are provided:
- TX packet duplication from host to switch (NETIF_F_HW_HSR_DUP)
- RX packet duplication discarding
- Prevention of packet loop
For last two ones - there is a probability that some packets will not
be filtered in HW (in some special cases - described in AN3474).
Hence, the HSR core code shall be used to discard those not caught frames.
Moreover, some switch registers adjustments are required - like setting
MAC address of HSR network interface.
Additionally, the KSZ9477 switch has been configured to forward frames
between HSR ports (e.g. 1,2) members to provide support for
NETIF_F_HW_HSR_FWD flag.
Join and leave functions are written in a way, that are executed with
single port - i.e. configuration is NOT done only when second HSR port
is configured.
Co-developed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Fri, 22 Sep 2023 13:31:07 +0000 (15:31 +0200)]
net: dsa: microchip: move REG_SW_MAC_ADDR to dev->info->regs[]
Defining macros which have the same name but different values is bad
practice, because it makes it hard to avoid code duplication. The same
code does different things, depending on the file it's placed in.
Case in point, we want to access REG_SW_MAC_ADDR from ksz_common.c, but
currently we can't, because we don't know which kszXXXX_reg.h to include
from the common code.
Remove the REG_SW_MAC_ADDR_{0..5} macros from ksz8795_reg.h and
ksz9477_reg.h, and re-add this register offset to the dev->info->regs[]
array.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lukasz Majewski [Fri, 22 Sep 2023 13:31:06 +0000 (15:31 +0200)]
net: dsa: tag_ksz: Extend ksz9477_xmit() for HSR frame duplication
The KSZ9477 has support for HSR (High-Availability Seamless Redundancy).
One of its offloading (i.e. performed in the switch IC hardware) features
is to duplicate received frame to both HSR aware switch ports.
To achieve this goal - the tail TAG needs to be modified. To be more
specific, both ports must be marked as destination (egress) ones.
The NETIF_F_HW_HSR_DUP flag indicates that the device supports HSR and
assures (in HSR core code) that frame is sent only once from HOST to
switch with tail tag indicating both ports.
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Fri, 22 Sep 2023 13:31:05 +0000 (15:31 +0200)]
net: dsa: notify drivers of MAC address changes on user ports
In some cases, drivers may need to veto the changing of a MAC address on
a user port. Such is the case with KSZ9477 when it offloads a HSR device,
because it programs the MAC address of multiple ports to a shared
hardware register. Those ports need to have equal MAC addresses for the
lifetime of the HSR offload.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Fri, 22 Sep 2023 13:31:04 +0000 (15:31 +0200)]
net: dsa: propagate extack to ds->ops->port_hsr_join()
Drivers can provide meaningful error messages which state a reason why
they can't perform an offload, and dsa_slave_changeupper() already has
the infrastructure to propagate these over netlink rather than printing
to the kernel log. So pass the extack argument and modify the xrs700x
driver's port_hsr_join() prototype.
Also take the opportunity and use the extack for the 2 -EOPNOTSUPP cases
from xrs700x_hsr_join().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Raju Lakkaraju [Mon, 25 Sep 2023 08:00:59 +0000 (13:30 +0530)]
net: sfp: add quirk for FS's 2.5G copper SFP
Add a quirk for a copper SFP that identifies itself as "FS" "SFP-2.5G-T".
This module's PHY is inaccessible, and can only run at 2500base-X with the
host without negotiation. Add a quirk to enable the 2500base-X interface mode
with 2500base-T support and disable auto negotiation.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Link: https://lore.kernel.org/r/20230925080059.266240-1-Raju.Lakkaraju@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Beniamino Galvani [Sun, 24 Sep 2023 15:30:14 +0000 (17:30 +0200)]
ipv6: mark address parameters of udp_tunnel6_xmit_skb() as const
The function doesn't modify the addresses passed as input, mark them
as 'const' to make that clear.
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/20230924153014.786962-1-b.galvani@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Linus Walleij [Sun, 24 Sep 2023 08:19:02 +0000 (10:19 +0200)]
net: phy: amd: Support the Altima AMI101L
The Altima AC101L is obviously compatible with the AMD PHY,
as seen by reading the datasheet.
Datasheet: https://docs.broadcom.com/doc/AC101L-DS05-405-RDS.pdf
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230924-ac101l-phy-v1-1-5e6349e28aa4@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Christophe JAILLET [Sun, 24 Sep 2023 08:03:07 +0000 (10:03 +0200)]
udp_tunnel: Use flex array to simplify code
'n_tables' is small, UDP_TUNNEL_NIC_MAX_TABLES = 4 as a maximum. So there
is no real point to allocate the 'entries' pointers array with a dedicate
memory allocation.
Using a flexible array for struct udp_tunnel_nic->entries avoids the
overhead of an additional memory allocation.
This also saves an indirection when the array is accessed.
Finally, __counted_by() can be used for run-time bounds checking if
configured and supported by the compiler.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/4a096ba9cf981a588aa87235bb91e933ee162b3d.1695542544.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Linus Walleij [Sat, 23 Sep 2023 18:38:22 +0000 (20:38 +0200)]
net: ixp4xx_eth: Specify min/max MTU
As we don't specify the MTU in the driver, the framework
will fall back to 1500 bytes and this doesn't work very
well when we try to attach a DSA switch:
eth1: mtu greater than device maximum
ixp4xx_eth
c800a000.ethernet eth1: error -22 setting
MTU to 1504 to include DSA overhead
I checked the developer docs and the hardware can actually
do really big frames, so update the driver accordingly.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20230923-ixp4xx-eth-mtu-v1-1-9e88b908e1b2@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 3 Oct 2023 08:05:24 +0000 (10:05 +0200)]
Merge branch 'tcp_metrics-four-fixes'
Eric Dumazet says:
====================
tcp_metrics: four fixes
Looking at an inconclusive syzbot report, I was surprised
to see that tcp_metrics cache on my host was full of
useless entries, even though I have
/proc/sys/net/ipv4/tcp_no_metrics_save set to 1.
While looking more closely I found a total of four issues.
====================
Link: https://lore.kernel.org/r/20230922220356.3739090-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eric Dumazet [Fri, 22 Sep 2023 22:03:56 +0000 (22:03 +0000)]
tcp_metrics: optimize tcp_metrics_flush_all()
This is inspired by several syzbot reports where
tcp_metrics_flush_all() was seen in the traces.
We can avoid acquiring tcp_metrics_lock for empty buckets,
and we should add one cond_resched() to break potential long loops.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eric Dumazet [Fri, 22 Sep 2023 22:03:55 +0000 (22:03 +0000)]
tcp_metrics: do not create an entry from tcp_init_metrics()
tcp_init_metrics() only wants to get metrics if they were
previously stored in the cache. Creating an entry is adding
useless costs, especially when tcp_no_metrics_save is set.
Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eric Dumazet [Fri, 22 Sep 2023 22:03:54 +0000 (22:03 +0000)]
tcp_metrics: properly set tp->snd_ssthresh in tcp_init_metrics()
We need to set tp->snd_ssthresh to TCP_INFINITE_SSTHRESH
in the case tcp_get_metrics() fails for some reason.
Fixes: 9ad7c049f0f7 ("tcp: RFC2988bis + taking RTT sample from 3WHS for the passive open side")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eric Dumazet [Fri, 22 Sep 2023 22:03:53 +0000 (22:03 +0000)]
tcp_metrics: add missing barriers on delete
When removing an item from RCU protected list, we must prevent
store-tearing, using rcu_assign_pointer() or WRITE_ONCE().
Fixes: 04f721c671656 ("tcp_metrics: Rewrite tcp_metrics_flush_all")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Mon, 2 Oct 2023 19:34:23 +0000 (12:34 -0700)]
Merge branch 'fix-implicit-sign-conversions-in-handshake-upcall'
Chuck Lever says:
====================
Fix implicit sign conversions in handshake upcall
An internal static analysis tool noticed some implicit sign
conversions for some of the arguments in the handshake upcall
protocol.
====================
Link: https://lore.kernel.org/r/169530154802.8905.2645661840284268222.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Chuck Lever [Thu, 21 Sep 2023 13:08:07 +0000 (09:08 -0400)]
handshake: Fix sign of key_serial_t fields
key_serial_t fields are signed integers. Use nla_get/put_s32 for
those to avoid implicit signed conversion in the netlink protocol.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/169530167716.8905.645746457741372879.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Chuck Lever [Thu, 21 Sep 2023 13:07:40 +0000 (09:07 -0400)]
handshake: Fix sign of socket file descriptor fields
Socket file descriptors are signed integers. Use nla_get/put_s32 for
those to avoid implicit signed conversion in the netlink protocol.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/169530165057.8905.8650469415145814828.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 2 Oct 2023 18:27:17 +0000 (11:27 -0700)]
Merge branch 'mlxsw-annotate-structs-with-__counted_by'
Kees Cook says:
====================
mlxsw: Annotate structs with __counted_by
This annotates several mlxsw structures with the coming __counted_by attribute
for bounds checking of flexible arrays at run-time. For more details, see
commit
dd06e72e68bc ("Compiler Attributes: Add __counted_by macro").
====================
Link: https://lore.kernel.org/r/20230929180611.work.870-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 29 Sep 2023 18:07:44 +0000 (11:07 -0700)]
mlxsw: spectrum_span: Annotate struct mlxsw_sp_span with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct mlxsw_sp_span.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Petr Machata <petrm@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230929180746.3005922-5-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 29 Sep 2023 18:07:43 +0000 (11:07 -0700)]
mlxsw: spectrum_router: Annotate struct mlxsw_sp_nexthop_group_info with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct mlxsw_sp_nexthop_group_info.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Petr Machata <petrm@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230929180746.3005922-4-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 29 Sep 2023 18:07:42 +0000 (11:07 -0700)]
mlxsw: spectrum: Annotate struct mlxsw_sp_counter_pool with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct mlxsw_sp_counter_pool.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Petr Machata <petrm@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230929180746.3005922-3-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 29 Sep 2023 18:07:41 +0000 (11:07 -0700)]
mlxsw: core: Annotate struct mlxsw_env with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct mlxsw_env.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Petr Machata <petrm@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230929180746.3005922-2-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 29 Sep 2023 18:07:40 +0000 (11:07 -0700)]
mlxsw: Annotate struct mlxsw_linecards with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct mlxsw_linecards.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Petr Machata <petrm@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230929180746.3005922-1-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 2 Oct 2023 18:25:03 +0000 (11:25 -0700)]
Merge branch 'batch-1-annotate-structs-with-__counted_by'
Kees Cook says:
====================
Batch 1: Annotate structs with __counted_by
This is the batch 1 of patches touching netdev for preparing for
the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by to structs that would
benefit from the annotation.
Since the element count member must be set before accessing the annotated
flexible array member, some patches also move the member's initialization
earlier. (These are noted in the individual patches.)
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
====================
Link: https://lore.kernel.org/r/20230922172449.work.906-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 22 Sep 2023 17:28:55 +0000 (10:28 -0700)]
net: tulip: Annotate struct mediatable with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct mediatable.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-13-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 22 Sep 2023 17:28:54 +0000 (10:28 -0700)]
net: openvswitch: Annotate struct dp_meter with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct dp_meter.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: dev@openvswitch.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-12-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 22 Sep 2023 17:28:53 +0000 (10:28 -0700)]
net: enetc: Annotate struct enetc_psfp_gate with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct enetc_psfp_gate.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-11-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 22 Sep 2023 17:28:52 +0000 (10:28 -0700)]
net: openvswitch: Annotate struct dp_meter_instance with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct dp_meter_instance.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: dev@openvswitch.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-10-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 22 Sep 2023 17:28:51 +0000 (10:28 -0700)]
net: mana: Annotate struct hwc_dma_buf with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct hwc_dma_buf.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Long Li <longli@microsoft.com>
Cc: Ajay Sharma <sharmaajay@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-9-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 22 Sep 2023 17:28:50 +0000 (10:28 -0700)]
net: ipa: Annotate struct ipa_power with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct ipa_power.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-8-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 22 Sep 2023 17:28:49 +0000 (10:28 -0700)]
net: mana: Annotate struct mana_rxq with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct mana_rxq.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Long Li <longli@microsoft.com>
Cc: Ajay Sharma <sharmaajay@microsoft.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-7-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 22 Sep 2023 17:28:48 +0000 (10:28 -0700)]
net: hisilicon: Annotate struct rcb_common_cb with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct rcb_common_cb.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-6-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 22 Sep 2023 17:28:47 +0000 (10:28 -0700)]
net: enetc: Annotate struct enetc_int_vector with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct enetc_int_vector.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-5-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 22 Sep 2023 17:28:46 +0000 (10:28 -0700)]
net: hns: Annotate struct ppe_common_cb with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct ppe_common_cb.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-4-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 22 Sep 2023 17:28:45 +0000 (10:28 -0700)]
ipv6: Annotate struct ip6_sf_socklist with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct ip6_sf_socklist.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-3-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 22 Sep 2023 17:28:44 +0000 (10:28 -0700)]
ipv4/igmp: Annotate struct ip_sf_socklist with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct ip_sf_socklist.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-2-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 22 Sep 2023 17:28:43 +0000 (10:28 -0700)]
ipv4: Annotate struct fib_info with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct fib_info.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: David Ahern <dsahern@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-1-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Mon, 2 Oct 2023 07:07:13 +0000 (08:07 +0100)]
Merge branch 'mlxsw-next'
Petr Machata says:
====================
mlxsw: Provide enhancements and new feature
Vadim Pasternak writes:
Patch #1 - Optimize transaction size for efficient retrieval of module
data.
Patch #3 - Enable thermal zone binding with new cooling device.
Patch #4 - Employ standard macros for dividing buffer into the chunks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Fri, 22 Sep 2023 17:18:38 +0000 (19:18 +0200)]
mlxsw: i2c: Utilize standard macros for dividing buffer into chunks
Use standard macro DIV_ROUND_UP() to determine the number of chunks
required for a given buffer.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Fri, 22 Sep 2023 17:18:37 +0000 (19:18 +0200)]
mlxsw: core: Extend allowed list of external cooling devices for thermal zone binding
Extend the list of allowed external cooling devices for thermal zone
binding to include devices of type "emc2305".
The motivation is to provide support for the system SN2201, which is
equipped with the Spectrum-1 ASIC.
The system's airflow control is managed by the EMC2305 RPM-based PWM
Fan Speed Controller as the cooling device.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Fri, 22 Sep 2023 17:18:36 +0000 (19:18 +0200)]
mlxsw: reg: Limit MTBR register payload to a single data record
The MTBR register is used to read temperatures from multiple sensors in
one transaction, but the driver only reads from a single sensor in each
transaction.
Rrestrict the payload size of the MTBR register to prevent the
transmission of redundant data to the firmware.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 1 Oct 2023 18:39:19 +0000 (19:39 +0100)]
Merge branch 'inet-more-data-race-fixes'
Eric Dumazet says:
====================
inet: more data-race fixes
This series fixes some existing data-races on inet fields:
inet->mc_ttl, inet->pmtudisc, inet->tos, inet->uc_index,
inet->mc_index and inet->mc_addr.
While fixing them, we convert eight socket options
to lockless implementation.
v2: addressed David Ahern feedback on ("inet: implement lockless IP_TOS")
Added David Reviewed-by: tag on other patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 22 Sep 2023 03:42:21 +0000 (03:42 +0000)]
inet: implement lockless getsockopt(IP_MULTICAST_IF)
Add missing annotations to inet->mc_index and inet->mc_addr
to fix data-races.
getsockopt(IP_MULTICAST_IF) can be lockless.
setsockopt() side is left for later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 22 Sep 2023 03:42:20 +0000 (03:42 +0000)]
inet: lockless IP_PKTOPTIONS implementation
Current implementation is already lockless, because the socket
lock is released before reading socket fields.
Add missing READ_ONCE() annotations.
Note that corresponding WRITE_ONCE() are needed, the order
of the patches do not really matter.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 22 Sep 2023 03:42:19 +0000 (03:42 +0000)]
inet: implement lockless getsockopt(IP_UNICAST_IF)
Add missing READ_ONCE() annotations when reading inet->uc_index
Implementing getsockopt(IP_UNICAST_IF) locklessly seems possible,
the setsockopt() part might not be possible at the moment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 22 Sep 2023 03:42:18 +0000 (03:42 +0000)]
inet: lockless getsockopt(IP_MTU)
sk_dst_get() does not require socket lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 22 Sep 2023 03:42:17 +0000 (03:42 +0000)]
inet: lockless getsockopt(IP_OPTIONS)
inet->inet_opt being RCU protected, we can use RCU instead
of locking the socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 22 Sep 2023 03:42:16 +0000 (03:42 +0000)]
inet: implement lockless IP_TOS
Some reads of inet->tos are racy.
Add needed READ_ONCE() annotations and convert IP_TOS option lockless.
v2: missing changes in include/net/route.h (David Ahern)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 22 Sep 2023 03:42:15 +0000 (03:42 +0000)]
inet: implement lockless IP_MTU_DISCOVER
inet->pmtudisc can be read locklessly.
Implement proper lockless reads and writes to inet->pmtudisc
ip_sock_set_mtu_discover() can now be called from arbitrary
contexts.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 22 Sep 2023 03:42:14 +0000 (03:42 +0000)]
inet: implement lockless IP_MULTICAST_TTL
inet->mc_ttl can be read locklessly.
Implement proper lockless reads and writes to inet->mc_ttl
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 1 Oct 2023 18:09:55 +0000 (19:09 +0100)]
Merge branch 'socket-option-lockless'
Eric Dumazet says:
====================
net: more data-races fixes and lockless socket options
This is yet another round of data-races fixes,
and lockless socket options.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 21 Sep 2023 20:28:18 +0000 (20:28 +0000)]
net: annotate data-races around sk->sk_dst_pending_confirm
This field can be read or written without socket lock being held.
Add annotations to avoid load-store tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 21 Sep 2023 20:28:17 +0000 (20:28 +0000)]
net: annotate data-races around sk->sk_tx_queue_mapping
This field can be read or written without socket lock being held.
Add annotations to avoid load-store tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 21 Sep 2023 20:28:16 +0000 (20:28 +0000)]
net: lockless implementation of SO_TXREHASH
sk->sk_txrehash readers are already safe against
concurrent change of this field.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 21 Sep 2023 20:28:15 +0000 (20:28 +0000)]
net: implement lockless SO_MAX_PACING_RATE
SO_MAX_PACING_RATE setsockopt() does not need to hold
the socket lock, because sk->sk_pacing_rate readers
can run fine if the value is changed by other threads,
after adding READ_ONCE() accessors.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 21 Sep 2023 20:28:14 +0000 (20:28 +0000)]
net: lockless implementation of SO_BUSY_POLL, SO_PREFER_BUSY_POLL, SO_BUSY_POLL_BUDGET
Setting sk->sk_ll_usec, sk_prefer_busy_poll and sk_busy_poll_budget
do not require the socket lock, readers are lockless anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 21 Sep 2023 20:28:13 +0000 (20:28 +0000)]
net: lockless SO_{TYPE|PROTOCOL|DOMAIN|ERROR } setsockopt()
This options can not be set and return -ENOPROTOOPT,
no need to acqure socket lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 21 Sep 2023 20:28:12 +0000 (20:28 +0000)]
net: lockless SO_PASSCRED, SO_PASSPIDFD and SO_PASSSEC
sock->flags are atomic, no need to hold the socket lock
in sk_setsockopt() for SO_PASSCRED, SO_PASSPIDFD and SO_PASSSEC.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 21 Sep 2023 20:28:11 +0000 (20:28 +0000)]
net: implement lockless SO_PRIORITY
This is a followup of
8bf43be799d4 ("net: annotate data-races
around sk->sk_priority").
sk->sk_priority can be read and written without holding the socket lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilya Maximets [Thu, 21 Sep 2023 19:42:35 +0000 (21:42 +0200)]
openvswitch: reduce stack usage in do_execute_actions
do_execute_actions() function can be called recursively multiple
times while executing actions that require pipeline forking or
recirculations. It may also be re-entered multiple times if the packet
leaves openvswitch module and re-enters it through a different port.
Currently, there is a 256-byte array allocated on stack in this
function that is supposed to hold NSH header. Compilers tend to
pre-allocate that space right at the beginning of the function:
a88: 48 81 ec b0 01 00 00 sub $0x1b0,%rsp
NSH is not a very common protocol, but the space is allocated on every
recursive call or re-entry multiplying the wasted stack space.
Move the stack allocation to push_nsh() function that is only used
if NSH actions are actually present. push_nsh() is also a simple
function without a possibility for re-entry, so the stack is returned
right away.
With this change the preallocated space is reduced by 256 B per call:
b18: 48 81 ec b0 00 00 00 sub $0xb0,%rsp
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eelco Chaudron echaudro@redhat.com
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 1 Oct 2023 15:33:01 +0000 (16:33 +0100)]
Merge branch 'dev-stats-virtio-l2tp_eth'
Eric Dumazet says:
====================
net: use DEV_STATS_xxx() helpers in virtio_net and l2tp_eth
Inspired by another (minor) KCSAN syzbot report.
Both virtio_net and l2tp_eth can use DEV_STATS_xxx() helpers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 21 Sep 2023 08:52:18 +0000 (08:52 +0000)]
net: l2tp_eth: use generic dev->stats fields
Core networking has opt-in atomic variant of dev->stats,
simply use DEV_STATS_INC(), DEV_STATS_ADD() and DEV_STATS_READ().
v2: removed @priv local var in l2tp_eth_dev_recv() (Simon)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 21 Sep 2023 08:52:17 +0000 (08:52 +0000)]
virtio_net: avoid data-races on dev->stats fields
Use DEV_STATS_INC() and DEV_STATS_READ() which provide
atomicity on paths that can be used concurrently.
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 21 Sep 2023 08:52:16 +0000 (08:52 +0000)]
net: add DEV_STATS_READ() helper
Companion of DEV_STATS_INC() & DEV_STATS_ADD().
This is going to be used in the series.
Use it in macsec_get_stats64().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Thu, 21 Sep 2023 08:50:55 +0000 (14:20 +0530)]
octeontx2-pf: Tc flower offload support for MPLS
This patch extends flower offload support for MPLS protocol.
Due to hardware limitation, currently driver supports lse
depth up to 4.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Uwe Kleine-König [Thu, 21 Sep 2023 06:35:01 +0000 (08:35 +0200)]
net: ethernet: xilinx: Drop kernel doc comment about return value
During review of the patch that became
2e0ec0afa902 ("net: ethernet:
xilinx: Convert to platform remove callback returning void") in
net-next, Radhey Shyam Pandey pointed out that the change makes the
documentation about the return value obsolete. The patch was applied
without addressing this feedback, so here comes a fix in a separate
patch.
Fixes: 2e0ec0afa902 ("net: ethernet: xilinx: Convert to platform remove callback returning void")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sieng-Piaw Liew [Thu, 21 Sep 2023 00:56:23 +0000 (08:56 +0800)]
net: atl1c: switch to napi_consume_skb()
Switch to napi_consume_skb() to take advantage of bulk free, and skb
reuse through skb cache in conjunction with napi_build_skb().
When parameter 'budget' = 0, indicating non-NAPI context,
dev_consume_skb_any() is called internally.
Signed-off-by: Sieng-Piaw Liew <liew.s.piaw@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 1 Oct 2023 12:20:36 +0000 (13:20 +0100)]
Merge branch 'sch_fq-improvements'
Eric Dumazet says:
====================
net_sched: sch_fq: round of improvements
For FQ tenth anniversary, it was time for making it faster.
The FQ part (as in Fair Queue) is rather expensive, because
we have to classify packets and store them in a per-flow structure,
and add this per-flow structure in a hash table. Then the RR lists
also add cache line misses.
Most fq qdisc are almost idle. Trying to share NIC bandwidth has
no benefits, thus the qdisc could behave like a FIFO.
This series brings a 5 % throughput increase in intensive
tcp_rr workload, and 13 % increase for (unpaced) UDP packets.
v2: removed an extra label (build bot).
Fix an accidental increase of stat_internal_packets counter
in fast path.
Added "constify qdisc_priv()" patch to allow fq_fastpath_check()
first parameter to be const.
typo on 'eligible' (Willem)
====================
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 20 Sep 2023 20:17:15 +0000 (20:17 +0000)]
net_sched: sch_fq: always garbage collect
FQ performs garbage collection at enqueue time, and only
if number of flows is above a given threshold, which
is hit after the qdisc has been used a bit.
Since an RB-tree traversal is needed to locate a flow,
it makes sense to perform gc all the time, to keep
rb-trees smaller.
This reduces by 50 % average storage costs in FQ,
and avoids 1 cache line miss at enqueue time when
fast path added in prior patch can not be used.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 20 Sep 2023 20:17:14 +0000 (20:17 +0000)]
net_sched: sch_fq: add fast path for mostly idle qdisc
TCQ_F_CAN_BYPASS can be used by few qdiscs.
Idea is that if we queue a packet to an empty qdisc,
following dequeue() would pick it immediately.
FQ can not use the generic TCQ_F_CAN_BYPASS code,
because some additional checks need to be performed.
This patch adds a similar fast path to FQ.
Most of the time, qdisc is not throttled,
and many packets can avoid bringing/touching
at least four cache lines, and consuming 128bytes
of memory to store the state of a flow.
After this patch, netperf can send UDP packets about 13 % faster,
and pktgen goes 30 % faster (when FQ is in the way), on a fast NIC.
TCP traffic is also improved, thanks to a reduction of cache line misses.
I have measured a 5 % increase of throughput on a tcp_rr intensive workload.
tc -s -d qd sh dev eth1
...
qdisc fq 8004: parent 1:2 limit 10000p flow_limit 100p buckets 1024
orphan_mask 1023 quantum 3028b initial_quantum 15140b low_rate_threshold 550Kbit
refill_delay 40ms timer_slack 10us horizon 10s horizon_drop
Sent
5646784384 bytes
1985161 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
flows 122 (inactive 122 throttled 0)
gc 0 highprio 0 fastpath 659990 throttled 27762 latency 8.57us
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 20 Sep 2023 20:17:13 +0000 (20:17 +0000)]
net_sched: sch_fq: change how @inactive is tracked
Currently, when one fq qdisc has no more packets to send, it can still
have some flows stored in its RR lists (q->new_flows & q->old_flows)
This was a design choice, but what is a bit disturbing is that
the inactive_flows counter does not include the count of empty flows
in RR lists.
As next patch needs to know better if there are active flows,
this change makes inactive_flows exact.
Before the patch, following command on an empty qdisc could have returned:
lpaa17:~# tc -s -d qd sh dev eth1 | grep inactive
flows 1322 (inactive 1316 throttled 0)
flows 1330 (inactive 1325 throttled 0)
flows 1193 (inactive 1190 throttled 0)
flows 1208 (inactive 1202 throttled 0)
After the patch, we now have:
lpaa17:~# tc -s -d qd sh dev eth1 | grep inactive
flows 1322 (inactive 1322 throttled 0)
flows 1330 (inactive 1330 throttled 0)
flows 1193 (inactive 1193 throttled 0)
flows 1208 (inactive 1208 throttled 0)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 20 Sep 2023 20:17:12 +0000 (20:17 +0000)]
net_sched: sch_fq: struct sched_data reorg
q->flows can be often modified, and q->timer_slack is read mostly.
Exchange the two fields, so that cache line countaining
quantum, initial_quantum, and other critical parameters
stay clean (read-mostly).
Move q->watchdog next to q->stat_throttled
Add comments explaining how the structure is split in
three different parts.
pahole output before the patch:
struct fq_sched_data {
struct fq_flow_head new_flows; /* 0 0x10 */
struct fq_flow_head old_flows; /* 0x10 0x10 */
struct rb_root delayed; /* 0x20 0x8 */
u64 time_next_delayed_flow; /* 0x28 0x8 */
u64 ktime_cache; /* 0x30 0x8 */
unsigned long unthrottle_latency_ns; /* 0x38 0x8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct fq_flow internal __attribute__((__aligned__(64))); /* 0x40 0x80 */
/* XXX last struct has 16 bytes of padding */
/* --- cacheline 3 boundary (192 bytes) --- */
u32 quantum; /* 0xc0 0x4 */
u32 initial_quantum; /* 0xc4 0x4 */
u32 flow_refill_delay; /* 0xc8 0x4 */
u32 flow_plimit; /* 0xcc 0x4 */
unsigned long flow_max_rate; /* 0xd0 0x8 */
u64 ce_threshold; /* 0xd8 0x8 */
u64 horizon; /* 0xe0 0x8 */
u32 orphan_mask; /* 0xe8 0x4 */
u32 low_rate_threshold; /* 0xec 0x4 */
struct rb_root * fq_root; /* 0xf0 0x8 */
u8 rate_enable; /* 0xf8 0x1 */
u8 fq_trees_log; /* 0xf9 0x1 */
u8 horizon_drop; /* 0xfa 0x1 */
/* XXX 1 byte hole, try to pack */
<bad> u32 flows; /* 0xfc 0x4 */
/* --- cacheline 4 boundary (256 bytes) --- */
u32 inactive_flows; /* 0x100 0x4 */
u32 throttled_flows; /* 0x104 0x4 */
u64 stat_gc_flows; /* 0x108 0x8 */
u64 stat_internal_packets; /* 0x110 0x8 */
u64 stat_throttled; /* 0x118 0x8 */
u64 stat_ce_mark; /* 0x120 0x8 */
u64 stat_horizon_drops; /* 0x128 0x8 */
u64 stat_horizon_caps; /* 0x130 0x8 */
u64 stat_flows_plimit; /* 0x138 0x8 */
/* --- cacheline 5 boundary (320 bytes) --- */
u64 stat_pkts_too_long; /* 0x140 0x8 */
u64 stat_allocation_errors; /* 0x148 0x8 */
<bad> u32 timer_slack; /* 0x150 0x4 */
/* XXX 4 bytes hole, try to pack */
struct qdisc_watchdog watchdog; /* 0x158 0x48 */
/* size: 448, cachelines: 7, members: 34 */
/* sum members: 411, holes: 2, sum holes: 5 */
/* padding: 32 */
/* paddings: 1, sum paddings: 16 */
/* forced alignments: 1 */
};
pahole output after the patch:
struct fq_sched_data {
struct fq_flow_head new_flows; /* 0 0x10 */
struct fq_flow_head old_flows; /* 0x10 0x10 */
struct rb_root delayed; /* 0x20 0x8 */
u64 time_next_delayed_flow; /* 0x28 0x8 */
u64 ktime_cache; /* 0x30 0x8 */
unsigned long unthrottle_latency_ns; /* 0x38 0x8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct fq_flow internal __attribute__((__aligned__(64))); /* 0x40 0x80 */
/* XXX last struct has 16 bytes of padding */
/* --- cacheline 3 boundary (192 bytes) --- */
u32 quantum; /* 0xc0 0x4 */
u32 initial_quantum; /* 0xc4 0x4 */
u32 flow_refill_delay; /* 0xc8 0x4 */
u32 flow_plimit; /* 0xcc 0x4 */
unsigned long flow_max_rate; /* 0xd0 0x8 */
u64 ce_threshold; /* 0xd8 0x8 */
u64 horizon; /* 0xe0 0x8 */
u32 orphan_mask; /* 0xe8 0x4 */
u32 low_rate_threshold; /* 0xec 0x4 */
struct rb_root * fq_root; /* 0xf0 0x8 */
u8 rate_enable; /* 0xf8 0x1 */
u8 fq_trees_log; /* 0xf9 0x1 */
u8 horizon_drop; /* 0xfa 0x1 */
/* XXX 1 byte hole, try to pack */
<good> u32 timer_slack; /* 0xfc 0x4 */
/* --- cacheline 4 boundary (256 bytes) --- */
<good> u32 flows; /* 0x100 0x4 */
u32 inactive_flows; /* 0x104 0x4 */
u32 throttled_flows; /* 0x108 0x4 */
/* XXX 4 bytes hole, try to pack */
u64 stat_throttled; /* 0x110 0x8 */
<better> struct qdisc_watchdog watchdog; /* 0x118 0x48 */
/* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
u64 stat_gc_flows; /* 0x160 0x8 */
u64 stat_internal_packets; /* 0x168 0x8 */
u64 stat_ce_mark; /* 0x170 0x8 */
u64 stat_horizon_drops; /* 0x178 0x8 */
/* --- cacheline 6 boundary (384 bytes) --- */
u64 stat_horizon_caps; /* 0x180 0x8 */
u64 stat_flows_plimit; /* 0x188 0x8 */
u64 stat_pkts_too_long; /* 0x190 0x8 */
u64 stat_allocation_errors; /* 0x198 0x8 */
/* Force padding: */
u64 :64;
u64 :64;
u64 :64;
u64 :64;
/* size: 448, cachelines: 7, members: 34 */
/* sum members: 411, holes: 2, sum holes: 5 */
/* padding: 32 */
/* paddings: 1, sum paddings: 16 */
/* forced alignments: 1 */
};
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>