linux.git
2 years agoselftests/xsk: add unaligned mode test for multi-buffer
Magnus Karlsson [Wed, 19 Jul 2023 13:24:17 +0000 (15:24 +0200)]
selftests/xsk: add unaligned mode test for multi-buffer

Add a test for multi-buffer AF_XDP when using unaligned mode. The test
sends 4096 9K-buffers.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-21-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoselftests/xsk: add basic multi-buffer test
Magnus Karlsson [Wed, 19 Jul 2023 13:24:16 +0000 (15:24 +0200)]
selftests/xsk: add basic multi-buffer test

Add the first basic multi-buffer test that sends a stream of 9K
packets and validates that they are received at the other end. In
order to enable sending and receiving multi-buffer packets, code that
sets the MTU is introduced as well as modifications to the XDP
programs so that they signal that they are multi-buffer enabled.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-20-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoselftests/xsk: transmit and receive multi-buffer packets
Magnus Karlsson [Wed, 19 Jul 2023 13:24:15 +0000 (15:24 +0200)]
selftests/xsk: transmit and receive multi-buffer packets

Add the ability to send and receive packets that are larger than the
size of a umem frame, using the AF_XDP /XDP multi-buffer
support. There are three pieces of code that need to be changed to
achieve this: the Rx path, the Tx path, and the validation logic.

Both the Rx path and Tx could only deal with a single fragment per
packet. The Tx path is extended with a new function called
pkt_nb_frags() that can be used to retrieve the number of fragments a
packet will consume. We then create these many fragments in a loop and
fill the N-1 first ones to the max size limit to use the buffer space
efficiently, and the Nth one with whatever data that is left. This
goes on until we have filled in at the most BATCH_SIZE worth of
descriptors and fragments. If we detect that the next packet would
lead to BATCH_SIZE number of fragments sent being exceeded, we do not
send this packet and finish the batch. This packet is instead sent in
the next iteration of BATCH_SIZE fragments.

For Rx, we loop over all fragments we receive as usual, but for every
descriptor that we receive we call a new validation function called
is_frag_valid() to validate the consistency of this fragment. The code
then checks if the packet continues in the next frame. If so, it loops
over the next packet and performs the same validation. once we have
received the last fragment of the packet we also call the function
is_pkt_valid() to validate the packet as a whole. If we get to the end
of the batch and we are not at the end of the current packet, we back
out the partial packet and end the loop. Once we get into the receive
loop next time, we start over from the beginning of that packet. This
so the code becomes simpler at the cost of some performance.

The validation function is_frag_valid() checks that the sequence and
packet numbers are correct at the start and end of each fragment.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-19-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoxsk: add multi-buffer documentation
Magnus Karlsson [Wed, 19 Jul 2023 13:24:14 +0000 (15:24 +0200)]
xsk: add multi-buffer documentation

Add AF_XDP multi-buffer support documentation including two
pseudo-code samples.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-18-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoi40e: xsk: add TX multi-buffer support
Tirthendu Sarkar [Wed, 19 Jul 2023 13:24:13 +0000 (15:24 +0200)]
i40e: xsk: add TX multi-buffer support

Set eop bit in TX desc command only for the last descriptor of the
packet and do not set for all preceding descriptors.

Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-17-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoice: xsk: Tx multi-buffer support
Maciej Fijalkowski [Wed, 19 Jul 2023 13:24:12 +0000 (15:24 +0200)]
ice: xsk: Tx multi-buffer support

Most of this patch is about actually supporting XDP_TX action. Pure Tx
ZC support is only about looking at XDP_PKT_CONTD presence at options
field and based on that generating EOP bit on Tx HW descriptor. This is
that simple due to the implementation on
xsk_tx_peek_release_desc_batch() where we are making sure that last
produced descriptor is an EOP one.

Overwrite xdp_zc_max_segs with a value that defines max scatter-gatter
count on Tx side that HW can handle.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-16-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoxsk: support ZC Tx multi-buffer in batch API
Maciej Fijalkowski [Wed, 19 Jul 2023 13:24:11 +0000 (15:24 +0200)]
xsk: support ZC Tx multi-buffer in batch API

Modify xskq_cons_read_desc_batch() in a way that each processed
descriptor will be checked if it is an EOP one or not and act
accordingly to that.

Change the behavior of mentioned function to break the processing when
stumbling upon invalid descriptor instead of skipping it. Furthermore,
let us give only full packets down to ZC driver.
With these two assumptions ZC drivers will not have to take care of an
intermediate state of incomplete frames, which will simplify its
implementations a lot.

Last but not least, stop processing when count of frags would exceed
max supported segments on underlying device.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-15-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoi40e: xsk: add RX multi-buffer support
Tirthendu Sarkar [Wed, 19 Jul 2023 13:24:10 +0000 (15:24 +0200)]
i40e: xsk: add RX multi-buffer support

This patch is inspired from the multi-buffer support in non-zc path for
i40e as well as from the patch to support zc on ice. Each subsequent
frag is added to skb_shared_info of the first frag for possible xdp_prog
use as well to xsk buffer list for accessing the buffers in af_xdp.

For XDP_PASS, new pages are allocated for frags and contents are copied
from memory backed by xsk_buff_pool.

Replace next_to_clean with next_to_process as done in non-zc path and
advance it for every buffer and change the semantics of next_to_clean to
point to the first buffer of a packet. Driver will use next_to_process
in the same way next_to_clean was used previously.

For the non multi-buffer case, next_to_process and next_to_clean will
always be the same since each packet consists of a single buffer.

Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-14-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoice: xsk: add RX multi-buffer support
Maciej Fijalkowski [Wed, 19 Jul 2023 13:24:09 +0000 (15:24 +0200)]
ice: xsk: add RX multi-buffer support

This support is strongly inspired by work that introduced multi-buffer
support to regular Rx data path in ice. There are some differences,
though. When adding a frag, besides adding it to skb_shared_info, use
also fresh xsk_buff_add_frag() helper. Reason for doing both things is
that we can not rule out the fact that AF_XDP pipeline could use XDP
program that needs to access frame fragments. Without them being in
skb_shared_info it will not be possible. Another difference is that
XDP_PASS has to allocate a new pages for each frags and copy contents
from memory backed by xsk_buff_pool.

chain_len that is used for programming HW Rx descriptors no longer has
to be limited to 1 when xsk_pool is present - remove this restriction.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-13-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoxsk: support mbuf on ZC RX
Maciej Fijalkowski [Wed, 19 Jul 2023 13:24:08 +0000 (15:24 +0200)]
xsk: support mbuf on ZC RX

Given that skb_shared_info relies on skb_frag_t, in order to support
xskb chaining, introduce xdp_buff_xsk::xskb_list_node and
xsk_buff_pool::xskb_list.

This is needed so ZC drivers can add frags as xskb nodes which will make
it possible to handle it both when producing AF_XDP Rx descriptors as
well as freeing/recycling all the frags that a single frame carries.

Speaking of latter, update xsk_buff_free() to take care of list nodes.
For the former (adding as frags), introduce xsk_buff_add_frag() for ZC
drivers usage that is going to be used to add a frag to xskb list from
pool.

xsk_buff_get_frag() will be utilized by XDP_TX and, on contrary, will
return xdp_buff.

One of the previous patches added a wrapper for ZC Rx so implement xskb
list walk and production of Rx descriptors there.

On bind() path, bail out if socket wants to use ZC multi-buffer but
underlying netdev does not support it.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-12-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoxsk: add new netlink attribute dedicated for ZC max frags
Maciej Fijalkowski [Wed, 19 Jul 2023 13:24:07 +0000 (15:24 +0200)]
xsk: add new netlink attribute dedicated for ZC max frags

Introduce new netlink attribute NETDEV_A_DEV_XDP_ZC_MAX_SEGS that will
carry maximum fragments that underlying ZC driver is able to handle on
TX side. It is going to be included in netlink response only when driver
supports ZC. Any value higher than 1 implies multi-buffer ZC support on
underlying device.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-11-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoxsk: discard zero length descriptors in Tx path
Tirthendu Sarkar [Wed, 19 Jul 2023 13:24:06 +0000 (15:24 +0200)]
xsk: discard zero length descriptors in Tx path

Descriptors with zero length are not supported by many NICs. To preserve
uniform behavior discard any zero length desc as invvalid desc.

Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-10-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoxsk: add support for AF_XDP multi-buffer on Tx path
Tirthendu Sarkar [Wed, 19 Jul 2023 13:24:05 +0000 (15:24 +0200)]
xsk: add support for AF_XDP multi-buffer on Tx path

For transmitting an AF_XDP packet, allocate skb while processing the
first desc and copy data to it. The 'XDP_PKT_CONTD' flag in 'options'
field of the desc indicates the EOP status of the packet. If the current
desc is not EOP, store the skb, release the current desc and go
on to read the next descs.

Allocate a page for each subsequent desc, copy data to it and add it as
a frag in the skb stored in xsk. On processing EOP, transmit the skb
with frags. Addresses contained in descs have been already queued in
consumer queue and skb destructor updated the completion count.

On transmit failure cancel the releases, clear the descs from the
completion queue and consume the skb for retrying packet transmission.

For any invalid descriptor (invalid length/address/options) in the middle
of a packet, all pending descriptors will be dropped by xsk core along
with the invalid one and the next descriptor is treated as the start of
a new packet.

Maximum supported frames for a packet is MAX_SKB_FRAGS + 1. If it is
exceeded, all descriptors accumulated so far are dropped.

Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-9-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoxsk: allow core/drivers to test EOP bit
Maciej Fijalkowski [Wed, 19 Jul 2023 13:24:04 +0000 (15:24 +0200)]
xsk: allow core/drivers to test EOP bit

Drivers are used to check for EOP bit whereas AF_XDP operates on
inverted logic - user space indicates that current frag is not the last
one and packet continues. For AF_XDP core needs, add xp_mb_desc() that
will simply test XDP_PKT_CONTD from xdp_desc::options, but in order to
preserve drivers default behavior, introduce an interface for ZC drivers
that will negate xp_mb_desc() result and therefore make it easier to
test EOP bit from during production of HW Tx descriptors.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-8-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoxsk: introduce wrappers and helpers for supporting multi-buffer in Tx path
Tirthendu Sarkar [Wed, 19 Jul 2023 13:24:03 +0000 (15:24 +0200)]
xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path

In Tx path, xsk core reserves space for each desc to be transmitted in
the completion queue and it's address contained in it is stored in the
skb destructor arg. After successful transmission the skb destructor
submits the addr marking completion.

To handle multiple descriptors per packet, now along with reserving
space for each descriptor, the corresponding address is also stored in
completion queue. The number of pending descriptors are stored in skb
destructor arg and is used by the skb destructor to update completions.

Introduce 'skb' in xdp_sock to store a partially built packet when
__xsk_generic_xmit() must return before it sees the EOP descriptor for
the current packet so that packet building can resume in next call of
__xsk_generic_xmit().

Helper functions are introduced to set and get the pending descriptors
in the skb destructor arg. Also, wrappers are introduced for storing
descriptor addresses, submitting and cancelling (for unsuccessful
transmissions) the number of completions.

Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-7-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoxsk: add support for AF_XDP multi-buffer on Rx path
Tirthendu Sarkar [Wed, 19 Jul 2023 13:24:02 +0000 (15:24 +0200)]
xsk: add support for AF_XDP multi-buffer on Rx path

Add multi-buffer support for AF_XDP by extending the XDP multi-buffer
support to be reflected in user-space when a packet is redirected to
an AF_XDP socket.

In the XDP implementation, the NIC driver builds the xdp_buff from the
first frag of the packet and adds any subsequent frags in the skb_shinfo
area of the xdp_buff. In AF_XDP core, XDP buffers are allocated from
xdp_sock's pool and data is copied from the driver's xdp_buff and frags.

Once an allocated XDP buffer is full and there is still data to be
copied, the 'XDP_PKT_CONTD' flag in'options' field of the corresponding
xdp ring descriptor is set and passed to the application. When application
sees the aforementioned flag set it knows there is pending data for this
packet that will be carried in the following descriptors. If there is no
more data to be copied, the flag in 'options' field is cleared for that
descriptor signalling EOP to the application.

If application reads a batch of descriptors using for example the libxdp
interfaces, it is not guaranteed that the batch will end with a full
packet. It might end in the middle of a packet and the rest of the frames
of that packet will arrive at the beginning of the next batch.

AF_XDP ensures that only a complete packet (along with all its frags) is
sent to application.

Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-6-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoxsk: move xdp_buff's data length check to xsk_rcv_check
Tirthendu Sarkar [Wed, 19 Jul 2023 13:24:01 +0000 (15:24 +0200)]
xsk: move xdp_buff's data length check to xsk_rcv_check

If the data in xdp_buff exceeds the xsk frame length, the packet needs
to be dropped. This check is currently being done in __xsk_rcv(). Move
the described logic to xsk_rcv_check() so that such a xdp_buff will
only be dropped if the application does not support multi-buffer
(absence of XDP_USE_SG bind flag). This is applicable for all cases:
copy mode, zero copy mode as well as skb mode.

Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-5-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoxsk: prepare both copy and zero-copy modes to co-exist
Maciej Fijalkowski [Wed, 19 Jul 2023 13:24:00 +0000 (15:24 +0200)]
xsk: prepare both copy and zero-copy modes to co-exist

Currently, __xsk_rcv_zc() is a function that is responsible for
producing AF_XDP Rx descriptors. It is used by both copy and zero-copy
mode. Both of these modes are going to differ when multi-buffer support
is going to be added. ZC will work on a chain of xdp_buff_xsk structs
whereas copy-mode is going to utilize skb_shared_info contents. This
means that ZC-specific changes would affect the copy mode.

Let's modify __xsk_rcv_zc() to work directly on xdp_buff_xsk so the
callsites have to retrieve this from xdp_buff. Also, introduce
xsk_rcv_zc() which will carry all the needed later changes for
supporting multi-buffer on ZC side that do not apply to copy mode.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-4-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoxsk: introduce XSK_USE_SG bind flag for xsk socket
Tirthendu Sarkar [Wed, 19 Jul 2023 13:23:59 +0000 (15:23 +0200)]
xsk: introduce XSK_USE_SG bind flag for xsk socket

As of now xsk core drops any xdp_buff with data size greater than the
xsk frame_size as set by the af_xdp application. With multi-buffer
support introduced in the next patch xsk core can now split those
buffers into multiple descriptors provided the af_xdp application can
handle them. Such capability of the application needs to be independent
of the xdp_prog's frag support capability since there are cases where
even a single xdp_buffer may need to be split into multiple descriptors
owing to a smaller xsk frame size.

For e.g., with NIC rx_buffer size set to 4kB, a 3kB packet will
constitute of a single buffer and so will be sent as such to AF_XDP layer
irrespective of 'xdp.frags' capability of the XDP program. Now if the xsk
frame size is set to 2kB by the AF_XDP application, then the packet will
need to be split into 2 descriptors if AF_XDP application can handle
multi-buffer, else it needs to be dropped.

Applications can now advertise their frag handling capability to xsk core
so that xsk core can decide if it should drop or split xdp_buffs that
exceed xsk frame size. This is done using a new 'XSK_USE_SG' bind flag
for the xdp socket.

Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-3-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoxsk: prepare 'options' in xdp_desc for multi-buffer use
Tirthendu Sarkar [Wed, 19 Jul 2023 13:23:58 +0000 (15:23 +0200)]
xsk: prepare 'options' in xdp_desc for multi-buffer use

Use the 'options' field in xdp_desc as a packet continuity marker. Since
'options' field was unused till now and was expected to be set to 0, the
'eop' descriptor will have it set to 0, while the non-eop descriptors
will have to set it to 1. This ensures legacy applications continue to
work without needing any change for single-buffer packets.

Add helper functions and extend xskq_prod_reserve_desc() to use the
'options' field.

Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-2-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agobpf, x86: initialize the variable "first_off" in save_args()
Menglong Dong [Wed, 19 Jul 2023 11:03:30 +0000 (19:03 +0800)]
bpf, x86: initialize the variable "first_off" in save_args()

As Dan Carpenter reported, the variable "first_off" which is passed to
clean_stack_garbage() in save_args() can be uninitialized, which can
cause runtime warnings with KMEMsan. Therefore, init it with 0.

Fixes: 473e3150e30a ("bpf, x86: allow function arguments up to 12 for TRACING")
Cc: Hao Peng <flyingpeng@tencent.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/bpf/09784025-a812-493f-9829-5e26c8691e07@moroto.mountain/
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Link: https://lore.kernel.org/r/20230719110330.2007949-1-imagedong@tencent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoMerge branch 'allow-bpf_map_sum_elem_count-for-all-program-types'
Alexei Starovoitov [Wed, 19 Jul 2023 16:48:53 +0000 (09:48 -0700)]
Merge branch 'allow-bpf_map_sum_elem_count-for-all-program-types'

Anton Protopopov says:

====================
allow bpf_map_sum_elem_count for all program types

This series is a follow up to the recent change [1] which added
per-cpu insert/delete statistics for maps. The bpf_map_sum_elem_count
kfunc presented in the original series was only available to tracing
programs, so let's make it available to all.

The first patch makes types listed in the reg2btf_ids[] array to be
considered trusted by kfuncs.

The second patch allows to treat CONST_PTR_TO_MAP as trusted pointers from
kfunc's point of view by adding it to the reg2btf_ids[] array.

The third patch adds missing const to the map argument of the
bpf_map_sum_elem_count kfunc.

The fourth patch registers the bpf_map_sum_elem_count for all programs,
and patches selftests correspondingly.

  [1] https://lore.kernel.org/bpf/20230705160139.19967-1-aspsk@isovalent.com/

v1 -> v2:
  * treat the whole reg2btf_ids array as trusted (Alexei)
====================

Link: https://lore.kernel.org/r/20230719092952.41202-1-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agobpf: allow any program to use the bpf_map_sum_elem_count kfunc
Anton Protopopov [Wed, 19 Jul 2023 09:29:52 +0000 (09:29 +0000)]
bpf: allow any program to use the bpf_map_sum_elem_count kfunc

Register the bpf_map_sum_elem_count func for all programs, and update the
map_ptr subtest of the test_progs test to test the new functionality.

The usage is allowed as long as the pointer to the map is trusted (when
using tracing programs) or is a const pointer to map, as in the following
example:

    struct {
            __uint(type, BPF_MAP_TYPE_HASH);
            ...
    } hash SEC(".maps");

    ...

    static inline int some_bpf_prog(void)
    {
            struct bpf_map *map = (struct bpf_map *)&hash;
            __s64 count;

            count = bpf_map_sum_elem_count(map);

            ...
    }

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Link: https://lore.kernel.org/r/20230719092952.41202-5-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agobpf: make an argument const in the bpf_map_sum_elem_count kfunc
Anton Protopopov [Wed, 19 Jul 2023 09:29:51 +0000 (09:29 +0000)]
bpf: make an argument const in the bpf_map_sum_elem_count kfunc

We use the map pointer only to read the counter values, no locking
involved, so mark the argument as const.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Link: https://lore.kernel.org/r/20230719092952.41202-4-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agobpf: consider CONST_PTR_TO_MAP as trusted pointer to struct bpf_map
Anton Protopopov [Wed, 19 Jul 2023 09:29:50 +0000 (09:29 +0000)]
bpf: consider CONST_PTR_TO_MAP as trusted pointer to struct bpf_map

Add the BTF id of struct bpf_map to the reg2btf_ids array. This makes the
values of the CONST_PTR_TO_MAP type to be considered as trusted by kfuncs.
This, in turn, allows users to execute trusted kfuncs which accept `struct
bpf_map *` arguments from non-tracing programs.

While exporting the btf_bpf_map_id variable, save some bytes by defining
it as BTF_ID_LIST_GLOBAL_SINGLE (which is u32[1]) and not as BTF_ID_LIST
(which is u32[64]).

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Link: https://lore.kernel.org/r/20230719092952.41202-3-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agobpf: consider types listed in reg2btf_ids as trusted
Anton Protopopov [Wed, 19 Jul 2023 09:29:49 +0000 (09:29 +0000)]
bpf: consider types listed in reg2btf_ids as trusted

The reg2btf_ids array contains a list of types for which we can (and need)
to find a corresponding static BTF id. All the types in the list can be
considered as trusted for purposes of kfuncs.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Link: https://lore.kernel.org/r/20230719092952.41202-2-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agobpf: Drop useless btf_vmlinux in bpf_tcp_ca
Geliang Tang [Mon, 17 Jul 2023 08:32:52 +0000 (16:32 +0800)]
bpf: Drop useless btf_vmlinux in bpf_tcp_ca

The code using btf_vmlinux in bpf_tcp_ca is removed by the
commit 9f0265e921de ("bpf: Require only one of cong_avoid() and cong_control() from a TCP CC")
so drop this useless btf_vmlinux declaration.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Link: https://lore.kernel.org/r/4d38da4eadaba476bd92ffcd7a5a03a5e28745c0.1689582557.git.geliang.tang@suse.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agosamples/bpf: README: Update build dependencies required
Anh Tuan Phan [Sun, 16 Jul 2023 16:13:51 +0000 (23:13 +0700)]
samples/bpf: README: Update build dependencies required

Update samples/bpf/README.rst to add pahole to the build dependencies
list. Add the reference to "Documentation/process/changes.rst" for
minimum version required so that the version required will not be
outdated in the future.

Signed-off-by: Anh Tuan Phan <tuananhlfc@gmail.com>
Link: https://lore.kernel.org/r/aecaf7a2-9100-cd5b-5cf4-91e5dbb2c90d@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoMerge branch 'bpf-refcount-followups-2-owner-field'
Alexei Starovoitov [Wed, 19 Jul 2023 00:22:26 +0000 (17:22 -0700)]
Merge branch 'bpf-refcount-followups-2-owner-field'

Dave Marchevsky says:

====================
BPF Refcount followups 2: owner field

This series adds an 'owner' field to bpf_{list,rb}_node structs, to be
used by the runtime to determine whether insertion or removal operations
are valid in shared ownership scenarios. Both the races which the series
fixes and the fix itself are inspired by Kumar's suggestions in [0].

Aside from insertion and removal having more reasons to fail, there are
no user-facing changes as a result of this series.

* Patch 1 reverts disabling of bpf_refcount_acquire so that the fixed
logic can be exercised by CI. It should _not_ be applied.
* Patch 2 adds internal definitions of bpf_{rb,list}_node so that
their fields are easier to access.
* Patch 3 is the meat of the series - it adds 'owner' field and
enforcement of correct owner to insertion and removal helpers.
* Patch 4 adds a test based on Kumar's examples.
* Patch 5 disables the test until bpf_refcount_acquire is re-enabled.
* Patch 6 reverts disabling of test added in this series
logic can be exercised by CI. It should _not_ be applied.

  [0]: https://lore.kernel.org/bpf/d7hyspcow5wtjcmw4fugdgyp3fwhljwuscp3xyut5qnwivyeru@ysdq543otzv2/

Changelog:

v1 -> v2: lore.kernel.org/bpf/20230711175945.3298231-1-davemarchevsky@fb.com/

Patch 2 ("Introduce internal definitions for UAPI-opaque bpf_{rb,list}_node")
  * Rename bpf_{rb,list}_node_internal -> bpf_{list,rb}_node_kern (Alexei)

Patch 3 ("bpf: Add 'owner' field to bpf_{list,rb}_node")
  * WARN_ON_ONCE in __bpf_list_del when node has wrong owner. This shouldn't
    happen, but worth checking regardless (Alexei, offline convo)
  * Continue previous patch's renaming changes
====================

Link: https://lore.kernel.org/r/20230718083813.3416104-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoselftests/bpf: Disable newly-added 'owner' field test until refcount re-enabled
Dave Marchevsky [Tue, 18 Jul 2023 08:38:12 +0000 (01:38 -0700)]
selftests/bpf: Disable newly-added 'owner' field test until refcount re-enabled

The test added in previous patch will fail with bpf_refcount_acquire
disabled. Until all races are fixed and bpf_refcount_acquire is
re-enabled on bpf-next, disable the test so CI doesn't complain.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230718083813.3416104-6-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoselftests/bpf: Add rbtree test exercising race which 'owner' field prevents
Dave Marchevsky [Tue, 18 Jul 2023 08:38:11 +0000 (01:38 -0700)]
selftests/bpf: Add rbtree test exercising race which 'owner' field prevents

This patch adds a runnable version of one of the races described by
Kumar in [0]. Specifically, this interleaving:

(rbtree1 and list head protected by lock1, rbtree2 protected by lock2)

Prog A                          Prog B
======================================
n = bpf_obj_new(...)
m = bpf_refcount_acquire(n)
kptr_xchg(map, m)

                                m = kptr_xchg(map, NULL)
                                lock(lock2)
bpf_rbtree_add(rbtree2, m->r, less)
unlock(lock2)

lock(lock1)
bpf_list_push_back(head, n->l)
/* make n non-owning ref */
bpf_rbtree_remove(rbtree1, n->r)
unlock(lock1)

The above interleaving, the node's struct bpf_rb_node *r can be used to
add it to either rbtree1 or rbtree2, which are protected by different
locks. If the node has been added to rbtree2, we should not be allowed
to remove it while holding rbtree1's lock.

Before changes in the previous patch in this series, the rbtree_remove
in the second part of Prog A would succeed as the verifier has no way of
knowing which tree owns a particular node at verification time. The
addition of 'owner' field results in bpf_rbtree_remove correctly
failing.

The test added in this patch splits "Prog A" above into two separate BPF
programs - A1 and A2 - and uses a second mapval + kptr_xchg to pass n
from A1 to A2 similarly to the pass from A1 to B. If the test is run
without the fix applied, the remove will succeed.

Kumar's example had the two programs running on separate CPUs. This
patch doesn't do this as it's not necessary to exercise the broken
behavior / validate fixed behavior.

  [0]: https://lore.kernel.org/bpf/d7hyspcow5wtjcmw4fugdgyp3fwhljwuscp3xyut5qnwivyeru@ysdq543otzv2

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230718083813.3416104-5-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agobpf: Add 'owner' field to bpf_{list,rb}_node
Dave Marchevsky [Tue, 18 Jul 2023 08:38:10 +0000 (01:38 -0700)]
bpf: Add 'owner' field to bpf_{list,rb}_node

As described by Kumar in [0], in shared ownership scenarios it is
necessary to do runtime tracking of {rb,list} node ownership - and
synchronize updates using this ownership information - in order to
prevent races. This patch adds an 'owner' field to struct bpf_list_node
and bpf_rb_node to implement such runtime tracking.

The owner field is a void * that describes the ownership state of a
node. It can have the following values:

  NULL           - the node is not owned by any data structure
  BPF_PTR_POISON - the node is in the process of being added to a data
                   structure
  ptr_to_root    - the pointee is a data structure 'root'
                   (bpf_rb_root / bpf_list_head) which owns this node

The field is initially NULL (set by bpf_obj_init_field default behavior)
and transitions states in the following sequence:

  Insertion: NULL -> BPF_PTR_POISON -> ptr_to_root
  Removal:   ptr_to_root -> NULL

Before a node has been successfully inserted, it is not protected by any
root's lock, and therefore two programs can attempt to add the same node
to different roots simultaneously. For this reason the intermediate
BPF_PTR_POISON state is necessary. For removal, the node is protected
by some root's lock so this intermediate hop isn't necessary.

Note that bpf_list_pop_{front,back} helpers don't need to check owner
before removing as the node-to-be-removed is not passed in as input and
is instead taken directly from the list. Do the check anyways and
WARN_ON_ONCE in this unexpected scenario.

Selftest changes in this patch are entirely mechanical: some BTF
tests have hardcoded struct sizes for structs that contain
bpf_{list,rb}_node fields, those were adjusted to account for the new
sizes. Selftest additions to validate the owner field are added in a
further patch in the series.

  [0]: https://lore.kernel.org/bpf/d7hyspcow5wtjcmw4fugdgyp3fwhljwuscp3xyut5qnwivyeru@ysdq543otzv2

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230718083813.3416104-4-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agobpf: Introduce internal definitions for UAPI-opaque bpf_{rb,list}_node
Dave Marchevsky [Tue, 18 Jul 2023 08:38:09 +0000 (01:38 -0700)]
bpf: Introduce internal definitions for UAPI-opaque bpf_{rb,list}_node

Structs bpf_rb_node and bpf_list_node are opaquely defined in
uapi/linux/bpf.h, as BPF program writers are not expected to touch their
fields - nor does the verifier allow them to do so.

Currently these structs are simple wrappers around structs rb_node and
list_head and linked_list / rbtree implementation just casts and passes
to library functions for those data structures. Later patches in this
series, though, will add an "owner" field to bpf_{rb,list}_node, such
that they're not just wrapping an underlying node type. Moreover, the
bpf linked_list and rbtree implementations will deal with these owner
pointers directly in a few different places.

To avoid having to do

  void *owner = (void*)bpf_list_node + sizeof(struct list_head)

with opaque UAPI node types, add bpf_{list,rb}_node_kern struct
definitions to internal headers and modify linked_list and rbtree to use
the internal types where appropriate.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230718083813.3416104-3-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoMerge branch 'phy-at803x-support'
David S. Miller [Mon, 17 Jul 2023 09:15:14 +0000 (10:15 +0100)]
Merge branch 'phy-at803x-support'

Luo Jie says:

====================
net: phy: at803x: support qca8081 1G version chip

This patch series add supporting qca8081 1G version chip, the 1G version
chip can be identified by the register mmd7.0x901d bit0.

In addition, qca8081 does not support 1000BaseX mode and the sgmii fifo
reset is added on the link changed, which assert the fifo on the link
down, deassert the fifo on the link up.

Changes in v1:
* switch to use genphy_c45_pma_read_abilities.
* remove the patch [remove 1000BaseX mode of qca8081].
* move the sgmii fifo reset to link_change_notify.

Changes in v2:
* split the qca8081 1G chip support patch.
* improve the slave seed config, disable it if master preferred.

Changes in v3:
* fix the comments.
* add the help function qca808x_has_fast_retrain_or_slave_seed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: at803x: add qca8081 fifo reset on the link changed
Luo Jie [Sun, 16 Jul 2023 08:49:24 +0000 (16:49 +0800)]
net: phy: at803x: add qca8081 fifo reset on the link changed

The qca8081 sgmii fifo needs to be reset on link down and
released on the link up in case of any abnormal issue
such as the packet blocked on the PHY.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: at803x: remove qca8081 1G fast retrain and slave seed config
Luo Jie [Sun, 16 Jul 2023 08:49:23 +0000 (16:49 +0800)]
net: phy: at803x: remove qca8081 1G fast retrain and slave seed config

The fast retrain and slave seed configs are only applicable when the 2.5G
ability is supported.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: at803x: support qca8081 1G chip type
Luo Jie [Sun, 16 Jul 2023 08:49:22 +0000 (16:49 +0800)]
net: phy: at803x: support qca8081 1G chip type

The qca8081 1G chip version does not support 2.5 capability, which
is distinguished from qca8081 2.5G chip according to the bit0 of
register mmd7.0x901d, the 1G version chip also has the same PHY ID
as the normal qca8081 2.5G chip.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: at803x: enable qca8081 slave seed conditionally
Luo Jie [Sun, 16 Jul 2023 08:49:21 +0000 (16:49 +0800)]
net: phy: at803x: enable qca8081 slave seed conditionally

qca8081 is the single port PHY, the slave prefer mode is used
by default.

if the phy master perfer mode is configured, the slave seed
configuration should not be enabled, since the slave seed
enablement is for making PHY linked as slave mode easily.

disable slave seed if the master mode is preferred.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: at803x: merge qca8081 slave seed function
Luo Jie [Sun, 16 Jul 2023 08:49:20 +0000 (16:49 +0800)]
net: phy: at803x: merge qca8081 slave seed function

merge the seed enablement and seed value configuration into
one function, since the random seed value is needed to be
configured when the seed is enabled.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: at803x: support qca8081 genphy_c45_pma_read_abilities
Luo Jie [Sun, 16 Jul 2023 08:49:19 +0000 (16:49 +0800)]
net: phy: at803x: support qca8081 genphy_c45_pma_read_abilities

qca8081 PHY supports to use genphy_c45_pma_read_abilities for
getting the PHY features supported except for the autoneg ability

but autoneg ability exists in MDIO_STAT1 instead of MMD7.1, add it
manually after calling genphy_c45_pma_read_abilities.

Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'qrtr-fixes'
David S. Miller [Mon, 17 Jul 2023 08:02:30 +0000 (09:02 +0100)]
Merge branch 'qrtr-fixes'

Vignesh Viswanathan says:

====================
net: qrtr: Few fixes in QRTR

Add fixes in QRTR ns to change server and nodes radix tree to xarray to
avoid a use-after-free while iterating through the server or nodes
radix tree.

Also fix the destination port value for IPCR control buffer on older
targets.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: qrtr: Handle IPCR control port format of older targets
Vignesh Viswanathan [Fri, 14 Jul 2023 05:58:46 +0000 (11:28 +0530)]
net: qrtr: Handle IPCR control port format of older targets

The destination port value in the IPCR control buffer on older
targets is 0xFFFF. Handle the same by updating the dst_port to
QRTR_PORT_CTRL.

Signed-off-by: Vignesh Viswanathan <quic_viswanat@quicinc.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: qrtr: ns: Change nodes radix tree to xarray
Vignesh Viswanathan [Fri, 14 Jul 2023 05:58:45 +0000 (11:28 +0530)]
net: qrtr: ns: Change nodes radix tree to xarray

There is a use after free scenario while iterating through the nodes
radix tree despite the ns being a single threaded process. This can
happen when the radix tree APIs are not synchronized with the
rcu_read_lock() APIs.

Convert the radix tree for nodes to xarray to take advantage of the
built in rcu lock usage provided by xarray.

Signed-off-by: Chris Lew <quic_clew@quicinc.com>
Signed-off-by: Vignesh Viswanathan <quic_viswanat@quicinc.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: qrtr: ns: Change servers radix tree to xarray
Vignesh Viswanathan [Fri, 14 Jul 2023 05:58:44 +0000 (11:28 +0530)]
net: qrtr: ns: Change servers radix tree to xarray

There is a use after free scenario while iterating through the servers
radix tree despite the ns being a single threaded process. This can
happen when the radix tree APIs are not synchronized with the
rcu_read_lock() APIs.

Convert the radix tree for servers to xarray to take advantage of the
built in rcu lock usage provided by xarray.

Signed-off-by: Chris Lew <quic_clew@quicinc.com>
Signed-off-by: Vignesh Viswanathan <quic_viswanat@quicinc.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'brcm-asp-2.0-support'
David S. Miller [Mon, 17 Jul 2023 06:39:04 +0000 (07:39 +0100)]
Merge branch 'brcm-asp-2.0-support'

Justin Chen says:

====================
Brcm ASP 2.0 Ethernet Controller

Add support for the Broadcom ASP 2.0 Ethernet controller which is first
introduced with 72165.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMAINTAINERS: ASP 2.0 Ethernet driver maintainers
Justin Chen [Thu, 13 Jul 2023 22:19:06 +0000 (15:19 -0700)]
MAINTAINERS: ASP 2.0 Ethernet driver maintainers

Add maintainers entry for ASP 2.0 Ethernet driver.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: bcm7xxx: Add EPHY entry for 74165
Florian Fainelli [Thu, 13 Jul 2023 22:19:05 +0000 (15:19 -0700)]
net: phy: bcm7xxx: Add EPHY entry for 74165

74165 is a 16nm process SoC with a 10/100 integrated Ethernet PHY,
utilize the recently defined 16nm EPHY macro to configure that PHY.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: mdio-bcm-unimac: Add asp v2.0 support
Justin Chen [Thu, 13 Jul 2023 22:19:04 +0000 (15:19 -0700)]
net: phy: mdio-bcm-unimac: Add asp v2.0 support

Add mdio compat string for ASP 2.0 ethernet driver.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: bcmasp: Add support for ethtool driver stats
Justin Chen [Thu, 13 Jul 2023 22:19:03 +0000 (15:19 -0700)]
net: bcmasp: Add support for ethtool driver stats

Add support for ethernet driver specific stats.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: bcmasp: Add support for ethtool standard stats
Justin Chen [Thu, 13 Jul 2023 22:19:02 +0000 (15:19 -0700)]
net: bcmasp: Add support for ethtool standard stats

Add support for eth_mac_stats, rmon_stats, and eth_ctrl_stats.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: bcmasp: Add support for eee mode
Justin Chen [Thu, 13 Jul 2023 22:19:01 +0000 (15:19 -0700)]
net: bcmasp: Add support for eee mode

Add support for eee mode.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: bcmasp: Add support for wake on net filters
Justin Chen [Thu, 13 Jul 2023 22:19:00 +0000 (15:19 -0700)]
net: bcmasp: Add support for wake on net filters

Add support for wake on network filters. The max match is 256 bytes.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: bcmasp: Add support for WoL magic packet
Justin Chen [Thu, 13 Jul 2023 22:18:59 +0000 (15:18 -0700)]
net: bcmasp: Add support for WoL magic packet

Add support for Wake-On-Lan magic packet and magic packet with password.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: bcmasp: Add support for ASP2.0 Ethernet controller
Justin Chen [Thu, 13 Jul 2023 22:18:58 +0000 (15:18 -0700)]
net: bcmasp: Add support for ASP2.0 Ethernet controller

Add support for the Broadcom ASP 2.0 Ethernet controller which is first
introduced with 72165. This controller features two distinct Ethernet
ports that can be independently operated.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodt-bindings: net: Brcm ASP 2.0 Ethernet controller
Florian Fainelli [Thu, 13 Jul 2023 22:18:57 +0000 (15:18 -0700)]
dt-bindings: net: Brcm ASP 2.0 Ethernet controller

Add a binding document for the Broadcom ASP 2.0 Ethernet
controller.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodt-bindings: net: brcm,unimac-mdio: Add asp-v2.0
Justin Chen [Thu, 13 Jul 2023 22:18:56 +0000 (15:18 -0700)]
dt-bindings: net: brcm,unimac-mdio: Add asp-v2.0

The ASP 2.0 Ethernet controller uses a brcm unimac.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: fec: Refactor: rename `adapter` to `fep`
Csókás Bence [Thu, 13 Jul 2023 11:09:33 +0000 (11:09 +0000)]
net: fec: Refactor: rename `adapter` to `fep`

Rename local `struct fec_enet_private *adapter` to `fep` in `fec_ptp_gettime()` to match the rest of the driver

Signed-off-by: Csókás Bence <csokas.bence@prolan.hu>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogve: trivial spell fix Recive to Receive
Jesper Dangaard Brouer [Thu, 13 Jul 2023 15:54:37 +0000 (17:54 +0200)]
gve: trivial spell fix Recive to Receive

Spotted this trivial spell mistake while casually reading
the google GVE driver code.

Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'mlxsw-rif-pvid'
David S. Miller [Fri, 14 Jul 2023 09:20:15 +0000 (10:20 +0100)]
Merge branch 'mlxsw-rif-pvid'

Petr Machata says:

====================
mlxsw: Manage RIF across PVID changes

The mlxsw driver currently makes the assumption that the user applies
configuration in a bottom-up manner. Thus netdevices need to be added to
the bridge before IP addresses are configured on that bridge or SVI added
on top of it. Enslaving a netdevice to another netdevice that already has
uppers is in fact forbidden by mlxsw for this reason. Despite this safety,
it is rather easy to get into situations where the offloaded configuration
is just plain wrong.

As an example, take a front panel port, configure an IP address: it gets a
RIF. Now enslave the port to the bridge, and the RIF is gone. Remove the
port from the bridge again, but the RIF never comes back. There is a number
of similar situations, where changing the configuration there and back
utterly breaks the offload.

The situation is going to be made better by implementing a range of replays
and post-hoc offloads.

In this patch set, address the ordering issues related to creation of
bridge RIFs. Currently, mlxsw has several shortcomings with regards to RIF
handling due to PVID changes:

- In order to cause RIF for a bridge device to be created, the user is
  expected first to set PVID, then to add an IP address. The reverse
  ordering is disallowed, which is not very user-friendly.

- When such bridge gets a VLAN upper whose VID was the same as the existing
  PVID, and this VLAN netdevice gets an IP address, a RIF is created for
  this netdevice. The new RIF is then assigned to the 802.1Q FID for the
  given VID. This results in a working configuration. However, then, when
  the VLAN netdevice is removed again, the RIF for the bridge itself is
  never reassociated to the PVID.

- PVID cannot be changed once the bridge has uppers. Presumably this is
  because the driver does not manage RIFs properly in face of PVID changes.
  However, as the previous point shows, it is still possible to get into
  invalid configurations.

This patch set addresses these issues and relaxes some of the ordering
requirements that mlxsw had. The patch set proceeds as follows:

- In patch #1, pass extack to mlxsw_sp_br_ban_rif_pvid_change()

- To relax ordering between setting PVID and adding an IP address to a
  bridge, mlxsw must be able to request that a RIF is created with a given
  VLAN ID, instead of trying to deduce it from the current netdevice
  settings, which do not reflect the user-requested values yet. This is
  done in patches #2 and #3.

- Similarly, mlxsw_sp_inetaddr_bridge_event() will need to make decisions
  based on the user-requested value of PVID, not the current value. Thus in
  patches #4 and #5, add a new argument which carries the requested PVID
  value.

- Finally in patch #6 relax the ban on PVID changes when a bridge has
  uppers. Instead, add the logic necessary for creation of a RIF as a
  result of PVID change.

- Relevant selftests are presented afterwards. In patch #7 a preparatory
  helper is added to lib.sh. Patches #8, #9, #10 and #11 include selftests
  themselves.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: router_bridge_pvid_vlan_upper: Add a new selftest
Petr Machata [Thu, 13 Jul 2023 16:15:34 +0000 (18:15 +0200)]
selftests: router_bridge_pvid_vlan_upper: Add a new selftest

This tests whether addition and deletion of a VLAN upper that coincides
with the current PVID setting throws off forwarding.

This selftests is specifically geared towards offloading drivers. In
particular, mlxsw used to fail this selftest, and an earlier patch in this
patchset fixes the issue. However, there's nothing HW-specific in the test
itself (it absolutely is supposed to pass on SW datapath), and therefore it
is put into the generic forwarding directory.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: router_bridge_vlan_upper_pvid: Add a new selftest
Petr Machata [Thu, 13 Jul 2023 16:15:33 +0000 (18:15 +0200)]
selftests: router_bridge_vlan_upper_pvid: Add a new selftest

This tests whether changes to PVID that coincide with an existing VLAN
upper throw off forwarding. This selftests is specifically geared towards
offloading drivers, but since there's nothing HW-specific in the test
itself (it absolutely is supposed to pass on SW datapath), it is put into
the generic forwarding directory.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: router_bridge_vlan: Add PVID change test
Petr Machata [Thu, 13 Jul 2023 16:15:32 +0000 (18:15 +0200)]
selftests: router_bridge_vlan: Add PVID change test

Add an alternative path involving VLAN 777 instead of the current 555. Then
add tests that verify that marking 777 as PVID makes the 555 path not work,
and the 777 path work.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: router_bridge: Add tests to remove and add PVID
Petr Machata [Thu, 13 Jul 2023 16:15:31 +0000 (18:15 +0200)]
selftests: router_bridge: Add tests to remove and add PVID

This test relies on PVID being configured on the bridge itself. Thus when
it is deconfigured, the system should lose the ability to forward traffic.
Later when it is added again, the ability to forward traffic should be
regained. Add tests to exercise these configuration changes and verify
results.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: forwarding: lib: Add ping6_, ping_test_fails()
Petr Machata [Thu, 13 Jul 2023 16:15:30 +0000 (18:15 +0200)]
selftests: forwarding: lib: Add ping6_, ping_test_fails()

Add two helpers to run a ping test that succeeds when the pings themselves
fail.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: spectrum_switchdev: Manage RIFs on PVID change
Petr Machata [Thu, 13 Jul 2023 16:15:29 +0000 (18:15 +0200)]
mlxsw: spectrum_switchdev: Manage RIFs on PVID change

Currently, mlxsw has several shortcomings with regards to RIF handling due
to PVID changes:

- In order to cause RIF for a bridge device to be created, the user is
  expected first to set PVID, then to add an IP address. The reverse
  ordering is disallowed, which is not very user-friendly.

- When such bridge gets a VLAN upper whose VID was the same as the existing
  PVID, and this VLAN netdevice gets an IP address, a RIF is created for
  this netdevice. The new RIF is then assigned to the 802.1Q FID for the
  given VID. This results in a working configuration. However, then, when
  the VLAN netdevice is removed again, the RIF for the bridge itself is
  never reassociated to the VLAN.

- PVID cannot be changed once the bridge has uppers. Presumably this is
  because the driver does not manage RIFs properly in face of PVID changes.
  However, as the previous point shows, it is still possible to get into
  invalid configurations.

In this patch, add the logic necessary for creation of a RIF as a result of
PVID change. Moreover, when a VLAN upper is created whose VID matches lower
PVID, do not create RIF for this netdevice.

These changes obviate the need for ordering of IP address additions and
PVID configuration, so stop forbidding addition of an IP address to a
PVID-less bridge. Instead, bail out quietly. Also stop preventing PVID
changes when the bridge has uppers.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: spectrum_router: mlxsw_sp_inetaddr_bridge_event: Add an argument
Petr Machata [Thu, 13 Jul 2023 16:15:28 +0000 (18:15 +0200)]
mlxsw: spectrum_router: mlxsw_sp_inetaddr_bridge_event: Add an argument

For purposes of replay, mlxsw_sp_inetaddr_bridge_event() will need to make
decisions based on the proposed value of PVID. Querying PVID reveals the
current settings, not the in-flight values that the user requested and that
the notifiers are acting upon. Add a parameter, lower_pvid, which carries
the proposed PVID of the lower bridge, or -1 if the lower is not a bridge.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: spectrum_router: Adjust mlxsw_sp_inetaddr_vlan_event() coding style
Petr Machata [Thu, 13 Jul 2023 16:15:27 +0000 (18:15 +0200)]
mlxsw: spectrum_router: Adjust mlxsw_sp_inetaddr_vlan_event() coding style

The bridge branch of the dispatch in this function is going to get more
code and will need curly braces. Per the doctrine, that means the whole
if-else chain should get them.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: spectrum_router: Take VID for VLAN FIDs from RIF params
Petr Machata [Thu, 13 Jul 2023 16:15:26 +0000 (18:15 +0200)]
mlxsw: spectrum_router: Take VID for VLAN FIDs from RIF params

Currently, when an IP address is added to a bridge that has no PVID, the
operation is rejected. An IP address addition is interpreted as a request
to create a RIF for the bridge device, but without a PVID there is no VLAN
for which the RIF should be created. Thus the correct way to create a RIF
for a bridge as a user is to first add a PVID, and then add the IP address.

Ideally this ordering requirement would not exist. RIF would be created
either because an IP address is added, or because a PVID is added,
depending on which comes last.

For that, the switchdev code (which notices the PVID change request) must
be able to request that a RIF is created with a given VLAN ID, because at
the time that the PVID notification is distributed, the PVID setting is not
yet visible for querying.

Therefore when creating a VLAN-based RIF, use mlxsw_sp_rif_params.vid to
communicate the VID, and do not determine it ad-hoc in the fid_get
callback.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: spectrum_router: Pass struct mlxsw_sp_rif_params to fid_get
Petr Machata [Thu, 13 Jul 2023 16:15:25 +0000 (18:15 +0200)]
mlxsw: spectrum_router: Pass struct mlxsw_sp_rif_params to fid_get

The fid_get callback is called to allocate a FID for the newly-created RIF.
In a following patch, the fid_get implementation for VLANs will be modified
to take the VLAN ID from the parameters instead of deducing it from the
netdevice. To that end, propagate the RIF parameters to the fid_get
callback.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: spectrum_switchdev: Pass extack to mlxsw_sp_br_ban_rif_pvid_change()
Petr Machata [Thu, 13 Jul 2023 16:15:24 +0000 (18:15 +0200)]
mlxsw: spectrum_switchdev: Pass extack to mlxsw_sp_br_ban_rif_pvid_change()

Currently the reason for rejection of PVID manipulation is dumped to
syslog, and a generic -EBUSY is returned to the userspace. But
switchdev_handle_port_obj_add(), through which we get to
mlxsw_sp_port_vlans_add(), handles extack just fine, and we can pass the
message this way.

This improves visibility into reasons why the request to change PVID
was rejected. Before the change:

 # bridge vlan add dev br vid 2 self pvid untagged
 RTNETLINK answers: Device or resource busy
 (plus a syslog line)

After the change:

 # bridge vlan add dev br vid 2 self pvid untagged
 Error: mlxsw_spectrum: Can't change PVID, it's used by router interface.

Note that this particular error message is going away in the following
patches. However the ability to pass error messages through extack will be
useful more broadly for communicating in particular reasons why a RIF
failed to be created.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'macsec-selftests'
David S. Miller [Fri, 14 Jul 2023 08:16:53 +0000 (09:16 +0100)]
Merge branch 'macsec-selftests'

Sabrina Dubroca says:

====================
net: add MACsec offload selftests

Patch 1 adds MACsec offload to netdevsim (unchanged from v2).

Patch 2 adds a corresponding selftest to the rtnetlink testsuite.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: rtnetlink: add MACsec offload tests
Sabrina Dubroca [Thu, 13 Jul 2023 13:20:24 +0000 (15:20 +0200)]
selftests: rtnetlink: add MACsec offload tests

Like the IPsec offload test, this requires netdevsim.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonetdevsim: add dummy macsec offload
Sabrina Dubroca [Thu, 13 Jul 2023 13:20:23 +0000 (15:20 +0200)]
netdevsim: add dummy macsec offload

When the kernel is compiled with MACsec support, add the
NETIF_F_HW_MACSEC feature to netdevsim devices and implement
macsec_ops.

To allow easy testing of failure from the device, support is limited
to 3 SecY's per netdevsim device, and 1 RXSC per SecY.

v2:
 - nsim_macsec_add_secy, return -ENOSPC if secy_count isn't full but
   we can't find an empty slot (Simon Horman)
 - add sci_to_cpu to make sparse happy (Simon Horman)
 - remove set but not used secy variable (kernel test robot and
   Simon Horman)

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodevlink: remove reload failed checks in params get/set callbacks
Jiri Pirko [Thu, 13 Jul 2023 09:44:19 +0000 (11:44 +0200)]
devlink: remove reload failed checks in params get/set callbacks

The checks in question were introduced by:
commit 6b4db2e528f6 ("devlink: Fix use-after-free after a failed reload").
That fixed an issue of reload with mlxsw driver.

Back then, that was a valid fix, because there was a limitation
in place that prevented drivers from registering/unregistering params
when devlink instance was registered.

It was possible to do the fix differently by changing drivers to
register/unregister params in appropriate places making sure the ops
operate only on memory which is allocated and initialized. But that,
as a dependency, would require to remove the limitation mentioned above.

Eventually, this limitation was lifted by:
commit 1d18bb1a4ddd ("devlink: allow registering parameters after the instance")

Also, the alternative fix (which also fixed another issue) was done by:
commit 74cbc3c03c82 ("mlxsw: spectrum_acl_tcam: Move devlink param to TCAM code").

Therefore, the checks are no longer relevant. Each driver should make
sure to have the params registered only when the memory the ops
are working with is allocated and initialized.

So remove the checks.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'mv88e6xxx-phylink_pcs'
David S. Miller [Fri, 14 Jul 2023 07:51:49 +0000 (08:51 +0100)]
Merge branch 'mv88e6xxx-phylink_pcs'

Russell King says:

====================
Convert mv88e6xxx to phylink_pcs

This series (previously posted with further patches on the 26 June as
RFC) converts mv88e6xxx to phylink_pcs, and thus moves it from being
a pre-March 2020 legacy driver.

The first four patches lay the ground-work for the conversion by
adding four new methods to the phylink_pcs operations structure:

  pcs_enable() - called when the PCS is going to start to be used
  pcs_disable() - called when the PCS is no longer being used

  pcs_pre_config() - called before the MAC configuration method
  pcs_post_config() - called after the MAC configuration method
      Both of these are necessary for some of the mv88e639x
      workarounds.

We also add the ability to inform phylink of a change to the PCS
state without involving the MAC later, by providing
phylink_pcs_change() which takes a phylink_pcs structure rather than
a phylink structure. phylink maintains which instance the PCS is
conencted to, so internally it can do the right thing when the PCS
is in-use.

Then we provide some additional mdiobus and mdiodev accessors that
we will be using in the new PCS drivers.

The changes for mv88e6xxx follow, and the first one needs to be
explicitly pointed out - we (Andrew and myself) have both decided that
all possible approaches to maintaining backwards compatibility with DT
have been exhaused - everyone has some objection to everything that
has been proposed. So, after many years of trying, we have decided
that this is just an impossibility, and with this patch, we are now
intentionally and knowingly breaking any DT that does not specify the
CPU and DSA port fixed-link parameters. Hence why Andrew has recently
been submitting DT update patches. It is regrettable that it has come
to this.

Following this, we start preparing 88e6xxx for phylink_pcs conversion
by padding the mac_select_pcs() DSA method, and the internal hooks to
create and tear-down PCS instances. Rather than bloat the already very
large mv88e6xxx_ops structure, I decided that it would be better that
the new internal chip specific PCS methods are all grouped within their
own structure - and this structure can be declared in the PCS drivers
themselves.

Then we have the actual conversion patches, one for each family of PCS.

Lastly, we clean up the driver after conversion, removing all the now
redundant code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: mv88e6xxx: cleanup after phylink_pcs conversion
Russell King (Oracle) [Thu, 13 Jul 2023 08:42:59 +0000 (09:42 +0100)]
net: dsa: mv88e6xxx: cleanup after phylink_pcs conversion

Now that mv88e6xxx is completely converted to using phylink_pcs
support, we have no need for the serdes methods. Remove all this
infrastructure. Also remove the __maybe_unused from
mv88e6xxx_pcs_select().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: mv88e6xxx: convert 88e639x to phylink_pcs
Russell King (Oracle) [Thu, 13 Jul 2023 08:42:53 +0000 (09:42 +0100)]
net: dsa: mv88e6xxx: convert 88e639x to phylink_pcs

Convert the 88E6390, 88E6390X, and 88E6393X family of switches to use
the phylink_pcs infrastructure.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: mv88e6xxx: convert 88e6352 to phylink_pcs
Russell King [Thu, 13 Jul 2023 08:42:48 +0000 (09:42 +0100)]
net: dsa: mv88e6xxx: convert 88e6352 to phylink_pcs

Convert the 88E6352 SERDES code to use the phylink_pcs infrastructure.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: mv88e6xxx: convert 88e6185 to phylink_pcs
Russell King (Oracle) [Thu, 13 Jul 2023 08:42:43 +0000 (09:42 +0100)]
net: dsa: mv88e6xxx: convert 88e6185 to phylink_pcs

Convert the 88E6185 SERDES code to use the phylink_pcs infrastructure.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: mv88e6xxx: export mv88e6xxx_pcs_decode_state()
Russell King (Oracle) [Thu, 13 Jul 2023 08:42:38 +0000 (09:42 +0100)]
net: dsa: mv88e6xxx: export mv88e6xxx_pcs_decode_state()

Rename and export the PCS state decoding function so our PCS can
make use of the functionality provided by this.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: mv88e6xxx: add infrastructure for phylink_pcs
Russell King (Oracle) [Thu, 13 Jul 2023 08:42:33 +0000 (09:42 +0100)]
net: dsa: mv88e6xxx: add infrastructure for phylink_pcs

Add infrastructure for phylink_pcs to the mv88e6xxx driver. This
involves adding a mac_select_pcs() hook so we can pass the PCS to
phylink at the appropriate time, and a PCS initialisation function.

As the various chip implementations are converted to use phylink_pcs,
they are no longer reliant on the legacy phylink behaviour. We detect
this by the use of this infrastructure, or the lack of any serdes.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: mv88e6xxx: remove handling for DSA and CPU ports
Russell King (Oracle) [Thu, 13 Jul 2023 08:42:28 +0000 (09:42 +0100)]
net: dsa: mv88e6xxx: remove handling for DSA and CPU ports

As we now always use a fixed-link for DSA and CPU ports, we no longer
need the hack in the Marvell code to make this work. Remove it.

This is especially important with the conversion of DSA drivers to
phylink_pcs, as the PCS code only gets called if we are using
phylink for the port.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mdio: add unlocked mdiobus and mdiodev bus accessors
Russell King (Oracle) [Thu, 13 Jul 2023 08:42:22 +0000 (09:42 +0100)]
net: mdio: add unlocked mdiobus and mdiodev bus accessors

Add the following unlocked accessors to complete the set:
__mdiobus_modify()
__mdiodev_read()
__mdiodev_write()
__mdiodev_modify()
__mdiodev_modify_changed()
which we will need for Marvell DSA PCS conversion.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phylink: add support for PCS link change notifications
Russell King (Oracle) [Thu, 13 Jul 2023 08:42:17 +0000 (09:42 +0100)]
net: phylink: add support for PCS link change notifications

Add a function, phylink_pcs_change() which can be used by PCs drivers
to notify phylink about changes to the PCS link state.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phylink: add pcs_pre_config()/pcs_post_config() methods
Russell King (Oracle) [Thu, 13 Jul 2023 08:42:12 +0000 (09:42 +0100)]
net: phylink: add pcs_pre_config()/pcs_post_config() methods

Add hooks that are called before and after the mac_config() call,
which will be needed to deal with errata workarounds for the
Marvell 88e639x DSA switches.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phylink: add pcs_enable()/pcs_disable() methods
Russell King (Oracle) [Thu, 13 Jul 2023 08:42:07 +0000 (09:42 +0100)]
net: phylink: add pcs_enable()/pcs_disable() methods

Add phylink PCS enable/disable callbacks that will allow us to place
IEEE 802.3 register compliant PCS in power-down mode while not being
used.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ngbe: add Wake on Lan support
Mengyuan Lou [Thu, 13 Jul 2023 06:09:11 +0000 (14:09 +0800)]
net: ngbe: add Wake on Lan support

Implement ethtool_ops get_wol and set_wol.
Implement Wake-on-LAN support.

Wol requires hardware board support which use sub id
to identify.
Magic packets are checked by fw, for now just support
WAKE_MAGIC.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: ar9331: Use maple tree register cache
Mark Brown [Wed, 12 Jul 2023 23:45:58 +0000 (00:45 +0100)]
net: dsa: ar9331: Use maple tree register cache

We now have a regmap cache which uses a maple tree to store the register
state, this is a more modern data structure and the regmap level code
using it makes a number of assumptions better tuned for modern hardware
than those made by the rbtree cache type that the at9331 driver uses.
Switch the ar9331 driver to use the more modern data structure.

This should have minimal practical impact, it's mainly code
modernisation.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'sk-const'
David S. Miller [Fri, 14 Jul 2023 07:27:33 +0000 (08:27 +0100)]
Merge branch 'sk-const'

Guillaume Nault says:

====================
net: Mark the sk parameter of routing functions as 'const'.

The sk_getsecid security hook prevents the use of a const sk pointer in
several routing functions. Since this hook should only read sk data,
make its sk argument const (patch 1), then constify the sk parameter of
various routing functions (patches 2-4).

Build-tested with make allmodconfig.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agopptp: Constify the po parameter of pptp_route_output().
Guillaume Nault [Tue, 11 Jul 2023 13:06:26 +0000 (15:06 +0200)]
pptp: Constify the po parameter of pptp_route_output().

Make it explicit that this function doesn't modify the socket passed as
parameter.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoipv6: Constify the sk parameter of several helper functions.
Guillaume Nault [Tue, 11 Jul 2023 13:06:21 +0000 (15:06 +0200)]
ipv6: Constify the sk parameter of several helper functions.

icmpv6_flow_init(), ip6_datagram_flow_key_init() and ip6_mc_hdr() don't
need to modify their sk argument. Make that explicit using const.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoipv4: Constify the sk parameter of ip_route_output_*().
Guillaume Nault [Tue, 11 Jul 2023 13:06:14 +0000 (15:06 +0200)]
ipv4: Constify the sk parameter of ip_route_output_*().

These functions don't need to modify the socket, so let's allow the
callers to pass a const struct sock *.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosecurity: Constify sk in the sk_getsecid hook.
Guillaume Nault [Tue, 11 Jul 2023 13:06:08 +0000 (15:06 +0200)]
security: Constify sk in the sk_getsecid hook.

The sk_getsecid hook shouldn't need to modify its socket argument.
Make it const so that callers of security_sk_classify_flow() can use a
const struct sock *.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'net-stmmac-replace-boolean-fields-in-plat_stmmacenet_data-with-flags'
Jakub Kicinski [Fri, 14 Jul 2023 03:57:55 +0000 (20:57 -0700)]
Merge branch 'net-stmmac-replace-boolean-fields-in-plat_stmmacenet_data-with-flags'

Bartosz Golaszewski says:

====================
net: stmmac: replace boolean fields in plat_stmmacenet_data with flags

As suggested by Jose Abreu: let's drop all 12 boolean fields in
plat_stmmacenet_data and replace them with a common bitfield.
====================

Link: https://lore.kernel.org/r/20230710090001.303225-1-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: stmmac: replace the en_tx_lpi_clockgating field with a flag
Bartosz Golaszewski [Mon, 10 Jul 2023 09:00:01 +0000 (11:00 +0200)]
net: stmmac: replace the en_tx_lpi_clockgating field with a flag

Drop the boolean field of the plat_stmmacenet_data structure in favor of a
simple bitfield flag.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/20230710090001.303225-13-brgl@bgdev.pl
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: stmmac: replace the rx_clk_runs_in_lpi field with a flag
Bartosz Golaszewski [Mon, 10 Jul 2023 09:00:00 +0000 (11:00 +0200)]
net: stmmac: replace the rx_clk_runs_in_lpi field with a flag

Drop the boolean field of the plat_stmmacenet_data structure in favor of a
simple bitfield flag.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/20230710090001.303225-12-brgl@bgdev.pl
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: stmmac: replace the int_snapshot_en field with a flag
Bartosz Golaszewski [Mon, 10 Jul 2023 08:59:59 +0000 (10:59 +0200)]
net: stmmac: replace the int_snapshot_en field with a flag

Drop the boolean field of the plat_stmmacenet_data structure in favor of a
simple bitfield flag.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/20230710090001.303225-11-brgl@bgdev.pl
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: stmmac: replace the ext_snapshot_en field with a flag
Bartosz Golaszewski [Mon, 10 Jul 2023 08:59:58 +0000 (10:59 +0200)]
net: stmmac: replace the ext_snapshot_en field with a flag

Drop the boolean field of the plat_stmmacenet_data structure in favor of a
simple bitfield flag.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/20230710090001.303225-10-brgl@bgdev.pl
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: stmmac: replace the multi_msi_en field with a flag
Bartosz Golaszewski [Mon, 10 Jul 2023 08:59:57 +0000 (10:59 +0200)]
net: stmmac: replace the multi_msi_en field with a flag

Drop the boolean field of the plat_stmmacenet_data structure in favor of a
simple bitfield flag.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/20230710090001.303225-9-brgl@bgdev.pl
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: stmmac: replace the vlan_fail_q_en field with a flag
Bartosz Golaszewski [Mon, 10 Jul 2023 08:59:56 +0000 (10:59 +0200)]
net: stmmac: replace the vlan_fail_q_en field with a flag

Drop the boolean field of the plat_stmmacenet_data structure in favor of a
simple bitfield flag.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/20230710090001.303225-8-brgl@bgdev.pl
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>