David S. Miller [Wed, 17 Jan 2018 03:42:14 +0000 (22:42 -0500)]
Merge git://git./linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-01-17
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add initial BPF map offloading for nfp driver. Currently only
programs were supported so far w/o being able to access maps.
Offloaded programs are right now only allowed to perform map
lookups, and control path is responsible for populating the
maps. BPF core infrastructure along with nfp implementation is
provided, from Jakub.
2) Various follow-ups to Josef's BPF error injections. More
specifically that includes: properly check whether the error
injectable event is on function entry or not, remove the percpu
bpf_kprobe_override and rather compare instruction pointer
with original one, separate error-injection from kprobes since
it's not limited to it, add injectable error types in order to
specify what is the expected type of failure, and last but not
least also support the kernel's fault injection framework, all
from Masami.
3) Various misc improvements and cleanups to the libbpf Makefile.
That is, fix permissions when installing BPF header files, remove
unused variables and functions, and also install the libbpf.h
header, from Jesper.
4) When offloading to nfp JIT and the BPF insn is unsupported in the
JIT, then reject right at verification time. Also fix libbpf with
regards to ELF section name matching by properly treating the
program type as prefix. Both from Quentin.
5) Add -DPACKAGE to bpftool when including bfd.h for the disassembler.
This is needed, for example, when building libfd from source as
bpftool doesn't supply a config.h for bfd.h. Fix from Jiong.
6) xdp_convert_ctx_access() is simplified since it doesn't need to
set target size during verification, from Jesper.
7) Let bpftool properly recognize BPF_PROG_TYPE_CGROUP_DEVICE
program types, from Roman.
8) Various functions in BPF cpumap were not declared static, from Wei.
9) Fix a double semicolon in BPF samples, from Luis.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 17 Jan 2018 00:18:11 +0000 (01:18 +0100)]
Merge branch 'bpf-libbpf-cleanups'
Jesper Dangaard Brouer says:
====================
This patchset contains some small improvements and cleanup for
the Makefile in tools/lib/bpf/.
It worries me that the libbpf.so shared library is not versioned,
but it not addressed in this patchset.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jesper Dangaard Brouer [Tue, 16 Jan 2018 23:20:40 +0000 (00:20 +0100)]
libbpf: Makefile set specified permission mode
The third parameter to do_install was not used by $(INSTALL) command.
Fix this by only setting the -m option when the third parameter is supplied.
The use of a third parameter was introduced in commit
eb54e522a000 ("bpf:
install libbpf headers on 'make install'").
Without this change, the header files are install as executables files (755).
Fixes: eb54e522a000 ("bpf: install libbpf headers on 'make install'")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jesper Dangaard Brouer [Tue, 16 Jan 2018 23:20:35 +0000 (00:20 +0100)]
libbpf: cleanup Makefile, remove unused elements
The plugin_dir_SQ variable is not used, remove it.
The function update_dir is also unused, remove it.
The variable $VERSION_FILES is empty, remove it.
These all originates from the introduction of the Makefile, and is likely a copy paste
from tools/lib/traceevent/Makefile.
Fixes: 1b76c13e4b36 ("bpf tools: Introduce 'bpf' library and add bpf feature check")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jesper Dangaard Brouer [Tue, 16 Jan 2018 23:20:30 +0000 (00:20 +0100)]
libbpf: install the header file libbpf.h
It seems like an oversight not to install the header file for libbpf,
given the libbpf.so + libbpf.a files are installed.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Daniel Borkmann [Wed, 17 Jan 2018 00:15:07 +0000 (01:15 +0100)]
Merge branch 'bpf-various-improvements'
Jakub Kicinski says:
====================
This series combines a number of random improvements ranging from
libbpf to nfp driver. NFP patches make better use of the verifier
log. There is a requested adjustment to the map offload code, and
a warning fix for a W=1 build to the disassembler. Quentin also
fixes the libbpf program type detection, while Jiong allows the use
of libbfd compiled from source.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Quentin Monnet [Tue, 16 Jan 2018 23:51:50 +0000 (15:51 -0800)]
nfp: bpf: reject program on instructions unknown to the JIT compiler
If an eBPF instruction is unknown to the driver JIT compiler, we can
reject the program at verification time.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Tue, 16 Jan 2018 23:51:49 +0000 (15:51 -0800)]
nfp: bpf: print map lookup problems into verifier log
Use the verifier log to output error messages if map lookup
can't be offloaded.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Quentin Monnet [Tue, 16 Jan 2018 23:51:48 +0000 (15:51 -0800)]
libbpf: fix string comparison for guessing eBPF program type
libbpf is able to deduce the type of a program from the name of the ELF
section in which it is located. However, the comparison is made on the
first n characters, n being determined with sizeof() applied to the
reference string (e.g. "xdp"). When such section names are supposed to
receive a suffix separated with a slash (e.g. "kprobe/"), using sizeof()
takes the final NUL character of the reference string into account,
which implies that both strings must be equal. Instead, the desired
behaviour would consist in taking the length of the string, *without*
accounting for the ending NUL character, and to make sure the reference
string is a prefix to the ELF section name.
Subtract 1 to the total size of the string for obtaining the length for
the comparison.
Fixes: 583c90097f72 ("libbpf: add ability to guess program type based on section name")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jiong Wang [Tue, 16 Jan 2018 23:51:47 +0000 (15:51 -0800)]
tools: bpftool: add -DPACKAGE when including bfd.h
bfd.h is requiring including of config.h except when PACKAGE or
PACKAGE_VERSION are defined.
/* PR 14072: Ensure that config.h is included first. */
#if !defined PACKAGE && !defined PACKAGE_VERSION
#error config.h must be included before this header
#endif
This check has been introduced since May-2012. It doesn't show up in bfd.h
on some Linux distribution, probably because distributions have remove it
when building the package.
However, sometimes the user might just build libfd from source code then
link bpftool against it. For this case, bfd.h will be original that we need
to define PACKAGE or PACKAGE_VERSION.
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Tue, 16 Jan 2018 23:51:46 +0000 (15:51 -0800)]
bpf: annotate bpf_insn_print_t with __printf
Functions of type bpf_insn_print_t take printf-like format
string, mark the type accordingly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Tue, 16 Jan 2018 23:51:45 +0000 (15:51 -0800)]
bpf: offload: make bpf_offload_dev_match() reject host+host case
Daniel suggests it would be more logical for bpf_offload_dev_match()
to return false is either the program or the map are not offloaded,
rather than treating the both not offloaded case as a "matching
CPU/host device".
This makes no functional difference today, since verifier only calls
bpf_offload_dev_match() when one of the objects is offloaded.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Luis de Bethencourt [Tue, 16 Jan 2018 14:15:30 +0000 (14:15 +0000)]
samples/bpf: Fix trailing semicolon
The trailing semicolon is an empty statement that does no operation.
Removing it since it doesn't do anything.
Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Wei Yongjun [Tue, 16 Jan 2018 11:27:05 +0000 (11:27 +0000)]
bpf: cpumap: make some functions static
Fixes the following sparse warnings:
kernel/bpf/cpumap.c:146:6: warning:
symbol '__cpu_map_queue_destructor' was not declared. Should it be static?
kernel/bpf/cpumap.c:225:16: warning:
symbol 'cpu_map_build_skb' was not declared. Should it be static?
kernel/bpf/cpumap.c:340:26: warning:
symbol '__cpu_map_entry_alloc' was not declared. Should it be static?
kernel/bpf/cpumap.c:398:6: warning:
symbol '__cpu_map_entry_free' was not declared. Should it be static?
kernel/bpf/cpumap.c:441:6: warning:
symbol '__cpu_map_entry_replace' was not declared. Should it be static?
kernel/bpf/cpumap.c:454:5: warning:
symbol 'cpu_map_delete_elem' was not declared. Should it be static?
kernel/bpf/cpumap.c:467:5: warning:
symbol 'cpu_map_update_elem' was not declared. Should it be static?
kernel/bpf/cpumap.c:505:6: warning:
symbol 'cpu_map_free' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Colin Ian King [Tue, 16 Jan 2018 10:22:50 +0000 (10:22 +0000)]
bnxt_en: don't update cpr->rx_bytes with uninitialized length len
Currently in the cases where cmp_type == CMP_TYPE_RX_L2_TPA_START_CMP or
CMP_TYPE_RX_L2_TPA_END_CMP the exit path updates cpr->rx_bytes with an
uninitialized length len. Fix this by adding a new exit path that does
not update the cpr stats with the bogus length len and remove the unused
label next_rx_no_prod.
Detected by CoverityScan, CID#
1463807 ("Uninitialized scalar variable")
Fixes: 6a8788f25625 ("bnxt_en: add support for software dynamic interrupt moderation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Mon, 15 Jan 2018 21:42:40 +0000 (00:42 +0300)]
net: delete /proc THIS_MODULE references
/proc has been ignoring struct file_operations::owner field for 10 years.
Specifically, it started with commit
786d7e1612f0b0adb6046f19b906609e4fe8b1ba
("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
inode->i_fop is initialized with proxy struct file_operations for
regular files:
- if (de->proc_fops)
- inode->i_fop = de->proc_fops;
+ if (de->proc_fops) {
+ if (S_ISREG(inode->i_mode))
+ inode->i_fop = &proc_reg_file_ops;
+ else
+ inode->i_fop = de->proc_fops;
+ }
VFS stopped pinning module at this point.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 15 Jan 2018 20:13:32 +0000 (12:13 -0800)]
net: remove prototype of qdisc_lookup_class()
Looks like qdisc_lookup_class() never existed in the tree
in the git era. Remove the prototype from the header.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Mon, 15 Jan 2018 16:56:28 +0000 (17:56 +0100)]
tipc: fix race condition at topology server receive
We have identified a race condition during reception of socket
events and messages in the topology server.
- The function tipc_close_conn() is releasing the corresponding
struct tipc_subscriber instance without considering that there
may still be items in the receive work queue. When those are
scheduled, in the function tipc_receive_from_work(), they are
using the subscriber pointer stored in struct tipc_conn, without
first checking if this is valid or not. This will sometimes
lead to crashes, as the next call of tipc_conn_recvmsg() will
access the now deleted item.
We fix this by making the usage of this pointer conditional on
whether the connection is active or not. I.e., we check the condition
test_bit(CF_CONNECTED) before making the call tipc_conn_recvmsg().
- Since the two functions may be running on different cores, the
condition test described above is not enough. tipc_close_conn()
may come in between and delete the subscriber item after the condition
test is done, but before tipc_conn_recv_msg() is finished. This
happens less frequently than the problem described above, but leads
to the same symptoms.
We fix this by using the existing sk_callback_lock for mutual
exclusion in the two functions. In addition, we have to move
a call to tipc_conn_terminate() outside the mentioned lock to
avoid deadlock.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 16 Jan 2018 19:40:02 +0000 (14:40 -0500)]
Merge branch 'aquantia-next'
Igor Russkikh says:
====================
Aquantia atlantic driver update 2018/01
This patch is a set of cleanups and bugfixes in preparation to new
Aquantia hardware support.
Standard ARRAY_SIZE is now used through all the code,
some unused abstraction structures removed and cleaned up,
duplicate declarations removed.
Also two large declaration styling fixes:
- Hardware register set defines are lined up with kernel style
- Hardware access functions were not prefixed, now already
defined hw_atl prefix is used.
patch v2 changes:
- patch reorganized because of its big size. New HW support
will be submitted as a separate patchset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Mon, 15 Jan 2018 13:41:22 +0000 (16:41 +0300)]
net: aquantia: Fix internal stats calculation on rx
skb len should be fetched before gro_receive - otherwise we may get
wrong or even outdated skb data.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Mon, 15 Jan 2018 13:41:21 +0000 (16:41 +0300)]
net: aquantia: Prepend hw access functions declarations with prefix
Internal functions for registers and HW access were not prefixed.
This introduce noise in global kernel symbols. Here we add explicit prefix
'hw_atl' to all the HW access layer functions.
Alignment and styling were fixed as well.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Mon, 15 Jan 2018 13:41:20 +0000 (16:41 +0300)]
net: aquantia: Fix register definitions to linux style
Original driver code had internal registers and masks declarations
in low case and without any prefix.
Here we make all these uppercase and add already used HW_ATL prefix
to recognize these.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Mon, 15 Jan 2018 13:41:19 +0000 (16:41 +0300)]
net: aquantia: Eliminate aq_nic structure abstraction
aq_nic_s was hidden in aq_nic_internal.h, that made it difficult to access
nic fields and structures from other modules.
This change moves aq_nic_s struct into aq_nic.h and thus makes it available
to other driver modules, mainly pci module and hw related module.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Mon, 15 Jan 2018 13:41:18 +0000 (16:41 +0300)]
net: aquantia: Simplify dependencies between pci modules
Eliminate useless passing of net_device_ops and ethtools_ops through
deep chain of calls.
Move all pci related code into aq_pci_func module.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Mon, 15 Jan 2018 13:41:17 +0000 (16:41 +0300)]
net: aquantia: Add const qualifiers for hardware ops tables
Hardware operations and capabilities tables are constants and
never changed. Declare these as constants.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Mon, 15 Jan 2018 13:41:16 +0000 (16:41 +0300)]
net: aquantia: Remove duplicate hardware descriptors declarations
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Mon, 15 Jan 2018 13:41:15 +0000 (16:41 +0300)]
net: aquantia: Cleanup hardware access modules
Use direct aq_hw_s *self reference where possible
Eliminate useless abstraction PHAL, duplicated structures definitions,
Simplify nic config structure creation and management.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Mon, 15 Jan 2018 13:41:14 +0000 (16:41 +0300)]
net: aquantia: Cleanup status flags accesses
Usage of aq_obj_s structure is noop, here we remove it
replacing access to flags filed directly.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Mon, 15 Jan 2018 13:41:13 +0000 (16:41 +0300)]
net: aquantia: Eliminate AQ_DIMOF, replace with ARRAY_SIZE
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 16 Jan 2018 19:31:15 +0000 (14:31 -0500)]
Merge branch 'net-thunderx-add-support-for-PTP-clock'
Aleksey Makarov says:
====================
net: thunderx: add support for PTP clock
This series adds support for IEEE 1588 Precision Time Protocol
to Cavium ethernet driver.
The first patch adds support for the Precision Time Protocol Clocks and
Timestamping coprocessor (PTP) found on Cavium processors.
It registers a new PTP clock in the PTP core and provides functions
to use the counter in BGX, TNS, GTI, and NIC blocks.
The second patch introduces support for the PTP protocol to the
Cavium ThunderX ethernet driver.
v6:
- check if ptp_clock_register() returns NULL (Richard Cochran)
- fix doc comment for cavium_ptp_enable() (Richard Cochran)
- fix a function call formatting; use defined constant (Richard Cochran)
- add comments for `tx_ptp_skbs` and `ptp_skb` (Richard Cochran)
- use adjfine() instead of adjfreq() (Richard Cochran)
- add Acked-by: Philippe Ombredanne <pombredanne@nexb.com>
v5: https://lkml.kernel.org/r/
20171211141435.2915-1-aleksey.makarov@cavium.com
- fix the file headers (add SPDX tags, remove advertisment) (Philippe Ombredanne)
- use "imply" instead of "select" (Richard Cochran)
- add some code in cavium_ptp_get() for the case when the PTP driver has not been
registered with the PTP core
v4: https://lkml.kernel.org/r/
20171208103442.19354-1-aleksey.makarov@cavium.com
- use IS_ENABLED. This fixes compilation of the ptp as a module (David Miller)
- select PTP_1588_CLOCK, not depend on it. This fixes a build warning.
- change u64 to __be64. This fixes the sparse warning
"warning: cast to restricted __be64"
- make nicvf_config_hwtstamp() static. This fixes the sparse warning
"warning: symbol 'nicvf_config_hwtstamp' was not declared. Should it be static?"
v3: https://lkml.kernel.org/r/
20171206133100.26436-1-aleksey.makarov@cavium.com
- rebase to net-next
v2: https://lkml.kernel.org/r/
20171117134909.8954-1-aleksey.makarov@cavium.com
- use readq()/writeq() in place of cavium_ptp_reg_read()/cavium_ptp_reg_write(),
don't use readq_relaxed()/writeq_relaxed() (David Daney)
v1: https://lkml.kernel.org/r/
20171107190704.15458-1-aleksey.makarov@cavium.com
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Mon, 15 Jan 2018 12:44:57 +0000 (18:44 +0600)]
net: thunderx: add timestamping support
This adds timestamping support for both receive and transmit
paths. On the receive side no filters are supported i.e either
all pkts will get a timestamp appended infront of the packet or none.
On the transmit side HW doesn't support timestamp insertion but
only generates a separate CQE with transmitted packet's timestamp.
Also HW supports only one packet at a time for timestamping on the
transmit side.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@cavium.com>
Acked-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Radoslaw Biernacki [Mon, 15 Jan 2018 12:44:56 +0000 (18:44 +0600)]
net: add support for Cavium PTP coprocessor
This patch adds support for the Precision Time Protocol
Clocks and Timestamping hardware found on Cavium ThunderX
processors.
Signed-off-by: Radoslaw Biernacki <rad@semihalf.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@cavium.com>
Acked-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Mon, 15 Jan 2018 10:43:03 +0000 (10:43 +0000)]
mlxsw: spectrum: qdiscs: Make function mlxsw_sp_qdisc_prio_unoffload static
Fixes the following sparse warning:
drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c:464:1: warning:
symbol 'mlxsw_sp_qdisc_prio_unoffload' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 16 Jan 2018 19:15:36 +0000 (14:15 -0500)]
Merge branch 'devlink-resource'
Jiri Pirko says:
====================
devlink: Add support for resource abstraction
Arkadi says:
Many of the ASIC's internal resources are limited and are shared between
several hardware procedures. For example, unified hash-based memory can
be used for many lookup purposes, like FDB and LPM. In many cases the user
can provide a partitioning scheme for such a resource in order to perform
fine tuning for his application. In such cases performing driver reload is
needed for the changes to take place, thus this patchset also adds support
for hot reload.
Such an abstraction can be coupled with devlink's dpipe interface, which
models the ASIC's pipeline as a graph of match/action tables. By modeling
the hardware resource object, and by coupling it to several dpipe tables,
further visibility can be achieved in order to debug ASIC-wide issues.
The proposed interface will provide the user the ability to understand the
limitations of the hardware, and receive notification regarding its occupancy.
Furthermore, monitoring the resource occupancy can be done in real-time and
can be useful in many cases.
---
v2->v3
- Mix/Max/Gran attributes.
- Add resource consumption per table.
- Change basic resource unit to 'entry'.
- ABI documentation.
v1->v2
- Add resource size attribute.
- Fix split bug.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:12 +0000 (08:59 +0100)]
mlxsw: documentation: Add resources ABI documentation
Add resources ABI documentation.
Signed-off-by: Arkadi Sharhsevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:11 +0000 (08:59 +0100)]
mlxsw: core: Add support for reload
Add support for hot reload. First, all the driver/core resources are
released but the PCI and devlink instances, then reset is performed
through the PCI interface. Finally the driver performs initialization.
In case of reload failure the driver is left in a partially initialized
state. Special care is taken during the driver removal in order to
properly handle this state.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:10 +0000 (08:59 +0100)]
mlxsw: pci: Add support for getting resource through devlink
Up until now the KVD partition was static. This patch introduces the
ability to get the resource sizes via devlink. In case the resource is not
available the default configuration is used.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:09 +0000 (08:59 +0100)]
mlxsw: spectrum: Add support for getting kvdl occupancy
Add support for getting the kvdl occupancy through the resource interface.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:08 +0000 (08:59 +0100)]
mlxsw: spectrum_dpipe: Connect dpipe tables to resources
Connect current dpipe tables to resources. The tables are connected
in the following fashion:
1. IPv4 host -> KVD hash single
2. IPv6 host -> KVD hash double
3. Adjacency -> KVD linear
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:07 +0000 (08:59 +0100)]
mlxsw: spectrum: Register KVD resources with devlink
Register the KVD resources with devlink. The KVD is a memory resource
which is subdivided into three partitions which are the linear, hash
single and hash double.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:06 +0000 (08:59 +0100)]
mlxsw: pci: Add support for performing bus reset
This is a preparation stage before introducing hot reload. During the
reload process the ASIC should be resetted by accessing the PCI BAR due
to unavailability of the mailbox/emad interfaces.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:05 +0000 (08:59 +0100)]
devlink: Add relation between dpipe and resource
The hardware processes which are modeled via dpipe commonly use some
internal hardware resources. Such relation can improve the understanding
of hardware limitations. The number of resource's unit consumed per
table's entry are also provided for each table.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:04 +0000 (08:59 +0100)]
devlink: Add support for reload
Add support for performing driver hot reload.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:03 +0000 (08:59 +0100)]
devlink: Add support for resource abstraction
Add support for hardware resource abstraction over devlink. Each resource
is identified via id, furthermore it contains information regarding its
size and its related sub resources. Each resource can also provide its
current occupancy.
In some cases the sizes of some resources can be changed, yet for those
changes to take place a hot driver reload may be needed. The reload
capability will be introduced in the next patch.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Mon, 15 Jan 2018 07:59:02 +0000 (08:59 +0100)]
devlink: Add per devlink instance lock
This is a preparation before introducing resources and hot reload support.
Currently there are two global lock where one protects all devlink access,
and the second one protects devlink port access. This patch adds per devlink
instance lock which protects the internal members which are the sb/dpipe/
resource/ports. By introducing this lock the global devlink port lock can
be discarded.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 12 Jan 2018 22:17:34 +0000 (23:17 +0100)]
phy: realtek: use new helpers for paged register access
Make use of the new helpers for paged register access.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 16 Jan 2018 17:25:11 +0000 (12:25 -0500)]
Merge branch 'phy-add-helpers-for-setting-clearing-bits-in-PHY-registers'
Heiner Kallweit says:
====================
phy: add helpers for setting/clearing bits in PHY registers
Based on the recent introduction of phy_modify add helpers for setting
and clearing bits in PHY registers. First user is phylib.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 12 Jan 2018 20:20:36 +0000 (21:20 +0100)]
phy: use new helpers phy_set_bits/phy_clear_bits in phylib
Use new helpers phy_set_bits / phy_clear_bits in phylib.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 12 Jan 2018 20:20:33 +0000 (21:20 +0100)]
phy: add helpers for setting/clearing bits in PHY registers
Based on the recent introduction of phy_modify add helpers for setting
and clearing bits in PHY registers.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Gushchin [Mon, 15 Jan 2018 19:16:15 +0000 (19:16 +0000)]
bpftool: recognize BPF_PROG_TYPE_CGROUP_DEVICE programs
Bpftool doesn't recognize BPF_PROG_TYPE_CGROUP_DEVICE programs,
so the prog show command prints the numeric type value:
$ bpftool prog show
1: type 15 name bpf_prog1 tag
ac9f93dbfd6d9b74
loaded_at Jan 15/07:58 uid 0
xlated 96B jited 105B memlock 4096B
This patch defines the corresponding textual representation:
$ bpftool prog show
1: cgroup_device name bpf_prog1 tag
ac9f93dbfd6d9b74
loaded_at Jan 15/07:58 uid 0
xlated 96B jited 105B memlock 4096B
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
David S. Miller [Mon, 15 Jan 2018 21:13:34 +0000 (16:13 -0500)]
Merge tag 'linux-can-next-for-4.16-
20180105' of ssh://gitolite./linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2017-12-01,Re: pull-request: can-next
this is a pull request of 7 patches for net-next/master.
All patches are by me. Patch 6 is for the "can_raw" protocol and add
error checking to the bind() function. All other patches clean up the
coding style and remove unused parameters in various CAN drivers and
infrastructure.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 15 Jan 2018 20:09:46 +0000 (15:09 -0500)]
Merge branch 'sh_eth-simplify-TSU-initialization'
Sergei Shtylyov says:
====================
sh_eth: simplify TSU initialization
Here's a set of 2 patches against DaveM's 'net-next.git' repo. With those,
I'm somewhat simplifying the TSU init code in the driver probe() method...
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sun, 14 Jan 2018 17:47:44 +0000 (20:47 +0300)]
sh_eth: get Ether port # only when needed
The dual-port Ether configurations always have a shared TSU to e.g. pass
the packets between those ports. With the TSU init. code gathered under
the single *if*, we now can only get the port # from 'platform_device::id'
only when we actually need it (and not recalculate it each time)...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sun, 14 Jan 2018 17:47:43 +0000 (20:47 +0300)]
sh_eth: gather all TSU init code in one place
The sh_eth_cpu_data::chip_reset() method always resets using ARSTR and
this register is always located at the start of the TSU register region.
Therefore, we can only call this method if we know TSU is there and thus
simplify the probing code a bit...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 15 Jan 2018 19:46:16 +0000 (14:46 -0500)]
Merge tag 'wireless-drivers-next-for-davem-2018-01-13' of git://git./linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 4.16
Here are patches which have been accumulating over the holidays and
after the New Year. Business as usual and nothing special really
standing out.
But what's noteworthy here is that Larry Finger is stepping down as
the rtlwifi maintainer. He has been maintaining rtlwifi since it was
applied back in 2010 in commit
0c8173385e54 ("rtl8192ce: Add new
driver") and it has been no easy role trying to juggle between the
vendor, demanding upstream community and users. So big thank you to
Larry for all his efforts!
ath10k
* more preparation work for wcn3990 support
* add memory dump to firmware coredump files
wil6210
* support scheduled scan
* support 40-bit DMA addresses
qtnfmac
* support MAC address based access control
* support for radar detection and Channel Availibility Check (CAC)
mwifiex
* firmware coredump for usb devices
rtlwifi
* Larry Finger steps down as the maintainer and Ping-Ke Shih becomes
the new maintainer
* add debugfs interfaces to dump register and btcoex status, and also
write registers and h2c
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Walleij [Fri, 12 Jan 2018 21:34:24 +0000 (22:34 +0100)]
net: ethernet: Add a driver for Gemini gigabit ethernet
The Gemini ethernet has been around for years as an out-of-tree
patch used with the NAS boxen and routers built on StorLink
SL3512 and SL3516, later Storm Semiconductor, later Cortina
Systems. These ASICs are still being deployed and brand new
off-the-shelf systems using it can easily be acquired.
The full name of the IP block is "Net Engine and Gigabit
Ethernet MAC" commonly just called "GMAC".
The hardware block contains a common TCP Offload Enginer (TOE)
that can be used by both MACs. The current driver does not use
it.
Cc: Tobias Waldvogel <tobias.waldvogel@gmail.com>
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Walleij [Fri, 12 Jan 2018 21:34:23 +0000 (22:34 +0100)]
net: ethernet: Add DT bindings for the Gemini ethernet
This adds the device tree bindings for the Gemini ethernet
controller. It is pretty straight-forward, using standard
bindings and modelling the two child ports as child devices
under the parent ethernet controller device.
Cc: devicetree@vger.kernel.org
Cc: Tobias Waldvogel <tobias.waldvogel@gmail.com>
Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 12 Jan 2018 20:07:36 +0000 (22:07 +0200)]
ipv6: Fix build with gcc-4.4.5
Emil reported the following compiler errors:
net/ipv6/route.c: In function `rt6_sync_up`:
net/ipv6/route.c:3586: error: unknown field `nh_flags` specified in initializer
net/ipv6/route.c:3586: warning: missing braces around initializer
net/ipv6/route.c:3586: warning: (near initialization for `arg.<anonymous>`)
net/ipv6/route.c: In function `rt6_sync_down_dev`:
net/ipv6/route.c:3695: error: unknown field `event` specified in initializer
net/ipv6/route.c:3695: warning: missing braces around initializer
net/ipv6/route.c:3695: warning: (near initialization for `arg.<anonymous>`)
Problem is with the named initializers for the anonymous union members.
Fix this by adding curly braces around the initialization.
Fixes: 4c981e28d373 ("ipv6: Prepare to handle multiple netdev events")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Emil S Tantilov <emils.tantilov@gmail.com>
Tested-by: Emil S Tantilov <emils.tantilov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Fri, 12 Jan 2018 19:56:50 +0000 (20:56 +0100)]
tipc: fix bug during lookup of multicast destination nodes
In commit
232d07b74a33 ("tipc: improve groupcast scope handling") we
inadvertently broke non-group multicast transmission when changing the
parameter 'domain' to 'scope' in the function
tipc_nametbl_lookup_dst_nodes(). We missed to make the corresponding
change in the calling function, with the result that the lookup always
fails.
A closer anaysis reveals that this parameter is not needed at all.
Non-group multicast is hard coded to use CLUSTER_SCOPE, and in the
current implementation this will be delivered to all matching
destinations except those which are published with NODE_SCOPE on other
nodes. Since such publications never will be visible on the sending node
anyway, it makes no sense to discriminate by scope at all.
We now remove this parameter altogether.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Fri, 12 Jan 2018 15:28:31 +0000 (18:28 +0300)]
net: Convert atomic_t net::count to refcount_t
Since net could be obtained from RCU lists,
and there is a race with net destruction,
the patch converts net::count to refcount_t.
This provides sanity checks for the cases of
incrementing counter of already dead net,
when maybe_get_net() has to used instead
of get_net().
Drivers: allyesconfig and allmodconfig are OK.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Thu, 11 Jan 2018 16:22:07 +0000 (14:22 -0200)]
sctp: removed unused var from sctp_make_auth
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Wed, 10 Jan 2018 20:21:31 +0000 (21:21 +0100)]
net: phy: remove parameter new_link from phy_mac_interrupt()
I see two issues with parameter new_link:
1. It's not needed. See also phy_interrupt(), works w/o this parameter.
phy_mac_interrupt sets the state to PHY_CHANGELINK and triggers the
state machine which then calls phy_read_status. And phy_read_status
updates the link state.
2. phy_mac_interrupt is used in interrupt context and getting the link
state may sleep (at least when having to access the PHY registers
via MDIO bus).
So let's remove it.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Wed, 10 Jan 2018 20:08:50 +0000 (21:08 +0100)]
tipc: fix a potental access after delete in tipc_sk_join()
In commit
d12d2e12cec2 "tipc: send out join messages as soon as new
member is discovered") we added a call to the function tipc_group_join()
without considering the case that the preceding tipc_sk_publish() might
have failed, and the group item already deleted.
We fix this by returning from tipc_sk_join() directly after the
failed tipc_sk_publish.
Reported-by: syzbot+e3eeae78ea88b8d6d858@syzkaller.appspotmail.com
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 15 Jan 2018 18:18:03 +0000 (13:18 -0500)]
Merge branch 'dsa-lan9303-check-error-value-from-devm_gpiod_get_optional'
Phil Reid says:
====================
net: dsa: lan9303: check error value from devm_gpiod_get_optional()
Errors need to be prograted back from probe.
Note: I have only compile tested the code as I don't have the hardware.
Egil Hjelmeland <privat@egil-hjelmeland.no> has tested it but I haven't
added at Test-by: wasn't in the standard form. Not sure if that's ok or
not.
Changes from v1:
- rebased on net-next
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Reid [Wed, 10 Jan 2018 07:39:33 +0000 (15:39 +0800)]
net: dsa: lan9303: check error value from devm_gpiod_get_optional()
devm_gpiod_get_optional() can return an error in addition to a NULL ptr.
Check for error and propagate that to the probe function. Check return
value in probe. This will now handle EPROBE_DEFER for the reset gpio.
Signed-off-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Reid [Wed, 10 Jan 2018 07:39:32 +0000 (15:39 +0800)]
net: dsa: lan9303: make lan9303_handle_reset() a void function
lan9303_handle_reset never returns anything other than success.
So there's not need for it to return an error code.
Signed-off-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Fri, 12 Jan 2018 14:01:36 +0000 (15:01 +0100)]
net: phy: Have __phy_modify return 0 on success
__phy_modify would return the old value of the register before it was
modified. Thus on success, it does not return 0, but a positive value.
Thus functions using phy_modify, which is a wrapper around
__phy_modify, can start returning > 0 on success, rather than 0. As a
result, breakage has been noticed in various places, where 0 was
assumed.
Code inspection does not find any current location where the return of
the old value is currently used. So have __phy_modify return 0 on
success. When there is a real need for the old value, either a new
accessor can be added, or an additional parameter passed.
Fixes: fea23fb591cc ("net: phy: convert read-modify-write to phy_modify()")
Fixes: 2b74e5be17d2 ("net: phy: add phy_modify() accessor")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sun, 14 Jan 2018 22:36:31 +0000 (23:36 +0100)]
Merge branch 'bpf-nfp-map-offload'
Jakub Kicinski says:
====================
This set adds support for creating maps on networking devices. BPF is
programs+maps, the pure program offload has been around for quite some
time, this patchset adds the map part of the equation.
Maps are allocated on the target device from the start. There is no
host copy when map is created on the device. Device maps are represented
by struct bpf_offloaded_map, regardless of type. Host programs can't
access such maps, access is only possible from a program also loaded
to the same device and/or via the BPF syscall.
Offloaded programs are currently only allowed to perform lookups,
control plane is responsible for populating the maps.
For brevity only infrastructure and basic NFP patches are included.
Target device reporting, netdevsim and tests will follow up as well as
some further optimizations to the NFP code.
v2:
- leave out the array maps, we will add them trivially later to avoid
merge conflicts with ongoing spectere&meltdown mitigations.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:17 +0000 (20:29 -0800)]
nfp: bpf: implement bpf map offload
Plug in to the stack's map offload callbacks for BPF map offload.
Get next call needs some special handling on the FW side, since
we can't send a NULL pointer to the FW there is a get first entry
FW command.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:16 +0000 (20:29 -0800)]
nfp: bpf: add support for reading map memory
Map memory needs to use 40 bit addressing. Add handling of such
accesses. Since 40 bit addresses are formed by using both 32 bit
operands we need to pre-calculate the actual address instead of
adding in the offset inside the instruction, like we did in 32 bit
mode.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:15 +0000 (20:29 -0800)]
nfp: bpf: add verification and codegen for map lookups
Verify our current constraints on the location of the key are
met and generate the code for calling map lookup on the datapath.
New relocation types have to be added - for helpers and return
addresses.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:14 +0000 (20:29 -0800)]
nfp: bpf: add helpers for updating immediate instructions
Immediate loads are used to load the return address of a helper.
We need to be able to update those loads for relocations.
Immediate loads can be slightly more complex and spread over
two instructions in general, but here we only care about simple
loads of small (< 65k) constants, so complex cases are not handled.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:13 +0000 (20:29 -0800)]
nfp: bpf: parse function call and map capabilities
Parse helper function and supported map FW TLV capabilities.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:12 +0000 (20:29 -0800)]
nfp: bpf: implement helpers for FW map ops
Implement calls for FW map communication.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:11 +0000 (20:29 -0800)]
nfp: bpf: add basic control channel communication
For map support we will need to send and receive control messages.
Add basic support for sending a message to FW, and waiting for a
reply.
Control messages are tagged with a 16 bit ID. Add a simple ID
allocator and make sure we don't allow too many messages in flight,
to avoid request <> reply mismatches.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:10 +0000 (20:29 -0800)]
nfp: bpf: add map data structure
To be able to split code into reasonable chunks we need to add
the map data structures already. Later patches will add code
piece by piece.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:09 +0000 (20:29 -0800)]
bpf: offload: add map offload infrastructure
BPF map offload follow similar path to program offload. At creation
time users may specify ifindex of the device on which they want to
create the map. Map will be validated by the kernel's
.map_alloc_check callback and device driver will be called for the
actual allocation. Map will have an empty set of operations
associated with it (save for alloc and free callbacks). The real
device callbacks are kept in map->offload->dev_ops because they
have slightly different signatures. Map operations are called in
process context so the driver may communicate with HW freely,
msleep(), wait() etc.
Map alloc and free callbacks are muxed via existing .ndo_bpf, and
are always called with rtnl lock held. Maps and programs are
guaranteed to be destroyed before .ndo_uninit (i.e. before
unregister_netdev() returns). Map callbacks are invoked with
bpf_devs_lock *read* locked, drivers must take care of exclusive
locking if necessary.
All offload-specific branches are marked with unlikely() (through
bpf_map_is_dev_bound()), given that branch penalty will be
negligible compared to IO anyway, and we don't want to penalize
SW path unnecessarily.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:08 +0000 (20:29 -0800)]
bpf: offload: factor out netdev checking at allocation time
Add a helper to check if netdev could be found and whether it
has .ndo_bpf callback. There is no need to check the callback
every time it's invoked, ndos can't reasonably be swapped for
a set without .ndp_bpf while program is loaded.
bpf_dev_offload_check() will also be used by map offload.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:07 +0000 (20:29 -0800)]
bpf: rename bpf_dev_offload -> bpf_prog_offload
With map offload coming, we need to call program offload structure
something less ambiguous. Pure rename, no functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:06 +0000 (20:29 -0800)]
bpf: add helper for copying attrs to struct bpf_map
All map types reimplement the field-by-field copy of union bpf_attr
members into struct bpf_map. Add a helper to perform this operation.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:05 +0000 (20:29 -0800)]
bpf: hashtab: move checks out of alloc function
Use the new callback to perform allocation checks for hash maps.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:04 +0000 (20:29 -0800)]
bpf: hashtab: move attribute validation before allocation
Number of attribute checks are currently performed after hashtab
is already allocated. Move them to be able to split them out to
the check function later on. Checks have to now be performed on
the attr union directly instead of the members of bpf_map, since
bpf_map will be allocated later. No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski [Fri, 12 Jan 2018 04:29:03 +0000 (20:29 -0800)]
bpf: add map_alloc_check callback
.map_alloc callbacks contain a number of checks validating user-
-provided map attributes against constraints of a particular map
type. For offloaded maps we will need to check map attributes
without actually allocating any memory on the host. Add a new
callback for validating attributes before any memory is allocated.
This callback can be selectively implemented by map types for
sharing code with offloads, or simply to separate the logical
steps of validation and allocation.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
David S. Miller [Sun, 14 Jan 2018 17:25:04 +0000 (12:25 -0500)]
Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2018-01-12
This series contains updates to ixgbe, fm10k and net core.
Alex updates the driver to remove a duplicate MAC address check and
verifies that we have not run out of resources to configure a MAC rule
in our filter table. Also do not assume that dev->num_tc was populated
and configured with the driver, since it can be configured via mqprio
without any hardware coordination. Fixed the recording of stats for
MACVLAN in ixgbe and fm10k instead of recording the receive queue on
MACVLAN offloaded frames. When handling a MACVLAN offload, we should
be stopping/starting traffic on our own queues instead of the upper
devices transmit queues. Fixed possible race conditions with the
MACVLAN cleanup with the interface cleanup on shutdown. With the
recent fixes to ixgbe, we can cap the number of queues regardless of
accel_priv being in use or not, since the actual number of queues are
being reported via real_num_tx_queues.
Tony fixes up the kernel documentation for ixgbe and ixgbevf to resolve
warnings when W=1 is used.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 14 Jan 2018 17:21:12 +0000 (12:21 -0500)]
Merge branch 'mlxsw-Offload-PRIO-qdisc'
Jiri Pirko says:
====================
mlxsw: Offload PRIO qdisc
Nogah says:
Add an offload support for PRIO qdisc for mlxsw driver.
PRIO qdisc is being offloaded by using ndo_setup_tc. It has three
commands, to set or tune the qdisc, to remove it and to get its stats.
Like RED offloading, offloading this qdisc is not enforced on the driver
and determining its offload state is done in the dump action, when the
stats are being updated.
In the driver, offloading of PRIO is supported as root qdisc only. It
supports only priorities 0-7 (the range that is used by the current static
mapping of DSCP to skb prio and by 1:1 PCP values mapping) and up to 8
bands.
Patches 1-2 offload DSCP to priority mapping in the mlxsw_sp driver.
Patch 3 adds offload support for PRIO qdisc.
Patches 4-5 Add PRIO offload support in the mlxsw_sp driver.
---
v1->v2:
- Patch 1/5:
- Rewrite patch msg
- Patch 3/5:
- Send all the qstats in the replace command (and not just backlog)
- Patch 5/5:
- Align with the changes from 3/5
- Move backlog to the generic qdisc stats struct
- Delete extra newline
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Sun, 14 Jan 2018 11:33:17 +0000 (12:33 +0100)]
mlxsw: spectrum: qdiscs: Support stats for PRIO qdisc
Support basic stats for PRIO qdisc, which includes tx packets and bytes
count, drops count and backlog size. The rest of the stats are irrelevant
for this qdisc offload.
Since backlog is not only incremental but reflecting momentary value, in
case of a qdisc that stops being offloaded but is not destroyed, backlog
value needs to be updated about the un-offloading.
For that reason an unoffload function is being added to the ops struct.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Sun, 14 Jan 2018 11:33:16 +0000 (12:33 +0100)]
mlxsw: spectrum: qdiscs: Support PRIO qdisc offload
Add support for offloading PRIO qdisc as root qdisc.
The support is for up to 8 bands.
Routed packets priority is determined by the DSCP field with the default
translations. Bridged packets priority is determined by the PCP field, if
exist, otherwise it is set to 0.
Since both options have only priorities 0-7, higher priorities mapping are
being ignored.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Sun, 14 Jan 2018 11:33:15 +0000 (12:33 +0100)]
net: sch: prio: Add offload ability to PRIO qdisc
Add the ability to offload PRIO qdisc by using ndo_setup_tc.
There are three commands for PRIO offloading:
* TC_PRIO_REPLACE: handles set and tune
* TC_PRIO_DESTROY: handles qdisc destroy
* TC_PRIO_STATS: updates the qdiscs counters (given as reference)
Like RED qdisc, the indication of whether PRIO is being offloaded is being
set and updated as part of the dump function. It is so because the driver
could decide to offload or not based on the qdisc parent, which could
change without notifying the qdisc.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 14 Jan 2018 11:33:14 +0000 (12:33 +0100)]
mlxsw: spectrum_router: Configure default routing priority
When routing ip packets, the kernel is setting the SKB's priority
based on the tos field of the packet.
Imitate this behavior in the mlxsw router, having the internal
switch priority of a routed packet determined according to its DS
field.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 14 Jan 2018 11:33:13 +0000 (12:33 +0100)]
mlxsw: reg: add rdpm register
Add rdpm definition - router DSCP to priority mapping register.
This register will be utilized later to align the default mapping between
packet DSCP and switch-priority to the kernel's mapping between
packet priority and skb priority.
This is the first non-bit indexed register where the entries are arranged
in descending order, i.e., entry at offset 0 matches configuration for
dscp[63]. As a result, the item's step is converted into a signed variable
to support descending arrays [where step would be negative].
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 14 Jan 2018 17:08:45 +0000 (12:08 -0500)]
Merge branch 'dsa-mv88e6xxx-ATU-VTU-irq'
Andrew Lunn says:
====================
mv88e6xxx: ATU and VTU interrupts
Both the ATU and VTU of Mavell switches can generate interrupts when
violations occur. Trap this interrupts and print what violation
occurred.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 14 Jan 2018 01:32:45 +0000 (02:32 +0100)]
net: dsa: mv88e6xxx: Decode VTU problem interrupt
When there is a problem with the VTU, an interrupt can be
generated. Trap this interrupt and decode the registers to determine
what the problem was, then log the error.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 14 Jan 2018 01:32:44 +0000 (02:32 +0100)]
net: dsa: mv88e6xxx: Decode ATU problem interrupt
When there is a problem with the ATU, an interrupt can be
generated. Trap this interrupt and decode the registers to determine
what the problem was, then log the error.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 12 Jan 2018 16:15:59 +0000 (17:15 +0100)]
mlxsw: spectrum_router: Add support for IPv6 non-equal-cost multipath
Since commit
eb789980d0aa ("mlxsw: spectrum_router: Populate adjacency
entries according to weights") the driver includes support for
non-equal-cost multipath, but IPv4 nexthops were the only user.
Now that the kernel supports weighted IPv6 nexthops, we can extend the
driver to support it as well.
This is done by assigning each nexthop its configured weight, so that it
will be populated accordingly in the device's adjacency table. The
`weight` parameter is also taken into account when comparing nexthop
groups in order not to consolidate non-identical groups.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Sat, 13 Jan 2018 21:13:44 +0000 (22:13 +0100)]
net: netsec: use dma_addr_t for storing dma address
On targets that have different sizes for phys_addr_t and dma_addr_t,
we get a type mismatch error:
drivers/net/ethernet/socionext/netsec.c: In function 'netsec_alloc_dring':
drivers/net/ethernet/socionext/netsec.c:970:9: error: passing argument 3 of 'dma_zalloc_coherent' from incompatible pointer type [-Werror=incompatible-pointer-types]
The code is otherwise correct, as the address is never actually used as a
physical address but only passed into a DMA register. For consistently,
I'm changing the variable name as well, to clarify that this is a DMA
address.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Sat, 13 Jan 2018 01:33:39 +0000 (17:33 -0800)]
Merge branch 'error-injection'
Masami Hiramatsu says:
====================
Here are the 5th version of patches to moving error injection
table from kprobes. This version fixes a bug and update
fail-function to support multiple function error injection.
Here is the previous version:
https://patchwork.ozlabs.org/cover/858663/
Changes in v5:
- [3/5] Fix a bug that within_error_injection returns false always.
- [5/5] Update to support multiple function error injection.
Thank you,
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Masami Hiramatsu [Fri, 12 Jan 2018 17:56:03 +0000 (02:56 +0900)]
error-injection: Support fault injection framework
Support in-kernel fault-injection framework via debugfs.
This allows you to inject a conditional error to specified
function using debugfs interfaces.
Here is the result of test script described in
Documentation/fault-injection/fault-injection.txt
===========
# ./test_fail_function.sh
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.
0227404 s, 46.1 MB/s
btrfs-progs v4.4
See http://btrfs.wiki.kernel.org for more information.
Label: (null)
UUID:
bfa96010-12e9-4360-aed0-
42eec7af5798
Node size: 16384
Sector size: 4096
Filesystem size: 1001.00MiB
Block group profiles:
Data: single 8.00MiB
Metadata: DUP 58.00MiB
System: DUP 12.00MiB
SSD detected: no
Incompat features: extref, skinny-metadata
Number of devices: 1
Devices:
ID SIZE PATH
1 1001.00MiB /dev/loop2
mount: mount /dev/loop2 on /opt/tmpmnt failed: Cannot allocate memory
SUCCESS!
===========
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Masami Hiramatsu [Fri, 12 Jan 2018 17:55:33 +0000 (02:55 +0900)]
error-injection: Add injectable error types
Add injectable error types for each error-injectable function.
One motivation of error injection test is to find software flaws,
mistakes or mis-handlings of expectable errors. If we find such
flaws by the test, that is a program bug, so we need to fix it.
But if the tester miss input the error (e.g. just return success
code without processing anything), it causes unexpected behavior
even if the caller is correctly programmed to handle any errors.
That is not what we want to test by error injection.
To clarify what type of errors the caller must expect for each
injectable function, this introduces injectable error types:
- EI_ETYPE_NULL : means the function will return NULL if it
fails. No ERR_PTR, just a NULL.
- EI_ETYPE_ERRNO : means the function will return -ERRNO
if it fails.
- EI_ETYPE_ERRNO_NULL : means the function will return -ERRNO
(ERR_PTR) or NULL.
ALLOW_ERROR_INJECTION() macro is expanded to get one of
NULL, ERRNO, ERRNO_NULL to record the error type for
each function. e.g.
ALLOW_ERROR_INJECTION(open_ctree, ERRNO)
This error types are shown in debugfs as below.
====
/ # cat /sys/kernel/debug/error_injection/list
open_ctree [btrfs] ERRNO
io_ctl_init [btrfs] ERRNO
====
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Masami Hiramatsu [Fri, 12 Jan 2018 17:55:03 +0000 (02:55 +0900)]
error-injection: Separate error-injection from kprobe
Since error-injection framework is not limited to be used
by kprobes, nor bpf. Other kernel subsystems can use it
freely for checking safeness of error-injection, e.g.
livepatch, ftrace etc.
So this separate error-injection framework from kprobes.
Some differences has been made:
- "kprobe" word is removed from any APIs/structures.
- BPF_ALLOW_ERROR_INJECTION() is renamed to
ALLOW_ERROR_INJECTION() since it is not limited for BPF too.
- CONFIG_FUNCTION_ERROR_INJECTION is the config item of this
feature. It is automatically enabled if the arch supports
error injection feature for kprobe or ftrace etc.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Masami Hiramatsu [Fri, 12 Jan 2018 17:54:33 +0000 (02:54 +0900)]
tracing/kprobe: bpf: Compare instruction pointer with original one
Compare instruction pointer with original one on the
stack instead using per-cpu bpf_kprobe_override flag.
This patch also consolidates reset_current_kprobe() and
preempt_enable_no_resched() blocks. Those can be done
in one place.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>