linux.git
7 years agoMerge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetoot...
David S. Miller [Thu, 11 Jan 2018 18:58:36 +0000 (13:58 -0500)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2018-01-11

Here's likely the last bluetooth-next pull request for the 4.16 kernel.

 - Added support for Bluetooth on 2015+ MacBook (Pro)
 - Fix to QCA Rome suspend/resume handling
 - Two new QCA_ROME USB IDs in btusb
 - A few other minor fixes

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: mdio-bcm-unimac: fix potential NULL dereference in unimac_mdio_probe()
Wei Yongjun [Thu, 11 Jan 2018 11:21:51 +0000 (11:21 +0000)]
net: phy: mdio-bcm-unimac: fix potential NULL dereference in unimac_mdio_probe()

platform_get_resource() may fail and return NULL, so we should
better check it's return value to avoid a NULL pointer dereference
a bit later in the code.

This is detected by Coccinelle semantic patch.

@@
expression pdev, res, n, t, e, e1, e2;
@@

res = platform_get_resource(pdev, t, n);
+ if (!res)
+   return -EINVAL;
... when != res == NULL
e = devm_ioremap(e1, res->start, e2);

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: socionext: Fix error return code in netsec_netdev_open()
Wei Yongjun [Thu, 11 Jan 2018 11:21:38 +0000 (11:21 +0000)]
net: socionext: Fix error return code in netsec_netdev_open()

Fix to return error code -ENODEV from the of_phy_connect() error
handling case instead of 0, as done elsewhere in this function.

Fixes: 533dd11a12f6 ("net: socionext: Add Synquacer NetSec driver")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: socionext: include linux/io.h to fix build
Arnd Bergmann [Thu, 11 Jan 2018 10:36:24 +0000 (11:36 +0100)]
net: socionext: include linux/io.h to fix build

I ran into a randconfig build failure:

drivers/net/ethernet/socionext/netsec.c: In function 'netsec_probe':
drivers/net/ethernet/socionext/netsec.c:1583:17: error: implicit declaration of function 'devm_ioremap'; did you mean 'ioremap'? [-Werror=implicit-function-declaration]

Including linux/io.h directly fixes this.

Fixes: 533dd11a12f6 ("net: socionext: Add Synquacer NetSec driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Thu, 11 Jan 2018 16:59:44 +0000 (11:59 -0500)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2018-01-10

This series contains updates to i40e and i40evf only.

Alice adds the displaying of priority xon/xoff packet stats, since we
were already keeping track of them.  Based on the recent changes, bump
the driver versions.

Jake changes how the driver determines whether or not the device is
currently up to resolve the possible issue of freeing data structures
and other memory before they have been fully allocated.  Refactored
the driver to simplify the locking behavior and to consistently use
spinlocks instead of an overloaded bit lock to protect MAC and filter
lists.  Created a helper function which can convert the AdminQ link
speed definition into a virtchnl definition.

Colin Ian King cleans up a redundant variable initialization.

Alex cleans up the driver to stop clearing the pending bit array for
each vector manually, since it is prone to dropping an interrupt and
based on the hardware specs, the pending bit array will be cleared
automatically in MSI-X mode.  Cleaned up flags for promiscuous mode to
resolve an issue where enabling & disabling promiscuous mode on a VF
would leave us in a high polling rate for the adminq task.  Cleaned up
code that was prone to race issues.

Jingjing renames pipeline personalization profile (ppp) to dynamic
device personalization (ddp) because it was being confused with the
well known point to point protocol.  Also removed checks for "track_id"
being zero, since it is valid for it to be zero for profiles that do
not have any 'write' commands.

v2: cleaned up commit message for patch 12 based on feedback from Sergei
    Shtylyov and Alex Duyck
v3: dropped patch 15 from the original series while Mariusz Stachura
    works on the changes that Jakub Kicinski has suggested
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Don't handle RX interrupts when not up.
Nathan Fontenot [Wed, 10 Jan 2018 16:40:09 +0000 (10:40 -0600)]
ibmvnic: Don't handle RX interrupts when not up.

Initiating a kdump via the command line can cause a pending interrupt
to be handled by the ibmvnic driver when initializing the sub-CRQ
irqs during driver initialization.

NIP [d000000000ca34f0] ibmvnic_interrupt_rx+0x40/0xd0 [ibmvnic]
LR [c000000008132ef0] __handle_irq_event_percpu+0xa0/0x2f0
Call Trace:
[c000000047fcfde0] [c000000008132ef0] __handle_irq_event_percpu+0xa0/0x2f0
[c000000047fcfea0] [c00000000813317c] handle_irq_event_percpu+0x3c/0x90
[c000000047fcfee0] [c00000000813323c] handle_irq_event+0x6c/0xd0
[c000000047fcff10] [c0000000081385e0] handle_fasteoi_irq+0xf0/0x250
[c000000047fcff40] [c0000000081320a0] generic_handle_irq+0x50/0x80
[c000000047fcff60] [c000000008014984] __do_irq+0x84/0x1d0
[c000000047fcff90] [c000000008027564] call_do_irq+0x14/0x24
[c00000003c92af00] [c000000008014b70] do_IRQ+0xa0/0x120
[c00000003c92af50] [c000000008002594] hardware_interrupt_common+0x114/0x180

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: implement ndo_features_check
Ganesh Goudar [Wed, 10 Jan 2018 12:45:47 +0000 (18:15 +0530)]
cxgb4: implement ndo_features_check

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: add support for vxlan segmentation offload
Ganesh Goudar [Wed, 10 Jan 2018 12:45:26 +0000 (18:15 +0530)]
cxgb4: add support for vxlan segmentation offload

add changes to t4_eth_xmit to enable vxlan segmentation
offload support.

Original work by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: implement udp tunnel callbacks
Ganesh Goudar [Wed, 10 Jan 2018 12:45:08 +0000 (18:15 +0530)]
cxgb4: implement udp tunnel callbacks

Implement ndo_udp_tunnel_add and ndo_udp_tunnel_del
to support vxlan tunnelling.

Original work by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: add data structures to support vxlan
Ganesh Goudar [Wed, 10 Jan 2018 12:44:49 +0000 (18:14 +0530)]
cxgb4: add data structures to support vxlan

Add data structures and macros to be used in vxlan
offload.

Original work by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'sfc-support-25G-configuration-with-ethtool'
David S. Miller [Wed, 10 Jan 2018 21:23:39 +0000 (16:23 -0500)]
Merge branch 'sfc-support-25G-configuration-with-ethtool'

Edward Cree says:

====================
sfc: support 25G configuration with ethtool

Adds support for advertise bits beyond the 32-bit legacy masks, and plumbs in
 translation of the new 25/50/100G bits to/from MCDI.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosfc: add bits for 25/50/100G supported/advertised speeds
Edward Cree [Wed, 10 Jan 2018 18:00:25 +0000 (18:00 +0000)]
sfc: add bits for 25/50/100G supported/advertised speeds

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosfc: support the ethtool ksettings API properly so that 25/50/100G works
Edward Cree [Wed, 10 Jan 2018 18:00:14 +0000 (18:00 +0000)]
sfc: support the ethtool ksettings API properly so that 25/50/100G works

Store and handle ethtool link mode masks within the driver instead of
 just a single u32.  However, quite a significant amount of existing code
 wants to manipulate the masks directly, and thus now uses the first
 unsigned long (i.e. mask[0]) as though it were a legacy u32 mask.  This
 is ok because all the bits that code is interested in are in the first
 32 bits of the mask; but it might be a good idea to change them in
 future to use the proper bitmap API.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosfc: basic MCDI mapping of 25/50/100G link speeds
Edward Cree [Wed, 10 Jan 2018 17:59:59 +0000 (17:59 +0000)]
sfc: basic MCDI mapping of 25/50/100G link speeds

Only handles direct speed setting, not autoneg, because the driver is
 still trying to pretend it uses the legacy ethtool API which doesn't
 have advertised/supported bits for 25/50/100G.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'mlxsw-qdisc-refactoring'
David S. Miller [Wed, 10 Jan 2018 21:07:42 +0000 (16:07 -0500)]
Merge branch 'mlxsw-qdisc-refactoring'

Jiri Pirko says:

====================
mlxsw qdisc refactoring

This patchset refactors the qdisc handling in mlxsw driver in order to make
it more object oriented like.
It helps readability, laying the groundwork for the offloading of
additional qdiscs by the driver
This patchset also makes the qdiscs statistics more generic.

Patch 1 moves the qdiscs declaration to the spectrum_qdisc.c
Patches 2-3 clean the offloaded stats requests. Patch 2 changes the RED
generic stats struct to be sharable by other offloaded qdiscs. Patch 3
changes the xstats request to be like the stats. Note that these patches
are outside the driver scope.
Patches 4-5 clean the statistics related functions and structs within the
driver.
Patches 6-7 decrease the need for the same parameters to be sent to many
functions.
Patches 8-11 create a functions pointers struct, to make the qdiscs
handling more object oriented like.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: qdiscs: Remove qdisc before setting a new one
Nogah Frankel [Wed, 10 Jan 2018 14:00:07 +0000 (15:00 +0100)]
mlxsw: spectrum: qdiscs: Remove qdisc before setting a new one

If a qdisc is being replaced by another qdisc of the same type, it can
simply override over its configuration.
However, if it replaces a qdisc of another type, it needs to be removed
before setting the new qdisc.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: qdiscs: Create a generic replace function
Nogah Frankel [Wed, 10 Jan 2018 14:00:06 +0000 (15:00 +0100)]
mlxsw: spectrum: qdiscs: Create a generic replace function

Create a generic qdisc replace function.
For that goal, add three functions to the qdisc ops struct:
* check_params: Checks if the given parameters are offloadable.
* replace: Offload the given parameters.
* clean_stats: clean the qdisc stats for the offloaded qdisc.
integrate RED offloading into using the new internal replace API.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: qdiscs: Create a generic destroy function
Nogah Frankel [Wed, 10 Jan 2018 14:00:05 +0000 (15:00 +0100)]
mlxsw: spectrum: qdiscs: Create a generic destroy function

Add a destroy function to the qdiscs ops struct.
Create a generic qdisc destroy function, that clears the qdisc metadata as
well as calling the specific qdisc destroy function.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: qdiscs: Add an ops struct
Nogah Frankel [Wed, 10 Jan 2018 14:00:04 +0000 (15:00 +0100)]
mlxsw: spectrum: qdiscs: Add an ops struct

Qdisc struct have the Qdisc_class_ops struct.
This patch introduces the similar ops struct for the mlxsw_sp_qdisc_ops
struct. It allows better readability as well as code reusability for the
common parts of some functions like destroy.
The first operations to be added are the statistics getters.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: qdiscs: Unite all handle checks
Nogah Frankel [Wed, 10 Jan 2018 14:00:03 +0000 (15:00 +0100)]
mlxsw: spectrum: qdiscs: Unite all handle checks

Every qdisc op gets the qdisc handle ID as well as its location.  Each one
of them, beside replace, checks if the handle doesn't match the qdisc in
the given location, and if so, it returns without running the actual op.
Unite these checks to one comparison function and avoid sending the handle
id to these ops.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: qdiscs: Add tclass number to the mlxsw_sp_qdisc
Nogah Frankel [Wed, 10 Jan 2018 14:00:02 +0000 (15:00 +0100)]
mlxsw: spectrum: qdiscs: Add tclass number to the mlxsw_sp_qdisc

Tclass number is needed for most of the operations related to the qdisc in
the driver. Create a field for it in the mlxsw_sp_qdisc instead of passing
it to every function as parameter.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: qdiscs: Make the clean stats function to be for RED only
Nogah Frankel [Wed, 10 Jan 2018 14:00:01 +0000 (15:00 +0100)]
mlxsw: spectrum: qdiscs: Make the clean stats function to be for RED only

Improve readability by changing the clean stats function to handle only
RED. Qdiscs that will be offloaded in the future will have a clean stats
function of their own.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: qdiscs: Clean qdisc statistics structs
Nogah Frankel [Wed, 10 Jan 2018 14:00:00 +0000 (15:00 +0100)]
mlxsw: spectrum: qdiscs: Clean qdisc statistics structs

Clean RED offloaded stats and make them more generic by breaking the
generic qdisc stats to a struct of their own.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sch: red: Change offloaded xstats to be incremental
Nogah Frankel [Wed, 10 Jan 2018 13:59:59 +0000 (14:59 +0100)]
net: sch: red: Change offloaded xstats to be incremental

Change the value of the xstats requested from the driver for offloaded RED
to be incremental, like the normal stats.
It increases consistency - if a qdisc stops being offloaded its xstats
don't change.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sch: red: Change the name of the stats struct to be generic
Nogah Frankel [Wed, 10 Jan 2018 13:59:58 +0000 (14:59 +0100)]
net: sch: red: Change the name of the stats struct to be generic

Change the name of the stats struct to be generic, so it could be used for
other qdisc offload, that will be added in the next patches.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: qdiscs: Move qdisc's declarations to its designated file
Nogah Frankel [Wed, 10 Jan 2018 13:59:57 +0000 (14:59 +0100)]
mlxsw: spectrum: qdiscs: Move qdisc's declarations to its designated file

Move all the qdisc related data from the spectrum.h to spectrum_qdisc.c.
Create an init and fini functions for the qdiscs.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Fix typo in firmware upgrade message
Ido Schimmel [Wed, 10 Jan 2018 13:56:54 +0000 (14:56 +0100)]
mlxsw: spectrum: Fix typo in firmware upgrade message

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: make local function tcp_recv_timestamp static
Wei Yongjun [Wed, 10 Jan 2018 07:43:15 +0000 (07:43 +0000)]
tcp: make local function tcp_recv_timestamp static

Fixes the following sparse warning:

net/ipv4/tcp.c:1736:6: warning:
 symbol 'tcp_recv_timestamp' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: fix error return code in mlx5e_alloc_rq()
Wei Yongjun [Wed, 10 Jan 2018 07:30:53 +0000 (07:30 +0000)]
net/mlx5e: fix error return code in mlx5e_alloc_rq()

Fix to return a negative error code from the xdp_rxq_info_reg() error
handling case instead of 0, as done elsewhere in this function.

Fixes: 0ddf543226ac ("xdp/mlx5: setup xdp_rxq_info")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4vf: Fix SGE FL buffer initialization logic for 64K pages
Arjun Vynipadath [Wed, 10 Jan 2018 06:32:13 +0000 (12:02 +0530)]
cxgb4vf: Fix SGE FL buffer initialization logic for 64K pages

We'd come in with SGE_FL_BUFFER_SIZE[0] and [1] both equal to 64KB and
the extant logic would flag that as an error. This was already fixed in
cxgb4 driver with "92ddcc7 cxgb4: Fix some small bugs in
t4_sge_init_soft() when our Page Size is 64KB".

Original Work by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotuntap: fix for "tuntap: XDP transmission"
Stephen Rothwell [Wed, 10 Jan 2018 04:06:14 +0000 (15:06 +1100)]
tuntap: fix for "tuntap: XDP transmission"

Fixes: fc72d1d54dd9 ("tuntap: XDP transmission")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoi40e: track id can be 0
Jingjing Wu [Tue, 14 Nov 2017 12:00:48 +0000 (07:00 -0500)]
i40e: track id can be 0

track_id == 0 is valid for “read only” profiles when
profile does not have any “write” commands.

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: change ppp name to ddp
Jingjing Wu [Tue, 14 Nov 2017 12:00:47 +0000 (07:00 -0500)]
i40e: change ppp name to ddp

PPP name was going to be confusing since PPP already means point
to point protocol. It is decided to change pipeline personalization
profile(ppp) to dynamic device personalization(ddp).

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40evf: Drop i40evf_fire_sw_int as it is prone to races
Alexander Duyck [Tue, 14 Nov 2017 12:00:46 +0000 (07:00 -0500)]
i40evf: Drop i40evf_fire_sw_int as it is prone to races

Having the interrupts firing while we are polling causes extra overhead and
isn't needed for most systems out there. If an interrupt is lost us
experiencing a 2s latency spike before recovering is still not acceptable
and masks the issue. We are better off just identifying systems that lose
interrupts and instead enable workarounds for those systems.

To that end I am dropping the code that was strobing the interrupts as
there is a narrow window where having them enabled can actually cause
race issues anyway where a few stray packets might get misses if the
interrupt is re-enabled and fires before we call napi_complete.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40evf: Clean-up flags for promisc mode to avoid high polling rate
Alexander Duyck [Tue, 14 Nov 2017 12:00:45 +0000 (07:00 -0500)]
i40evf: Clean-up flags for promisc mode to avoid high polling rate

If you enabled and disabled promiscuous mode on a VF you could easily put
it into a state where it would start firing interrupts on all queues at a
rate of 50+ interrupts per second even though there was no traffic present.
The issue seems to have been a stray admin queue feature flag set that was
leaving us in a high polling rate for the adminq task.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40evf: Do not clear MSI-X PBA manually
Alexander Duyck [Tue, 14 Nov 2017 12:00:44 +0000 (07:00 -0500)]
i40evf: Do not clear MSI-X PBA manually

We should not be clearing the pending bit array for each vector manually.
The documentation for the hardware states that when in MSI-X mode the
pending bit array will be cleared automatically. Us clearing it ourselves
just results in multiple opportunities for us to drop an interrupt.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: remove redundant initialization of read_size
Colin Ian King [Sun, 5 Nov 2017 13:04:29 +0000 (13:04 +0000)]
i40e: remove redundant initialization of read_size

Variable read_size is initialized and this value is never read, it is
instead set inside the do-loop, hence the initialization is redundant
and can be removed. Cleans up clang warning:

drivers/net/ethernet/intel/i40e/i40e_nvm.c:390:6: warning: Value stored
to 'read_size' during its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e/i40evf: Bump driver versions
Alice Michael [Fri, 27 Oct 2017 15:06:56 +0000 (11:06 -0400)]
i40e/i40evf: Bump driver versions

Bump the i40e driver from 2.1.14 to 2.3.2.

Bump the i40evf driver from 3.0.1 to 3.2.2

Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: add helper conversion function for link_speed
Jacob Keller [Fri, 27 Oct 2017 15:06:54 +0000 (11:06 -0400)]
i40e: add helper conversion function for link_speed

We introduced the virtchnl interface in order to have an interface for
talking to a virtual device driver which was host-driver agnostic. This
interface has its own definitions, including one for link speed.

The host driver has to talk to the virtchnl interface using these new
definitions in order to remain compatible. Today, the i40e link_speed
enumerations are value-exact matches for the virtchnl interface, so it
was originally decided to simply use a typecast.

However, this is unsafe, and makes it easier for future drivers to
continue this unsafe practice. There is nothing guaranteeing these
values are exact, and the type-cast would hide any compiler warning
which indicates the problem.

Rather than rely on this type cast, introduce a helper function which
can convert the AdminQ link speed definition into a virtchnl
definition. This can then be used by host driver implementations in
order to safely convert to the interface recognized by the virtual
functions.

If the link speed is not able to be represented by the virtchnl
definitions we'll report UNKNOWN which is the safest result.

This will ensure that should the driver specific link_speeds actual bit
definitions change, we do not report them incorrectly according to the
VF.

Additionally, this provides a better pattern for future drivers to copy,
as it is more likely a future device may not use the exact same bit-wise
definition as the current virtchnl interface.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: update VFs of link state after GET_VF_RESOURCES
Jacob Keller [Fri, 27 Oct 2017 15:06:53 +0000 (11:06 -0400)]
i40e: update VFs of link state after GET_VF_RESOURCES

We currently notify a VF of the link state after ENABLE_QUEUES, which is
the last thing a VF does after being configured. Guests may not actually
ENABLE_QUEUES until they get configured, and thus between driver load
and device configuration the VF may show inaccurate link status.

Fix this by also sending the link state after GET_VF_RESOURCES. Although
we could remove the message following ENABLE_QUEUES, it's not that
significant of a loss, so this patch just keeps both to ensure maximum
compatibility with guests on various OSes.

Specifically, without this patch guests running FreeBSD will display
inaccurate link state until the device is brought up. This is mostly
a cosmetic issue but can be confusing to system administrators.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40evf: hold the critical task bit lock while opening
Jacob Keller [Fri, 27 Oct 2017 15:06:52 +0000 (11:06 -0400)]
i40evf: hold the critical task bit lock while opening

If i40evf_open() is called quickly at the same time as a reset occurs
(such as via ethtool) it is possible for the device to attempt to open
while a reset is in progress. This occurs because the driver was not
holding the critical task bit lock during i40evf_open, nor was it
holding it around the call to i40evf_up_complete() in
i40evf_reset_task().

We didn't hold the lock previously because calls to i40evf_down() would
take the bit lock directly, and this would have caused a deadlock.

To avoid this, we'll move the bit lock handling out of i40evf_down() and
into the callers of this function. Additionally, we'll now hold the bit
lock over the entire set of steps when going up or down, to ensure that
we remain consistent.

Ultimately this causes us to serialize the transitions between down and
up properly, and avoid changing status while we're resetting.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40evf: release bit locks in reverse order
Jacob Keller [Fri, 27 Oct 2017 15:06:51 +0000 (11:06 -0400)]
i40evf: release bit locks in reverse order

Although not strictly necessary, it is customary to reverse the order in
which we release locks that we acquire. This helps preserve lock
ordering during future refactors, which can help avoid potential
deadlock situations.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40evf: use spinlock to protect (mac|vlan)_filter_list
Jacob Keller [Fri, 27 Oct 2017 15:06:50 +0000 (11:06 -0400)]
i40evf: use spinlock to protect (mac|vlan)_filter_list

Stop overloading the __I40EVF_IN_CRITICAL_TASK bit lock to protect the
mac_filter_list and vlan_filter_list. Instead, implement a spinlock to
protect these two lists, similar to how we protect the hash in the i40e
PF code.

Ensure that every place where we access the list uses the spinlock to
ensure consistency, and stop holding the critical section around blocks
of code which only need access to the macvlan filter lists.

This refactor helps simplify the locking behavior, and is necessary as
a future refactor to the __I40EVF_IN_CRITICAL_TASK would cause
a deadlock otherwise.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40evf: don't rely on netif_running() outside rtnl_lock()
Jacob Keller [Fri, 27 Oct 2017 15:06:49 +0000 (11:06 -0400)]
i40evf: don't rely on netif_running() outside rtnl_lock()

In i40evf_reset_task we use netif_running() to determine whether or not
the device is currently up. This allows us to properly free queue memory
and shut down things before we request the hardware reset.

It turns out that we cannot be guaranteed of netif_running() returning
false until the device is fully up, as the kernel core code sets
__LINK_STATE_START prior to calling .ndo_open. Since we're not holding
the rtnl_lock(), it's possible that the driver's i40evf_open handler
function is currently being called while we're resetting.

We can't simply hold the rtnl_lock() while checking netif_running() as
this could cause a deadlock with the i40evf_open() function.
Additionally, we can't avoid the deadlock by holding the rtnl_lock()
over the whole reset path, as this essentially serializes all resets,
and can cause massive delays if we have multiple VFs on a system.

Instead, lets just check our own internal state __I40EVF_RUNNING state
field. This allows us to ensure that the state is correct and is only
set after we've finished bringing the device up.

Without this change we might free data structures about device queues
and other memory before they've been fully allocated.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: display priority_xon and priority_xoff stats
Alice Michael [Fri, 27 Oct 2017 15:06:48 +0000 (11:06 -0400)]
i40e: display priority_xon and priority_xoff stats

Display some more stats that were already being counted, to help users
understand when priority xon/xoff packets are being sent/received

Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agonet: fix xdp_rxq_info build issue when CONFIG_SYSFS is not set
Jesper Dangaard Brouer [Tue, 9 Jan 2018 22:42:34 +0000 (23:42 +0100)]
net: fix xdp_rxq_info build issue when CONFIG_SYSFS is not set

The commit e817f85652c1 ("xdp: generic XDP handling of xdp_rxq_info")
removed some ifdef CONFIG_SYSFS in net/core/dev.c, but forgot to
remove the corresponding ifdef's in include/linux/netdevice.h.

Fixes: e817f85652c1 ("xdp: generic XDP handling of xdp_rxq_info")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: marvell: mv88e6390 temperature sensor reading
Andrew Lunn [Tue, 9 Jan 2018 21:42:09 +0000 (22:42 +0100)]
net: phy: marvell: mv88e6390 temperature sensor reading

The internal PHYs in the mv88e6390 switch have a temperature sensor.
It uses a different register layout to other PHY currently supported.
It also has an errata, in that some reads of the sensor result in bad
values. So a number of reads need to be made, and the average taken.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'net-create-dynamic-software-irq-moderation-library'
David S. Miller [Wed, 10 Jan 2018 20:28:15 +0000 (15:28 -0500)]
Merge branch 'net-create-dynamic-software-irq-moderation-library'

Andy Gospodarek says:

====================
net: create dynamic software irq moderation library

This converts the dynamic interrupt moderation library from the mlx5e
driver into a library so it can be used by any driver.  The penultimate
patch in this set adds support for this new dynamic interrupt moderation
library in the bnxt_en driver and the last patch creates an entry in the
MAINTAINERS file for this library.

The main purpose of this code is to allow an administrator to make sure
that default coalesce settings are optimized for low latency, but
quickly adapt to handle high throughput/bulk traffic by altering how
much time passes before popping an interrupt.

For any new driver the following changes would be needed to use this
library:

- add elements in ring struct to track items needed by this library
- create function that can be called to actually set coalesce settings
  for the driver

Credit to Rob Rice and Lee Reed for doing some of the initial proof of
concept and testing for this patch and Tal Gilboa and Or Gerlitz for
their comments, etc on this set.

v4: Fix build breakage for VF representers noticed by kbuild test robot.
Thanks for being so courteous, kbuild test robot!

v3: bnxt_en fix from Michael Chan, comment suggestion from Vasundhara
Volam, and small mlx5e header file fix from Tal Gilboa.

v2: Spelling fixes from Stephen Hemminger, bnxt_en suggestions from
Michael Chan, spelling and formatting fixes from Or Gerlitz, and
spelling and mlx5e changes suggested by Tal Gilboa.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMAINTAINERS: add entry for Dynamic Interrupt Moderation
Andy Gospodarek [Tue, 9 Jan 2018 21:06:21 +0000 (16:06 -0500)]
MAINTAINERS: add entry for Dynamic Interrupt Moderation

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobnxt_en: add support for software dynamic interrupt moderation
Andy Gospodarek [Tue, 9 Jan 2018 21:06:20 +0000 (16:06 -0500)]
bnxt_en: add support for software dynamic interrupt moderation

This implements the changes needed for the bnxt_en driver to add support
for dynamic interrupt moderation per ring.

This does add additional counters in the receive path, but testing shows
that any additional instructions are offset by throughput gain when the
default configuration is for low latency.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/dim: use struct net_dim_sample as arg to net_dim
Andy Gospodarek [Tue, 9 Jan 2018 21:06:19 +0000 (16:06 -0500)]
net/dim: use struct net_dim_sample as arg to net_dim

Simplify the arguments net_dim() by formatting them into a struct
net_dim_sample before calling the function.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Suggested-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: Move dynamic interrupt coalescing code to include/linux
Andy Gospodarek [Tue, 9 Jan 2018 21:06:18 +0000 (16:06 -0500)]
net/mlx5e: Move dynamic interrupt coalescing code to include/linux

This move allows drivers to add private structure elements to track the
number of packets, bytes, and interrupts events per ring.  A driver
also defines a workqueue handler to act on this collected data once per
poll and modify the coalescing parameters per ring.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: Change Mellanox references in DIM code
Andy Gospodarek [Tue, 9 Jan 2018 21:06:17 +0000 (16:06 -0500)]
net/mlx5e: Change Mellanox references in DIM code

Change all appropriate mlx5_am* and MLX5_AM* references to net_dim and
NET_DIM, respectively, in code that handles dynamic interrupt
moderation.  Also change all references from 'am' to 'dim' when used as
local variables and add generic profile references.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: Move generic functions to new file
Andy Gospodarek [Tue, 9 Jan 2018 21:06:16 +0000 (16:06 -0500)]
net/mlx5e: Move generic functions to new file

These functions were identified as ones that could be made generic and
used by multiple drivers.  Most of the contents of en_rx_am.c are moved
to net_dim.c.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: Move AM logic enums
Andy Gospodarek [Tue, 9 Jan 2018 21:06:15 +0000 (16:06 -0500)]
net/mlx5e: Move AM logic enums

More movement to help make this code more generic.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: Remove rq references in mlx5e_rx_am
Andy Gospodarek [Tue, 9 Jan 2018 21:06:14 +0000 (16:06 -0500)]
net/mlx5e: Remove rq references in mlx5e_rx_am

This makes mlx5e_am_sample more generic so that it can be called easily
from a driver that does not use the same data structure to store these
values in a single structure.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: Move interrupt moderation forward declarations
Andy Gospodarek [Tue, 9 Jan 2018 21:06:13 +0000 (16:06 -0500)]
net/mlx5e: Move interrupt moderation forward declarations

Move these to newly created file to prepare to move these functions to a
library.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5e: Move interrupt moderation structs to new file
Andy Gospodarek [Tue, 9 Jan 2018 21:06:12 +0000 (16:06 -0500)]
net/mlx5e: Move interrupt moderation structs to new file

Create new header file to prepare to move code that handles irq
moderation to a library that lives in a header file.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'ipv6-Add-support-for-non-equal-cost-multipath'
David S. Miller [Wed, 10 Jan 2018 20:14:45 +0000 (15:14 -0500)]
Merge branch 'ipv6-Add-support-for-non-equal-cost-multipath'

Ido Schimmel says:

====================
ipv6: Add support for non-equal-cost multipath

This set aims to add support for IPv6 non-equal-cost multipath routes.
The first three patches convert multipath selection to use the
hash-threshold method (RFC 2992) instead of modulo-N. The same method is
employed by the IPv4 routing code since commit 0e884c78ee19 ("ipv4: L3
hash-based multipath").

Unlike modulo-N, with hash-threshold only the flows near the region
boundaries are affected when a nexthop is added or removed. In addition,
it allows us to easily add support for non-equal-cost multipath in the
last patch by sizing the different regions according to the provided
weights.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipv6: Add support for non-equal-cost multipath
Ido Schimmel [Tue, 9 Jan 2018 14:40:28 +0000 (16:40 +0200)]
ipv6: Add support for non-equal-cost multipath

The use of hash-threshold instead of modulo-N makes it trivial to add
support for non-equal-cost multipath.

Instead of dividing the multipath hash function's output space equally
between the nexthops, each nexthop is assigned a region size which is
proportional to its weight.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipv6: Use hash-threshold instead of modulo-N
Ido Schimmel [Tue, 9 Jan 2018 14:40:27 +0000 (16:40 +0200)]
ipv6: Use hash-threshold instead of modulo-N

Now that each nexthop stores its region boundary in the multipath hash
function's output space, we can use hash-threshold instead of modulo-N
in multipath selection.

This reduces the number of checks we need to perform during lookup, as
dead and linkdown nexthops are assigned a negative region boundary. In
addition, in contrast to modulo-N, only flows near region boundaries are
affected when a nexthop is added or removed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipv6: Use a 31-bit multipath hash
Ido Schimmel [Tue, 9 Jan 2018 14:40:26 +0000 (16:40 +0200)]
ipv6: Use a 31-bit multipath hash

The hash thresholds assigned to IPv6 nexthops are in the range of
[-1, 2^31 - 1], where a negative value is assigned to nexthops that
should not be considered during multipath selection.

Therefore, in a similar fashion to IPv4, we need to use the upper
31-bits of the multipath hash for multipath selection.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipv6: Calculate hash thresholds for IPv6 nexthops
Ido Schimmel [Tue, 9 Jan 2018 14:40:25 +0000 (16:40 +0200)]
ipv6: Calculate hash thresholds for IPv6 nexthops

Before we convert IPv6 to use hash-threshold instead of modulo-N, we
first need each nexthop to store its region boundary in the hash
function's output space.

The boundary is calculated by dividing the output space equally between
the different active nexthops. That is, nexthops that are not dead or
linkdown.

The boundaries are rebalanced whenever a nexthop is added or removed to
a multipath route and whenever a nexthop becomes active or inactive.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agovhost_net: batch used ring update in rx
Jason Wang [Tue, 9 Jan 2018 10:27:45 +0000 (18:27 +0800)]
vhost_net: batch used ring update in rx

This patch tries to batched used ring update during RX. This is pretty
fit for the case when guest is much faster (e.g dpdk based
backend). In this case, used ring is almost empty:

- we may get serious cache line misses/contending on both used ring
  and used idx.
- at most 1 packet could be dequeued at one time, batching in guest
  does not make much effect.

Update used ring in a batch can help since guest won't access the used
ring until used idx was advanced for several descriptors and since we
advance used ring for every N packets, guest will only need to access
used idx for every N packet since it can cache the used idx. To have a
better interaction for both batch dequeuing and dpdk batching,
VHOST_RX_BATCH was used as the maximum number of descriptors that
could be batched.

Test were done between two machines with 2.40GHz Intel(R) Xeon(R) CPU
E5-2630 connected back to back through ixgbe. Traffic were generated
on one remote ixgbe through MoonGen and measure the RX pps through
testpmd in guest when do xdp_redirect_map from local ixgbe to
tap. RX pps were increased from 3.05 Mpps to 4.00 Mpps (about 31%
improvement).

One possible concern for this is the implications for TCP (especially
latency sensitive workload). Result[1] does not show obvious changes
for most of the netperf test (RR, TX, and RX). And we do get some
improvements for RX on some specific size.

Guest RX:

size/sessions/+thu%/+normalize%
   64/     1/   +2%/   +2%
   64/     2/   +2%/   -1%
   64/     4/   +1%/   +1%
   64/     8/    0%/    0%
  256/     1/   +6%/   -3%
  256/     2/   -3%/   +2%
  256/     4/  +11%/  +11%
  256/     8/    0%/    0%
  512/     1/   +4%/    0%
  512/     2/   +2%/   +2%
  512/     4/    0%/   -1%
  512/     8/   -8%/   -8%
 1024/     1/   -7%/  -17%
 1024/     2/   -8%/   -7%
 1024/     4/   +1%/    0%
 1024/     8/    0%/    0%
 2048/     1/  +30%/  +14%
 2048/     2/  +46%/  +40%
 2048/     4/    0%/    0%
 2048/     8/    0%/    0%
 4096/     1/  +23%/  +22%
 4096/     2/  +26%/  +23%
 4096/     4/    0%/   +1%
 4096/     8/    0%/    0%
16384/     1/   -2%/   -3%
16384/     2/   +1%/   -4%
16384/     4/   -1%/   -3%
16384/     8/    0%/   -1%
65535/     1/  +15%/   +7%
65535/     2/   +4%/   +7%
65535/     4/    0%/   +1%
65535/     8/    0%/    0%

TCP_RR:

size/sessions/+thu%/+normalize%
    1/     1/    0%/   +1%
    1/    25/   +2%/   +1%
    1/    50/   +4%/   +1%
   64/     1/    0%/   -4%
   64/    25/   +2%/   +1%
   64/    50/    0%/   -1%
  256/     1/    0%/    0%
  256/    25/    0%/    0%
  256/    50/   +4%/   +2%

Guest TX:

size/sessions/+thu%/+normalize%
   64/     1/   +4%/   -2%
   64/     2/   -6%/   -5%
   64/     4/   +3%/   +6%
   64/     8/    0%/   +3%
  256/     1/  +15%/  +16%
  256/     2/  +11%/  +12%
  256/     4/   +1%/    0%
  256/     8/   +5%/   +5%
  512/     1/   -1%/   -6%
  512/     2/    0%/   -8%
  512/     4/   -2%/   +4%
  512/     8/   +6%/   +9%
 1024/     1/   +3%/   +1%
 1024/     2/   +3%/   +9%
 1024/     4/    0%/   +7%
 1024/     8/    0%/   +7%
 2048/     1/   +8%/   +2%
 2048/     2/   +3%/   -1%
 2048/     4/   -1%/  +11%
 2048/     8/   +3%/   +9%
 4096/     1/   +8%/   +8%
 4096/     2/    0%/   -7%
 4096/     4/   +4%/   +4%
 4096/     8/   +2%/   +5%
16384/     1/   -3%/   +1%
16384/     2/   -1%/  -12%
16384/     4/   -1%/   +5%
16384/     8/    0%/   +1%
65535/     1/    0%/   -3%
65535/     2/   +5%/  +16%
65535/     4/   +1%/   +2%
65535/     8/   +1%/   -1%

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'mlx5-updates-2018-01-08' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Wed, 10 Jan 2018 19:57:19 +0000 (14:57 -0500)]
Merge tag 'mlx5-updates-2018-01-08' of git://git./linux/kernel/git/saeed/linux

mlx5-updates-2018-01-08

Four patches from Or that add Hairpin support to mlx5:
===========================================================
From:  Or Gerlitz <ogerlitz@mellanox.com>

We refer the ability of NIC HW to fwd packet received on one port to
the other port (also from a port to itself) as hairpin. The application API
is based
on ingress tc/flower rules set on the NIC with the mirred redirect
action. Other actions can apply to packets during the redirect.

Hairpin allows to offload the data-path of various SW DDoS gateways,
load-balancers, etc to HW. Packets go through all the required
processing in HW (header re-write, encap/decap, push/pop vlan) and
then forwarded, CPU stays at practically zero usage. HW Flow counters
are used by the control plane for monitoring and accounting.

Hairpin is implemented by pairing a receive queue (RQ) to send queue (SQ).
All the flows that share <recv NIC, mirred NIC> are redirected through
the same hairpin pair. Currently, only header-rewrite is supported as a
packet modification action.

I'd like to thanks Elijah Shakkour <elijahs@mellanox.com> for implementing this
functionality
on HW simulator, before it was avail in the FW so the driver code could be
tested early.
===========================================================

From Feras three patches that provide very small changes that allow IPoIB
to support RX timestamping for child interfaces, simply by hooking the mlx5e
timestamping PTP ioctl to IPoIB child interface netdev profile.

One patch from Gal to fix a spilling mistake.

Two patches from Eugenia adds drop counters to VF statistics
to be reported as part of VF statistics in netlink (iproute2) and
implemented them in mlx5 eswitch.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'hns3-next'
David S. Miller [Wed, 10 Jan 2018 19:55:52 +0000 (14:55 -0500)]
Merge branch 'hns3-next'

Peng Li says:

====================
code improvements in HNS3 driver

This patchset fixes 2 comments for community review.
[patch 1/2] reverts "net: hns3: Add packet statistics of netdev"
reported by Jakub Kicinski and David Miller.
[patch 2/2] reports the function type the same line with
hns3_nic_get_stats64, reported by Andrew Lunn.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns3: report the function type the same line with hns3_nic_get_stats64
Peng Li [Tue, 9 Jan 2018 06:50:59 +0000 (14:50 +0800)]
net: hns3: report the function type the same line with hns3_nic_get_stats64

The function type should be on the same line with the function
name, or it may cause display error if a patch edit the
function. There is am example following:
https://www.spinics.net/lists/netdev/msg476141.html

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoRevert "net: hns3: Add packet statistics of netdev"
Peng Li [Tue, 9 Jan 2018 06:50:58 +0000 (14:50 +0800)]
Revert "net: hns3: Add packet statistics of netdev"

This reverts commit 8491000754796c838a0081c267f9dd54ad2ccba3.

It is duplicate to add statistics of netdev for ethtool -S.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'Socionext-Synquacer-NETSEC-driver'
David S. Miller [Wed, 10 Jan 2018 19:50:30 +0000 (14:50 -0500)]
Merge branch 'Socionext-Synquacer-NETSEC-driver'

Jassi Brar says:

====================
Socionext Synquacer NETSEC driver

Changes since v5
# Removed helper macros
# Removed 'inline' qualifier
# Changed multiline empty comment to single line
# Added 'clock-names' property in DT binding example
# Ignore 'clock-names' property in driver until f/ws in the wild are
  upgraded or we support instance that take in more than one clock.
# Rebased the patchset onto net-next

Changes since v4
        # Fixed ucode indexing as a word, instead of byte
        # Removed redundant clocks, keep only phy rate reference clock
          and expect it to be 'phy_ref_clk'

Changes since v3
        # Discard 'socionext,snq-mdio', and simply use 'mdio' subnode.
        # Use ioremap on ucode region as well, instead of memremap.

Changes since v2
        # Use 'mdio' subnode in DT bindings.
        # Use phy_interface_mode_is_rgmii(), instead of open coding the check.
        # Use readl/b with eeprom_base pointer.
        # Unregister mdio bus upon failure in probe.

Changes since v1
        # Switched from using memremap to ioremap
        # Implemented ndo_do_ioctl callback
        # Defined optional 'dma-coherent' DT property
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMAINTAINERS: Add entry for Socionext ethernet driver
Jassi Brar [Sat, 6 Jan 2018 14:14:56 +0000 (19:44 +0530)]
MAINTAINERS: Add entry for Socionext ethernet driver

Add entry for the Socionext Netsec controller driver and DT bindings.

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: socionext: Add Synquacer NetSec driver
Jassi Brar [Sat, 6 Jan 2018 14:14:37 +0000 (19:44 +0530)]
net: socionext: Add Synquacer NetSec driver

This driver adds support for Socionext "netsec" IP Gigabit
Ethernet + PHY IP used in the Synquacer SC2A11 SoC.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodt-bindings: net: Add DT bindings for Socionext Netsec
Jassi Brar [Sat, 6 Jan 2018 14:14:19 +0000 (19:44 +0530)]
dt-bindings: net: Add DT bindings for Socionext Netsec

This patch adds documentation for Device-Tree bindings for the
Socionext NetSec Controller driver.

Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Wed, 10 Jan 2018 19:38:06 +0000 (14:38 -0500)]
Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2018-01-09

This series contains updates to ixgbe and ixgbevf only.

Emil fixes an issue with "wake on LAN"(WoL) where we need to ensure we
enable the reception of multicast packets so that WoL works for IPv6
magic packets.  Cleaned up code no longer needed with the update to
adaptive ITR.

Paul update the driver to advertise the highest capable link speed
when a module gets inserted.  Also extended the displaying of firmware
version to include the iSCSI and OEM block in the EEPROM to better
identify firmware versions/images.

Tonghao Zhang cleans up a code comment that no longer applies since
InterruptThrottleRate has been removed from the driver.

Alex fixes SR-IOV and MACVLAN offload interaction, where the MACVLAN
offload was incorrectly configuring several filters with the wrong
pool value which resulted in MACLVAN interfaces not being able to
receive traffic that had to pass over the physical interface.  Fixed
transmit hangs and dropped receive frames when the number of VFs
changed.  Added support for RSS on MACVLAN pools for X550 devices.
Fixed up the MACVLAN limitations so we can now support 63 offloaded
devices.  Cleaned up MACVLAN code that is no longer needed with the
recent changes and fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoBluetooth: btbcm: Fix sleep mode struct ordering
Lukas Wunner [Wed, 10 Jan 2018 15:32:10 +0000 (16:32 +0100)]
Bluetooth: btbcm: Fix sleep mode struct ordering

According to the documentation for Laird SD40 radio modules (which use
the BCM4329 chipset), the order of the Enable_BREAK_To_Host and
Pulsed_HOST_WAKE parameters in the sleep mode struct is reversed
vis-à-vis our struct declaration.  See page 46 of this PDF:

http://cdn.lairdtech.com/home/brandworld/files/Application%20Note%20-%2040%20Series%20Bluetooth.pdf

The documentation is dated Oct 2015, so fairly recent, making it appear
more likely that the documentation is correct and our code is wrong.
Amend our code to be in congruence with the documentation.

Cc: Sue White <sue.white@lairdtech.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: hci_bcm: Sleep instead of spinning
Lukas Wunner [Wed, 10 Jan 2018 15:32:10 +0000 (16:32 +0100)]
Bluetooth: hci_bcm: Sleep instead of spinning

The driver calls mdelay(15) in the ->suspend, ->resume, ->runtime_suspend
and ->runtime_resume hook, however spinning for such a long period of
time is discouraged as per Documentation/timers/timers-howto.txt.

The use of mdelay() seems unnecessary, it is allowed to sleep in the
system sleep and runtime PM hooks (with the exception of ->suspend_noirq
and ->resume_noirq) and the driver itself also does not rely on a
non-sleeping ->runtime_resume as the only place where a synchronous
resume is performed, in bcm_dequeue(), is called from a work item in
hci_ldisc.c and hci_serdev.c.

So replace the mdelay(15) with msleep(15).

Note that the delay is inserted after asserting or deasserting the
device wake pin, but in bcm_gpio_set_power() that pin is asserted or
deasserted *without* observing a delay.  It is thus unclear if the delay
is necessary at all.  It is likewise unclear why it is exactly 15 ms,
the commit introducing it, 118612fb9165 ("Bluetooth: hci_bcm: Add
suspend/resume PM functions"), does not provide a rationale.

Cc: Frédéric Danis <frederic.danis.oss@gmail.com>
Suggested-and-reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: hci_bcm: Silence IRQ printk
Lukas Wunner [Wed, 10 Jan 2018 15:32:10 +0000 (16:32 +0100)]
Bluetooth: hci_bcm: Silence IRQ printk

The host wake IRQ is optional, but if none is found, "BCM irq: -22" is
logged which may irritate users.  This is really a debug message, so use
dev_dbg() instead of dev_info().  If users are interested in the IRQ,
they can always consult /proc/interrupts.

Cc: Frédéric Danis <frederic.danis.oss@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: hci_bcm: Support Apple GPIO handling
Lukas Wunner [Wed, 10 Jan 2018 15:32:10 +0000 (16:32 +0100)]
Bluetooth: hci_bcm: Support Apple GPIO handling

Enable Bluetooth on the following Macs which provide custom ACPI methods
to toggle the GPIOs for device wake and shutdown instead of accessing
the pins directly:

    MacBook8,1     2015  12"
    MacBook9,1     2016  12"
    MacBook10,1    2017  12"
    MacBookPro13,1 2016  13"
    MacBookPro13,2 2016  13" with Touch Bar
    MacBookPro13,3 2016  15" with Touch Bar
    MacBookPro14,1 2017  13"
    MacBookPro14,2 2017  13" with Touch Bar
    MacBookPro14,3 2017  15" with Touch Bar

On the MacBook8,1 Bluetooth is muxed with a second device (a debug port
on the SSD) under the control of PCH GPIO 36.  Because serdev cannot
deal with multiple slaves yet, it is currently necessary to patch the
DSDT and remove the SSDC device.

The custom ACPI methods are called:

    BTLP (Low Power) takes one argument, toggles device wake GPIO
    BTPU (Power Up) tells SMC to drive shutdown GPIO high
    BTPD (Power Down) tells SMC to drive shutdown GPIO low
    BTRS (Reset) calls BTPD followed by BTPU
    BTRB unknown, not present on all MacBooks

Search for the BTLP, BTPU and BTPD methods on ->probe and cache them in
struct bcm_device if the machine is a Mac.

Additionally, set the init_speed based on a custom device property
provided by Apple in lieu of _CRS resources.  The Broadcom UART's speed
is fixed on Apple Macs:  Any attempt to change it results in Bluetooth
status code 0x0c and bcm_set_baudrate() thus always returns -EBUSY.
By setting only the init_speed and leaving oper_speed at zero, we can
achieve that the host UART's speed is adjusted but the Broadcom UART's
speed is left as is.

The host wake pin goes into the SMC which handles it independently
of the OS, so there's no IRQ for it.

Thanks to Ronald Tschalär who did extensive debugging and testing of
this patch and contributed fixes.

ACPI snippet containing the custom methods and device properties
(taken from a MacBook8,1):

    Method (BTLP, 1, Serialized)
    {
        If (LEqual (Arg0, 0x00))
        {
            Store (0x01, GD54) /* set PCH GPIO 54 direction to input */
        }

        If (LEqual (Arg0, 0x01))
        {
            Store (0x00, GD54) /* set PCH GPIO 54 direction to output */
            Store (0x00, GP54) /* set PCH GPIO 54 value to low */
        }
    }

    Method (BTPU, 0, Serialized)
    {
        Store (0x01, \_SB.PCI0.LPCB.EC.BTPC)
        Sleep (0x0A)
    }

    Method (BTPD, 0, Serialized)
    {
        Store (0x00, \_SB.PCI0.LPCB.EC.BTPC)
        Sleep (0x0A)
    }

    Method (BTRS, 0, Serialized)
    {
        BTPD ()
        BTPU ()
    }

    Method (_DSM, 4, NotSerialized)  // _DSM: Device-Specific Method
    {
        If (LEqual (Arg0, ToUUID ("a0b5b7c6-1318-441c-b0c9-fe695eaf949b")))
        {
            Store (Package (0x08)
                {
                    "baud",
                    Buffer (0x08)
                    { 0xC0, 0xC6, 0x2D, 0x00, 0x00, 0x00, 0x00, 0x00 },

                    "parity",
                    Buffer (0x08)
                    { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },

                    "dataBits",
                    Buffer (0x08)
                    { 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },

                    "stopBits",
                    Buffer (0x08)
                    { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }
                }, Local0)
            DTGP (Arg0, Arg1, Arg2, Arg3, RefOf (Local0))
            Return (Local0)
        }
        Return (0x00)
    }

Link: https://github.com/Dunedan/mbp-2016-linux/issues/29
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=110901
Reported-by: Leif Liddy <leif.liddy@gmail.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Frédéric Danis <frederic.danis.oss@gmail.com>
Cc: Loic Poulain <loic.poulain@linaro.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Tested-by: Max Shavrick <mxms@me.com> [MacBook8,1]
Tested-by: Leif Liddy <leif.liddy@gmail.com> [MacBook9,1]
Tested-by: Daniel Roschka <danielroschka@phoenitydawn.de> [MacBookPro13,2]
Tested-by: Ronald Tschalär <ronald@innovation.ch> [MacBookPro13,3]
Tested-by: Peter Y. Chuang <peteryuchuang@gmail.com> [MacBookPro14,1]
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ronald Tschalär <ronald@innovation.ch>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: hci_bcm: Handle errors properly
Lukas Wunner [Wed, 10 Jan 2018 15:32:10 +0000 (16:32 +0100)]
Bluetooth: hci_bcm: Handle errors properly

A significant portion of this driver lacks error handling.  As a first
step, add error paths to bcm_gpio_set_power(), bcm_open(), bcm_close(),
bcm_suspend_device(), bcm_resume_device(), bcm_resume(), bcm_probe() and
bcm_serdev_probe().  (I've also scrutinized bcm_suspend() but think it's
fine as is.)

Those are all the functions accessing the device wake and shutdown GPIO.
On Apple Macs the pins are accessed through ACPI methods, which may fail
for various reasons, hence proper error handling is necessary.  Non-Macs
access the pins directly, which may fail as well but the GPIO core does
not yet pass back errors to consumers.

Cc: Frédéric Danis <frederic.danis.oss@gmail.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: hci_bcm: Add callbacks to toggle GPIOs
Lukas Wunner [Wed, 10 Jan 2018 15:32:10 +0000 (16:32 +0100)]
Bluetooth: hci_bcm: Add callbacks to toggle GPIOs

MacBooks provides custom ACPI methods to toggle the GPIOs for device
wake and shutdown instead of accessing the pins directly.  Prepare for
their support by adding callbacks to toggle the GPIOs, which on non-Macs
do nothing more but call gpiod_set_value().

No functional change intended.

Suggested-and-reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: hci_bcm: Document struct bcm_device
Lukas Wunner [Wed, 10 Jan 2018 15:32:10 +0000 (16:32 +0100)]
Bluetooth: hci_bcm: Document struct bcm_device

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: hci_bcm: Invalidate IRQ on request failure
Lukas Wunner [Wed, 10 Jan 2018 15:32:10 +0000 (16:32 +0100)]
Bluetooth: hci_bcm: Invalidate IRQ on request failure

If devm_request_irq() fails, the driver bails out of bcm_request_irq()
but continues to ->setup the device (because the IRQ is optional).

The driver subsequently calls devm_free_irq(), enable_irq_wake() and
disable_irq_wake() on the IRQ even though requesting it failed.

Avoid by invalidating the IRQ on request failure.

Cc: Frédéric Danis <frederic.danis.oss@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: hci_bcm: Fix unbalanced pm_runtime_disable()
Lukas Wunner [Wed, 10 Jan 2018 15:32:10 +0000 (16:32 +0100)]
Bluetooth: hci_bcm: Fix unbalanced pm_runtime_disable()

On ->setup, pm_runtime_enable() is only called if a valid IRQ was found,
but on ->close(), pm_runtime_disable() is called unconditionally.
Disablement of runtime PM is recorded in a counter, so every
pm_runtime_disable() needs to be balanced.  Fix it.

Cc: Frédéric Danis <frederic.danis.oss@gmail.com>
Reported-and-reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: hci_bcm: Fix race on close
Lukas Wunner [Wed, 10 Jan 2018 15:32:10 +0000 (16:32 +0100)]
Bluetooth: hci_bcm: Fix race on close

Upon ->close, the driver powers the Bluetooth controller down, deasserts
the device wake pin, updates the runtime PM status to "suspended" and
finally frees the IRQ.

Because the IRQ is freed last, a runtime resume can take place after
the controller was powered down.  The impact is not grave, the worst
thing that can happen is that the device wake pin is reasserted (should
have no effect while the regulator is off) and that setting the runtime
PM status to "suspended" does not reflect reality.

Still, it's wrong, so free the IRQ first.

Cc: Frédéric Danis <frederic.danis.oss@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: hci_bcm: Clean up unnecessary #ifdef
Lukas Wunner [Wed, 10 Jan 2018 15:32:10 +0000 (16:32 +0100)]
Bluetooth: hci_bcm: Clean up unnecessary #ifdef

pm_runtime_disable() and pm_runtime_set_suspended() are replaced with
empty inlines if CONFIG_PM is disabled, so there's no need to #ifdef
them.

device_init_wakeup() is likewise replaced with an inline, though it's
not empty, but it and devm_free_irq() can be made conditional on
IS_ENABLED(CONFIG_PM), which is preferable to #ifdef as per section 20
of Documentation/process/coding-style.rst.

Cc: Frédéric Danis <frederic.danis.oss@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: hci_bcm: Validate IRQ before using it
Ronald Tschalär [Wed, 10 Jan 2018 15:32:10 +0000 (16:32 +0100)]
Bluetooth: hci_bcm: Validate IRQ before using it

The ->close, ->suspend and ->resume hooks assume presence of a valid IRQ
if the device is wakeup capable.  However it's entirely possible that
wakeup was enabled by some other entity besides this driver and in this
case the user will get a WARN splat if no valid IRQ was found.  Avoid by
checking if the IRQ is valid, i.e. > 0.

Case in point:  On recent MacBook Pros, the Bluetooth device lacks an
IRQ (because host wakeup is handled by the SMC, independently of the
operating system), but it does possess a _PRW method (which specifies
the SMC's GPE as wake event).  The ACPI core therefore automatically
marks the physical Bluetooth device wakeup capable upon binding it to
its ACPI companion:

device_set_wakeup_capable+0x96/0xb0
acpi_bind_one+0x28a/0x310
acpi_platform_notify+0x20/0xa0
device_add+0x215/0x690
serdev_device_add+0x57/0xf0
acpi_serdev_add_device+0xc9/0x110
acpi_ns_walk_namespace+0x131/0x280
acpi_walk_namespace+0xf5/0x13d
serdev_controller_add+0x6f/0x110
serdev_tty_port_register+0x98/0xf0
tty_port_register_device_attr_serdev+0x3a/0x70
uart_add_one_port+0x268/0x500
serial8250_register_8250_port+0x32e/0x490
dw8250_probe+0x46c/0x720
platform_drv_probe+0x35/0x90
driver_probe_device+0x300/0x450
bus_for_each_drv+0x67/0xb0
__device_attach+0xde/0x160
bus_probe_device+0x9c/0xb0
device_add+0x448/0x690
platform_device_add+0x10e/0x260
mfd_add_device+0x392/0x4c0
mfd_add_devices+0xb1/0x110
intel_lpss_probe+0x2a9/0x610 [intel_lpss]
intel_lpss_pci_probe+0x7a/0xa8 [intel_lpss_pci]

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ronald Tschalär <ronald@innovation.ch>
[lukas: fix up ->suspend and ->resume as well, add commit message]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoBluetooth: hci_bcm: Mandate presence of shutdown and device wake GPIO
Lukas Wunner [Wed, 10 Jan 2018 15:32:10 +0000 (16:32 +0100)]
Bluetooth: hci_bcm: Mandate presence of shutdown and device wake GPIO

Commit 0395ffc1ee05 ("Bluetooth: hci_bcm: Add PM for BCM devices")
amended this driver to request a shutdown and device wake GPIO on probe,
but mandated that only one of them need to be present:

/* Make sure at-least one of the GPIO is defined and that
 * a name is specified for this instance
 */
if ((!dev->device_wakeup && !dev->shutdown) || !dev->name) {
dev_err(&pdev->dev, "invalid platform data\n");
return -EINVAL;
}

However the same commit added a call to bcm_gpio_set_power() to the
->probe hook, which unconditionally accesses *both* GPIOs.  Luckily,
the resulting NULL pointer deref was never reported, suggesting there's
no machine where either GPIO is missing.

Commit 8a92056837fd ("Bluetooth: hci_bcm: Add (runtime)pm support to the
serdev driver") removed the check whether at least one of the GPIOs is
present without specifying a reason.

Because commit 62aaefa7d038 ("Bluetooth: hci_bcm: improve use of gpios
API") refactored the driver to use devm_gpiod_get_optional() instead of
devm_gpiod_get(), one is now tempted to believe that the driver doesn't
require *any* of the two GPIOs.

Which is wrong, the driver still requires both GPIOs to avoid a NULL
pointer deref.  To this end, establish the status quo ante and request
the GPIOs with devm_gpiod_get() again.  Bail out of ->probe if either
of them is missing.

Oddly enough, whereas bcm_gpio_set_power() accesses the device wake pin
unconditionally, bcm_suspend_device() and bcm_resume_device() do check
for its presence before accessing it.  Those checks are superfluous,
so remove them.

Cc: Frédéric Danis <frederic.danis.oss@gmail.com>
Cc: Loic Poulain <loic.poulain@linaro.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
7 years agoMerge branch 'r8169-improve-runtime-pm'
David S. Miller [Tue, 9 Jan 2018 17:38:57 +0000 (12:38 -0500)]
Merge branch 'r8169-improve-runtime-pm'

Heiner Kallweit says:

====================
r8169: improve runtime pm

On my system with two network ports I found that runtime PM didn't
suspend the unused port. Therefore I checked runtime pm in this driver
in somewhat more detail and this series improves runtime pm in general
and solves the mentioned issue.

Tested on a system with RTL8168evl (MAC version 34).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agor8169: improve runtime pm in general and suspend unused ports
Heiner Kallweit [Mon, 8 Jan 2018 20:39:13 +0000 (21:39 +0100)]
r8169: improve runtime pm in general and suspend unused ports

So far rpm doesn't cover cases like unused ports which are never
brought up. If they are active at probe time they remain in this state.
Included in this patch:

- Let the idle notification check whether we can suspend and let it
  schedule the suspend. This way we don't need to have calls to
  pm_schedule_suspend in different places.

- At the end of rtl_open and rtl_init_one send an idle notification
  to allow suspending if the link is down. If a cable is plugged in
  aneg is finished before the suspend timer expires and the suspend
  request is cancelled.

- Change rtl8169_runtime_suspend to power down the chip if the
  interface is down.

Successfully tested on a RTL8168evl (mac version 34).

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agor8169: improve runtime pm in rtl8169_check_link_status
Heiner Kallweit [Mon, 8 Jan 2018 20:39:07 +0000 (21:39 +0100)]
r8169: improve runtime pm in rtl8169_check_link_status

This patch partially reverts commit e4fbce740f07 "r8169: Fix runtime
power management" from 2010. At that time the suspend delay was 100ms
and therefore suspending happened during initial aneg. Currently
suspend delay is 5s, so suspend starts after aneg and the issue
doesn't exist any longer. On my system aneg takes almost 3s, to be on
the safe side let's increase the suspend delay to 10s.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agor8169: remove unneeded rpm ops in rtl_shutdown
Heiner Kallweit [Mon, 8 Jan 2018 20:39:02 +0000 (21:39 +0100)]
r8169: remove unneeded rpm ops in rtl_shutdown

This patch reverts commit 2a15cd2ff488 "r8169: runtime resume before
shutdown" from 2012. Few months after this change the underlying issue
was solved in the PCI core with commit 3ff2de9ba1a2 "PCI/PM: Resume
device before shutdown".

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'tipc-improvements-to-group-messaging'
David S. Miller [Tue, 9 Jan 2018 17:35:59 +0000 (12:35 -0500)]
Merge branch 'tipc-improvements-to-group-messaging'

Jon Maloy says:

====================
tipc: improvements to group messaging

We make a number of simplifications and improvements to the group
messaging service. They aim at readability/maintainability of the code
as well as scalability.

The series is based on commit f9c935db8086 ("tipc: fix problems with
multipoint-to-point flow control) which has been applied to 'net' but
not yet to 'net-next'.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotipc: improve poll() for group member socket
Jon Maloy [Mon, 8 Jan 2018 20:03:31 +0000 (21:03 +0100)]
tipc: improve poll() for group member socket

The current criteria for returning POLLOUT from a group member socket is
too simplistic. It basically returns POLLOUT as soon as the group has
external destinations, something obviously leading to a lot of spinning
during destination congestion situations. At the same time, the internal
congestion handling is unnecessarily complex.

We now change this as follows.

- We introduce an 'open' flag in  struct tipc_group. This flag is used
  only to help poll() get the setting of POLLOUT right, and *not* for
  congeston handling as such. This means that a user can choose to
  ignore an  EAGAIN for a destination and go on sending messages to
  other destinations in the group if he wants to.

- The flag is set to false every time we return EAGAIN on a send call.

- The flag is set to true every time any member, i.e., not necessarily
  the member that caused EAGAIN, is removed from the small_win list.

- We remove the group member 'usr_pending' flag. The size of the send
  window and presence in the 'small_win' list is sufficient criteria
  for recognizing congestion.

This solution seems to be a reasonable compromise between 'anycast',
which is normally not waiting for POLLOUT for a specific destination,
and the other three send modes, which are.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotipc: improve groupcast scope handling
Jon Maloy [Mon, 8 Jan 2018 20:03:30 +0000 (21:03 +0100)]
tipc: improve groupcast scope handling

When a member joins a group, it also indicates a binding scope. This
makes it possible to create both node local groups, invisible to other
nodes, as well as cluster global groups, visible everywhere.

In order to avoid that different members end up having permanently
differing views of group size and memberhip, we must inhibit locally
and globally bound members from joining the same group.

We do this by using the binding scope as an additional separator between
groups. I.e., a member must ignore all membership events from sockets
using a different scope than itself, and all lookups for message
destinations must require an exact match between the message's lookup
scope and the potential target's binding scope.

Apart from making it possible to create local groups using the same
identity on different nodes, a side effect of this is that it now also
becomes possible to create a cluster global group with the same identity
across the same nodes, without interfering with the local groups.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotipc: add option to suppress PUBLISH events for pre-existing publications
Jon Maloy [Mon, 8 Jan 2018 20:03:29 +0000 (21:03 +0100)]
tipc: add option to suppress PUBLISH events for pre-existing publications

Currently, when a user is subscribing for binding table publications,
he will receive a PUBLISH event for all already existing matching items
in the binding table.

However, a group socket making a subscriptions doesn't need this initial
status update from the binding table, because it has already scanned it
during the join operation. Worse, the multiplicatory effect of issuing
mutual events for dozens or hundreds group members within a short time
frame put a heavy load on the topology server, with the end result that
scale out operations on a big group tend to take much longer than needed.

We now add a new filter option, TIPC_SUB_NO_STATUS, for topology server
subscriptions, so that this initial avalanche of events is suppressed.
This change, along with the previous commit, significantly improves the
range and speed of group scale out operations.

We keep the new option internal for the tipc driver, at least for now.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotipc: send out join messages as soon as new member is discovered
Jon Maloy [Mon, 8 Jan 2018 20:03:28 +0000 (21:03 +0100)]
tipc: send out join messages as soon as new member is discovered

When a socket is joining a group, we look up in the binding table to
find if there are already other members of the group present. This is
used for being able to return EAGAIN instead of EHOSTUNREACH if the
user proceeds directly to a send attempt.

However, the information in the binding table can be used to directly
set the created member in state MBR_PUBLISHED and send a JOIN message
to the peer, instead of waiting for a topology PUBLISH event to do this.
When there are many members in a group, the propagation time for such
events can be significant, and we can save time during the join
operation if we use the initial lookup result fully.

In this commit, we eliminate the member state MBR_DISCOVERED which has
been the result of the initial lookup, and do instead go directly to
MBR_PUBLISHED, which initiates the setup.

After this change, the tipc_member FSM looks as follows:

     +-----------+
---->| PUBLISHED |-----------------------------------------------+
PUB- +-----------+                                 LEAVE/WITHRAW |
LISH       |JOIN                                                 |
           |     +-------------------------------------------+   |
           |     |                            LEAVE/WITHDRAW |   |
           |     |                +------------+             |   |
           |     |   +----------->|  PENDING   |---------+   |   |
           |     |   |msg/maxactv +-+---+------+  LEAVE/ |   |   |
           |     |   |              |   |       WITHDRAW |   |   |
           |     |   |   +----------+   |                |   |   |
           |     |   |   |revert/maxactv|                |   |   |
           |     |   |   V              V                V   V   V
           |   +----------+  msg  +------------+       +-----------+
           +-->|  JOINED  |------>|   ACTIVE   |------>|  LEAVING  |--->
           |   +----------+       +--- -+------+ LEAVE/+-----------+DOWN
           |        A   A               |      WITHDRAW A   A    A   EVT
           |        |   |               |RECLAIM        |   |    |
           |        |   |REMIT          V               |   |    |
           |        |   |== adv   +------------+        |   |    |
           |        |   +---------| RECLAIMING |--------+   |    |
           |        |             +-----+------+  LEAVE/    |    |
           |        |                   |REMIT   WITHDRAW   |    |
           |        |                   |< adv              |    |
           |        |msg/               V            LEAVE/ |    |
           |        |adv==ADV_IDLE+------------+   WITHDRAW |    |
           |        +-------------|  REMITTED  |------------+    |
           |                      +------------+                 |
           |PUBLISH                                              |
JOIN +-----------+                                LEAVE/WITHDRAW |
---->|  JOINING  |-----------------------------------------------+
     +-----------+

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotipc: simplify group LEAVE sequence
Jon Maloy [Mon, 8 Jan 2018 20:03:27 +0000 (21:03 +0100)]
tipc: simplify group LEAVE sequence

After the changes in the previous commit the group LEAVE sequence
can be simplified.

We now let the arrival of a LEAVE message unconditionally issue a group
DOWN event to the user. When a topology WITHDRAW event is received, the
member, if it still there, is set to state LEAVING, but we only issue a
group DOWN event when the link to the peer node is gone, so that no
LEAVE message is to be expected.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotipc: create group member event messages when they are needed
Jon Maloy [Mon, 8 Jan 2018 20:03:26 +0000 (21:03 +0100)]
tipc: create group member event messages when they are needed

In the current implementation, a group socket receiving topology
events about other members just converts the topology event message
into a group event message and stores it until it reaches the right
state to issue it to the user. This complicates the code unnecessarily,
and becomes impractical when we in the coming commits will need to
create and issue membership events independently.

In this commit, we change this so that we just notice the type and
origin of the incoming topology event, and then drop the buffer. Only
when it is time to actually send a group event to the user do we
explicitly create a new message and send it upwards.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotipc: adjustment to group member FSM
Jon Maloy [Mon, 8 Jan 2018 20:03:25 +0000 (21:03 +0100)]
tipc: adjustment to group member FSM

Analysis reveals that the member state MBR_QURANTINED in reality is
unnecessary, and can be replaced by the state MBR_JOINING at all
occurrencs.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotipc: let group member stay in JOINED mode if unable to reclaim
Jon Maloy [Mon, 8 Jan 2018 20:03:24 +0000 (21:03 +0100)]
tipc: let group member stay in JOINED mode if unable to reclaim

We handle a corner case in the function tipc_group_update_rcv_win().
During extreme pessure it might happen that a message receiver has all
its active senders in RECLAIMING or REMITTED mode, meaning that there
is nobody to reclaim advertisements from if an additional sender tries
to go active.

Currently we just set the new sender to ACTIVE anyway, hence at least
theoretically opening up for a receiver queue overflow by exceeding the
MAX_ACTIVE limit. The correct solution to this is to instead add the
member to the pending queue, while letting the oldest member in that
queue revert to JOINED state.

In this commit we refactor the code for handling message arrival from
a JOINED member, both to make it more comprehensible and to cover the
case described above.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotipc: a couple of cleanups
Jon Maloy [Mon, 8 Jan 2018 20:03:23 +0000 (21:03 +0100)]
tipc: a couple of cleanups

- We remove the 'reclaiming' member list in struct tipc_group, since
  it doesn't serve any purpose.

- We simplify the GRP_REMIT_MSG branch of tipc_group_protocol_rcv().

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>