Linus Torvalds [Thu, 7 Oct 2021 16:50:31 +0000 (09:50 -0700)]
Merge tag 'net-5.15-rc5' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from xfrm, bpf, netfilter, and wireless.
Current release - regressions:
- xfrm: fix XFRM_MSG_MAPPING ABI breakage caused by inserting a new
value in the middle of an enum
- unix: fix an issue in unix_shutdown causing the other end
read/write failures
- phy: mdio: fix memory leak
Current release - new code bugs:
- mlx5e: improve MQPRIO resiliency against bad configs
Previous releases - regressions:
- bpf: fix integer overflow leading to OOB access in map element
pre-allocation
- stmmac: dwmac-rk: fix ethernet on rk3399 based devices
- netfilter: conntrack: fix boot failure with
nf_conntrack.enable_hooks=1
- brcmfmac: revert using ISO3166 country code and 0 rev as fallback
- i40e: fix freeing of uninitialized misc IRQ vector
- iavf: fix double unlock of crit_lock
Previous releases - always broken:
- bpf, arm: fix register clobbering in div/mod implementation
- netfilter: nf_tables: correct issues in netlink rule change event
notifications
- dsa: tag_dsa: fix mask for trunked packets
- usb: r8152: don't resubmit rx immediately to avoid soft lockup on
device unplug
- i40e: fix endless loop under rtnl if FW fails to correctly respond
to capability query
- mlx5e: fix rx checksum offload coexistence with ipsec offload
- mlx5: force round second at 1PPS out start time and allow it only
in supported clock modes
- phy: pcs: xpcs: fix incorrect CL37 AN sequence, EEE disable
sequence
Misc:
- xfrm: slightly rejig the new policy uAPI to make it less cryptic"
* tag 'net-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
net: prefer socket bound to interface when not in VRF
iavf: fix double unlock of crit_lock
i40e: Fix freeing of uninitialized misc IRQ vector
i40e: fix endless loop under rtnl
dt-bindings: net: dsa: marvell: fix compatible in example
ionic: move filter sync_needed bit set
gve: report 64bit tx_bytes counter from gve_handle_report_stats()
gve: fix gve_get_stats()
rtnetlink: fix if_nlmsg_stats_size() under estimation
gve: Properly handle errors in gve_assign_qpl
gve: Avoid freeing NULL pointer
gve: Correct available tx qpl check
unix: Fix an issue in unix_shutdown causing the other end read/write failures
net: stmmac: trigger PCS EEE to turn off on link down
net: pcs: xpcs: fix incorrect steps on disable EEE
netlink: annotate data races around nlk->bound
net: pcs: xpcs: fix incorrect CL37 AN sequence
net: sfp: Fix typo in state machine debug string
net/sched: sch_taprio: properly cancel timer from taprio_destroy()
net: bridge: fix under estimation in br_get_linkxstats_size()
...
Linus Torvalds [Thu, 7 Oct 2021 16:44:48 +0000 (09:44 -0700)]
Merge tag 'hyperv-fixes-signed-
20211007' of git://git./linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Replace uuid.h with types.h in a header (Andy Shevchenko)
- Avoid sleeping in atomic context in PCI driver (Long Li)
- Avoid sending IPI to self when it shouldn't (Vitaly Kuznetsov)
* tag 'hyperv-fixes-signed-
20211007' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Avoid erroneously sending IPI to 'self'
hyper-v: Replace uuid.h with types.h
PCI: hv: Fix sleep while in non-sleep context when removing child devices from the bus
Mike Manning [Tue, 5 Oct 2021 13:03:42 +0000 (14:03 +0100)]
net: prefer socket bound to interface when not in VRF
The commit
6da5b0f027a8 ("net: ensure unbound datagram socket to be
chosen when not in a VRF") modified compute_score() so that a device
match is always made, not just in the case of an l3mdev skb, then
increments the score also for unbound sockets. This ensures that
sockets bound to an l3mdev are never selected when not in a VRF.
But as unbound and bound sockets are now scored equally, this results
in the last opened socket being selected if there are matches in the
default VRF for an unbound socket and a socket bound to a dev that is
not an l3mdev. However, handling prior to this commit was to always
select the bound socket in this case. Reinstate this handling by
incrementing the score only for bound sockets. The required isolation
due to choosing between an unbound socket and a socket bound to an
l3mdev remains in place due to the device match always being made.
The same approach is taken for compute_score() for stream sockets.
Fixes: 6da5b0f027a8 ("net: ensure unbound datagram socket to be chosen when not in a VRF")
Fixes: e78190581aff ("net: ensure unbound stream socket to be chosen when not in a VRF")
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/cf0a8523-b362-1edf-ee78-eef63cbbb428@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 7 Oct 2021 14:11:32 +0000 (07:11 -0700)]
Merge https://git./linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2021-10-07
We've added 7 non-merge commits during the last 8 day(s) which contain
a total of 8 files changed, 38 insertions(+), 21 deletions(-).
The main changes are:
1) Fix ARM BPF JIT to preserve caller-saved regs for DIV/MOD JIT-internal
helper call, from Johan Almbladh.
2) Fix integer overflow in BPF stack map element size calculation when
used with preallocation, from Tatsuhiko Yasumatsu.
3) Fix an AF_UNIX regression due to added BPF sockmap support related
to shutdown handling, from Jiang Wang.
4) Fix a segfault in libbpf when generating light skeletons from objects
without BTF, from Kumar Kartikeya Dwivedi.
5) Fix a libbpf memory leak in strset to free the actual struct strset
itself, from Andrii Nakryiko.
6) Dual-license bpf_insn.h similarly as we did for libbpf and bpftool,
with ACKs from all contributors, from Luca Boccassi.
====================
Link: https://lore.kernel.org/r/20211007135010.21143-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Thu, 7 Oct 2021 11:44:41 +0000 (12:44 +0100)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/
ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2021-10-07
1) Fix a sysbot reported shift-out-of-bounds in xfrm_get_default.
From Pavel Skripkin.
2) Fix XFRM_MSG_MAPPING ABI breakage. The new XFRM_MSG_MAPPING
messages were accidentally not paced at the end.
Fix by Eugene Syromiatnikov.
3) Fix the uapi for the default policy, use explicit field and macros
and make it accessible to userland.
From Nicolas Dichtel.
4) Fix a missing rcu lock in xfrm_notify_userpolicy().
From Nicolas Dichtel.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 7 Oct 2021 11:38:15 +0000 (12:38 +0100)]
Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/net-
queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-10-06
This series contains updates to i40e and iavf drivers.
Jiri Benc expands an error check to prevent infinite loop for i40e.
Sylwester prevents freeing of uninitialized IRQ vector to resolve a
kernel oops for i40e.
Stefan Assmann fixes a double mutex unlock for iavf.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 7 Oct 2021 01:26:36 +0000 (18:26 -0700)]
Merge tag 'devicetree-fixes-for-5.15-3' of git://git./linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Add another allowed address for TI sn65dsi86
- Drop more redundant minItems/maxItems
- Fix more graph 'unevaluatedProperties' warnings in media bindings
* tag 'devicetree-fixes-for-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: drm/bridge: ti-sn65dsi86: Fix reg value
dt-bindings: Drop more redundant 'maxItems/minItems'
dt-bindings: media: Fix more graph 'unevaluatedProperties' related warnings
Stefan Assmann [Tue, 24 Aug 2021 10:06:39 +0000 (12:06 +0200)]
iavf: fix double unlock of crit_lock
The crit_lock mutex could be unlocked twice as reported here
https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-
20210823/025525.html
Remove the superfluous unlock. Technically the problem was already
present before
5ac49f3c2702 as that commit only replaced the locking
primitive, but no functional change.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections")
Fixes: bac8486116b0 ("iavf: Refactor the watchdog state machine")
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Sylwester Dziedziuch [Fri, 24 Sep 2021 09:40:41 +0000 (11:40 +0200)]
i40e: Fix freeing of uninitialized misc IRQ vector
When VSI set up failed in i40e_probe() as part of PF switch set up
driver was trying to free misc IRQ vectors in
i40e_clear_interrupt_scheme and produced a kernel Oops:
Trying to free already-free IRQ 266
WARNING: CPU: 0 PID: 5 at kernel/irq/manage.c:1731 __free_irq+0x9a/0x300
Workqueue: events work_for_cpu_fn
RIP: 0010:__free_irq+0x9a/0x300
Call Trace:
? synchronize_irq+0x3a/0xa0
free_irq+0x2e/0x60
i40e_clear_interrupt_scheme+0x53/0x190 [i40e]
i40e_probe.part.108+0x134b/0x1a40 [i40e]
? kmem_cache_alloc+0x158/0x1c0
? acpi_ut_update_ref_count.part.1+0x8e/0x345
? acpi_ut_update_object_reference+0x15e/0x1e2
? strstr+0x21/0x70
? irq_get_irq_data+0xa/0x20
? mp_check_pin_attr+0x13/0xc0
? irq_get_irq_data+0xa/0x20
? mp_map_pin_to_irq+0xd3/0x2f0
? acpi_register_gsi_ioapic+0x93/0x170
? pci_conf1_read+0xa4/0x100
? pci_bus_read_config_word+0x49/0x70
? do_pci_enable_device+0xcc/0x100
local_pci_probe+0x41/0x90
work_for_cpu_fn+0x16/0x20
process_one_work+0x1a7/0x360
worker_thread+0x1cf/0x390
? create_worker+0x1a0/0x1a0
kthread+0x112/0x130
? kthread_flush_work_fn+0x10/0x10
ret_from_fork+0x1f/0x40
The problem is that at that point misc IRQ vectors
were not allocated yet and we get a call trace
that driver is trying to free already free IRQ vectors.
Add a check in i40e_clear_interrupt_scheme for __I40E_MISC_IRQ_REQUESTED
PF state before calling i40e_free_misc_vector. This state is set only if
misc IRQ vectors were properly initialized.
Fixes: c17401a1dd21 ("i40e: use separate state bit for miscellaneous IRQ setup")
Reported-by: PJ Waskiewicz <pwaskiewicz@jumptrading.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jiri Benc [Tue, 14 Sep 2021 08:54:42 +0000 (10:54 +0200)]
i40e: fix endless loop under rtnl
The loop in i40e_get_capabilities can never end. The problem is that
although i40e_aq_discover_capabilities returns with an error if there's
a firmware problem, the returned error is not checked. There is a check for
pf->hw.aq.asq_last_status but that value is set to I40E_AQ_RC_OK on most
firmware problems.
When i40e_aq_discover_capabilities encounters a firmware problem, it will
encounter the same problem on its next invocation. As the result, the loop
becomes endless. We hit this with I40E_ERR_ADMIN_QUEUE_TIMEOUT but looking
at the code, it can happen with a range of other firmware errors.
I don't know what the correct behavior should be: whether the firmware
should be retried a few times, or whether pf->hw.aq.asq_last_status should
be always set to the encountered firmware error (but then it would be
pointless and can be just replaced by the i40e_aq_discover_capabilities
return value). However, the current behavior with an endless loop under the
rtnl mutex(!) is unacceptable and Intel has not submitted a fix, although we
explained the bug to them 7 months ago.
This may not be the best possible fix but it's better than hanging the whole
system on a firmware bug.
Fixes: 56a62fc86895 ("i40e: init code and hardware support")
Tested-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Vitaly Kuznetsov [Wed, 6 Oct 2021 12:50:16 +0000 (14:50 +0200)]
x86/hyperv: Avoid erroneously sending IPI to 'self'
__send_ipi_mask_ex() uses an optimization: when the target CPU mask is
equal to 'cpu_present_mask' it uses 'HV_GENERIC_SET_ALL' format to avoid
converting the specified cpumask to VP_SET. This case was overlooked when
'exclude_self' parameter was added. As the result, a spurious IPI to
'self' can be send.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Fixes: dfb5c1e12c28 ("x86/hyperv: remove on-stack cpumask from hv_send_ipi_mask_allbutself")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20211006125016.941616-1-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Marcel Ziswiler [Wed, 6 Oct 2021 06:31:04 +0000 (08:31 +0200)]
dt-bindings: net: dsa: marvell: fix compatible in example
While the MV88E6390 switch chip exists, one is supposed to use a
compatible of "marvell,mv88e6190" for it. Fix this in the given example.
Signed-off-by: Marcel Ziswiler <marcel@ziswiler.com>
Fixes: a3c53be55c95 ("net: dsa: mv88e6xxx: Support multiple MDIO busses")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 5 Oct 2021 23:11:05 +0000 (16:11 -0700)]
ionic: move filter sync_needed bit set
Move the setting of the filter-sync-needed bit to the error
case in the filter add routine to be sure we're checking the
live filter status rather than a copy of the pre-sync status.
Fixes: 969f84394604 ("ionic: sync the filters in the work task")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 6 Oct 2021 01:01:38 +0000 (18:01 -0700)]
gve: report 64bit tx_bytes counter from gve_handle_report_stats()
Each tx queue maintains a 64bit counter for bytes, there is
no reason to truncate this to 32bit (or this has not been
documented)
Fixes: 24aeb56f2d38 ("gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yangchun Fu <yangchun@google.com>
Cc: Kuo Zhao <kuozhao@google.com>
Cc: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 6 Oct 2021 00:30:30 +0000 (17:30 -0700)]
gve: fix gve_get_stats()
gve_get_stats() can report wrong numbers if/when u64_stats_fetch_retry()
returns true.
What is needed here is to sample values in temporary variables,
and only use them after each loop is ended.
Fixes: f5cedc84a30d ("gve: Add transmit and receive support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Catherine Sullivan <csully@google.com>
Cc: Sagi Shahar <sagis@google.com>
Cc: Jon Olson <jonolson@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Luigi Rizzo <lrizzo@google.com>
Cc: Jeroen de Borst <jeroendb@google.com>
Cc: Tao Liu <xliutaox@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 5 Oct 2021 21:04:17 +0000 (14:04 -0700)]
rtnetlink: fix if_nlmsg_stats_size() under estimation
rtnl_fill_statsinfo() is filling skb with one mandatory if_stats_msg structure.
nlmsg_put(skb, pid, seq, type, sizeof(struct if_stats_msg), flags);
But if_nlmsg_stats_size() never considered the needed storage.
This bug did not show up because alloc_skb(X) allocates skb with
extra tailroom, because of added alignments. This could very well
be changed in the future to have deterministic behavior.
Fixes: 10c9ead9f3c6 ("rtnetlink: add new RTM_GETSTATS message to dump link stats")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@nvidia.com>
Acked-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Catherine Sullivan [Wed, 6 Oct 2021 02:42:21 +0000 (19:42 -0700)]
gve: Properly handle errors in gve_assign_qpl
Ignored errors would result in crash.
Fixes: ede3fcf5ec67f ("gve: Add support for raw addressing to the rx path")
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tao Liu [Wed, 6 Oct 2021 02:42:20 +0000 (19:42 -0700)]
gve: Avoid freeing NULL pointer
Prevent possible crashes when cleaning up after unsuccessful
initializations.
Fixes: 893ce44df5658 ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Signed-off-by: Tao Liu <xliutaox@google.com>
Signed-off-by: Catherine Sully <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Catherine Sullivan [Wed, 6 Oct 2021 02:42:19 +0000 (19:42 -0700)]
gve: Correct available tx qpl check
The qpl_map_size is rounded up to a multiple of sizeof(long), but the
number of qpls doesn't have to be.
Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support")
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiang Wang [Mon, 4 Oct 2021 23:25:28 +0000 (23:25 +0000)]
unix: Fix an issue in unix_shutdown causing the other end read/write failures
Commit
94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") sets
unix domain socket peer state to TCP_CLOSE in unix_shutdown. This could
happen when the local end is shutdown but the other end is not. Then,
the other end will get read or write failures which is not expected.
Fix the issue by setting the local state to shutdown.
Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap")
Reported-by: Casey Schaufler <casey@schaufler-ca.com>
Suggested-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211004232530.2377085-1-jiang.wang@bytedance.com
Andy Shevchenko [Fri, 1 Oct 2021 13:55:44 +0000 (16:55 +0300)]
hyper-v: Replace uuid.h with types.h
There is no user of anything in uuid.h in the hyperv.h. Replace it with
more appropriate types.h.
Fixes: f081bbb3fd03 ("hyper-v: Remove internal types from UAPI header")
Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/20211001135544.1823-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
David S. Miller [Wed, 6 Oct 2021 10:18:27 +0000 (11:18 +0100)]
Merge branch 'stmmac-eee-fix'
Wong Vee Khee says:
====================
net: stmmac: Turn off EEE on MAC link down
This patch series ensure PCS EEE is turned off on the event of MAC
link down.
Tested on Intel AlderLake-S (STMMAC + MaxLinear GPY211 PHY).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Wong Vee Khee [Tue, 5 Oct 2021 11:51:00 +0000 (19:51 +0800)]
net: stmmac: trigger PCS EEE to turn off on link down
The current implementation enable PCS EEE feature in the event of link
up, but PCS EEE feature is not disabled on link down.
This patch makes sure PCE EEE feature is disabled on link down.
Fixes: 656ed8b015f1 ("net: stmmac: fix EEE init issue when paired with EEE capable PHYs")
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wong Vee Khee [Tue, 5 Oct 2021 11:50:59 +0000 (19:50 +0800)]
net: pcs: xpcs: fix incorrect steps on disable EEE
When Energy-Efficient Ethernet(EEE) is disable from the MAC side,
we need to clear the DW_VR_MII_EEE_TRN_LPI bit of DW_VR_MII_EEE_MCTRL1
register.
Fixes: 7617af3d1a5e ("net: pcs: Introducing support for DWC xpcs Energy Efficient Ethernet")
Cc: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 5 Oct 2021 17:52:53 +0000 (10:52 -0700)]
Merge tag 'warning-fixes-
20211005' of git://git./linux/kernel/git/dhowells/linux-fs
Pull misc fs warning fixes from David Howells:
"The first four patches fix kerneldoc warnings in fscache, afs, 9p and
nfs - they're mostly just comment changes, though there's one place in
9p where a comment got detached from the function it was attached to
(v9fs_fid_add) and has to switch places with a function that got
inserted between (__add_fid).
The patch on the end removes an unused symbol in fscache"
* tag 'warning-fixes-
20211005' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
fscache: Remove an unused static variable
fscache: Fix some kerneldoc warnings shown up by W=1
9p: Fix a bunch of kerneldoc warnings shown up by W=1
afs: Fix kerneldoc warning shown up by W=1
nfs: Fix kerneldoc warning shown up by W=1
Eric Dumazet [Mon, 4 Oct 2021 21:24:15 +0000 (14:24 -0700)]
netlink: annotate data races around nlk->bound
While existing code is correct, KCSAN is reporting
a data-race in netlink_insert / netlink_sendmsg [1]
It is correct to read nlk->bound without a lock, as netlink_autobind()
will acquire all needed locks.
[1]
BUG: KCSAN: data-race in netlink_insert / netlink_sendmsg
write to 0xffff8881031c8b30 of 1 bytes by task 18752 on cpu 0:
netlink_insert+0x5cc/0x7f0 net/netlink/af_netlink.c:597
netlink_autobind+0xa9/0x150 net/netlink/af_netlink.c:842
netlink_sendmsg+0x479/0x7c0 net/netlink/af_netlink.c:1892
sock_sendmsg_nosec net/socket.c:703 [inline]
sock_sendmsg net/socket.c:723 [inline]
____sys_sendmsg+0x360/0x4d0 net/socket.c:2392
___sys_sendmsg net/socket.c:2446 [inline]
__sys_sendmsg+0x1ed/0x270 net/socket.c:2475
__do_sys_sendmsg net/socket.c:2484 [inline]
__se_sys_sendmsg net/socket.c:2482 [inline]
__x64_sys_sendmsg+0x42/0x50 net/socket.c:2482
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff8881031c8b30 of 1 bytes by task 18751 on cpu 1:
netlink_sendmsg+0x270/0x7c0 net/netlink/af_netlink.c:1891
sock_sendmsg_nosec net/socket.c:703 [inline]
sock_sendmsg net/socket.c:723 [inline]
__sys_sendto+0x2a8/0x370 net/socket.c:2019
__do_sys_sendto net/socket.c:2031 [inline]
__se_sys_sendto net/socket.c:2027 [inline]
__x64_sys_sendto+0x74/0x90 net/socket.c:2027
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x00 -> 0x01
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 18751 Comm: syz-executor.0 Not tainted 5.14.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: da314c9923fe ("netlink: Replace rhash_portid with bound")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wong Vee Khee [Tue, 5 Oct 2021 03:45:21 +0000 (11:45 +0800)]
net: pcs: xpcs: fix incorrect CL37 AN sequence
According to Synopsys DesignWare Cores Ethernet PCS databook, it is
required to disable Clause 37 auto-negotiation by programming bit-12
(AN_ENABLE) to 0 if it is already enabled, before programming various
fields of VR_MII_AN_CTRL registers.
After all these programming are done, it is then required to enable
Clause 37 auto-negotiation by programming bit-12 (AN_ENABLE) to 1.
Fixes: b97b5331b8ab ("net: pcs: add C37 SGMII AN support for intel mGbE controller")
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Anderson [Mon, 4 Oct 2021 21:50:02 +0000 (17:50 -0400)]
net: sfp: Fix typo in state machine debug string
The string should be "tx_disable" to match the state enum.
Fixes: 4005a7cb4f55 ("net: phy: sftp: print debug message with text, not numbers")
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 4 Oct 2021 19:55:22 +0000 (12:55 -0700)]
net/sched: sch_taprio: properly cancel timer from taprio_destroy()
There is a comment in qdisc_create() about us not calling ops->reset()
in some cases.
err_out4:
/*
* Any broken qdiscs that would require a ops->reset() here?
* The qdisc was never in action so it shouldn't be necessary.
*/
As taprio sets a timer before actually receiving a packet, we need
to cancel it from ops->destroy, just in case ops->reset has not
been called.
syzbot reported:
ODEBUG: free active (active state 0) object type: hrtimer hint: advance_sched+0x0/0x9a0 arch/x86/include/asm/atomic64_64.h:22
WARNING: CPU: 0 PID: 8441 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505
Modules linked in:
CPU: 0 PID: 8441 Comm: syz-executor813 Not tainted 5.14.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd e0 d3 e3 89 4c 89 ee 48 c7 c7 e0 c7 e3 89 e8 5b 86 11 05 <0f> 0b 83 05 85 03 92 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3
RSP: 0018:
ffffc9000130f330 EFLAGS:
00010282
RAX:
0000000000000000 RBX:
0000000000000003 RCX:
0000000000000000
RDX:
ffff88802baeb880 RSI:
ffffffff815d87b5 RDI:
fffff52000261e58
RBP:
0000000000000001 R08:
0000000000000000 R09:
0000000000000000
R10:
ffffffff815d25ee R11:
0000000000000000 R12:
ffffffff898dd020
R13:
ffffffff89e3ce20 R14:
ffffffff81653630 R15:
dffffc0000000000
FS:
0000000000f0d300(0000) GS:
ffff8880b9d00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007ffb64b3e000 CR3:
0000000036557000 CR4:
00000000001506e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
__debug_check_no_obj_freed lib/debugobjects.c:987 [inline]
debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1018
slab_free_hook mm/slub.c:1603 [inline]
slab_free_freelist_hook+0x171/0x240 mm/slub.c:1653
slab_free mm/slub.c:3213 [inline]
kfree+0xe4/0x540 mm/slub.c:4267
qdisc_create+0xbcf/0x1320 net/sched/sch_api.c:1299
tc_modify_qdisc+0x4c8/0x1a60 net/sched/sch_api.c:1663
rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5571
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
____sys_sendmsg+0x6e8/0x810 net/socket.c:2403
___sys_sendmsg+0xf3/0x170 net/socket.c:2457
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2486
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
Fixes: 44d4775ca518 ("net/sched: sch_taprio: reset child qdiscs before freeing them")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Acked-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Oct 2021 11:39:19 +0000 (12:39 +0100)]
Merge branch 'bridge-fixes'
Eric Dumazet says:
====================
net: bridge: br_get_linkxstats_size() fixes
This patch series attempts to fix the following syzbot report.
WARNING: CPU: 1 PID: 21425 at net/core/rtnetlink.c:5388 rtnl_stats_get+0x80f/0x8c0 net/core/rtnetlink.c:5388
Modules linked in:
CPU: 1 PID: 21425 Comm: syz-executor394 Not tainted 5.13.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:rtnl_stats_get+0x80f/0x8c0 net/core/rtnetlink.c:5388
Code: e9 9c fc ff ff 4c 89 e7 89 0c 24 e8 ab 8b a8 fa 8b 0c 24 e9 bc fc ff ff 4c 89 e7 e8 9b 8b a8 fa e9 df fe ff ff e8 61 85 63 fa <0f> 0b e9 f7 fc ff ff 41 be ea ff ff ff e9 f9 fc ff ff 41 be 97 ff
RSP: 0018:
ffffc9000cf77688 EFLAGS:
00010293
RAX:
0000000000000000 RBX:
000000000000012c RCX:
0000000000000000
RDX:
ffff8880211754c0 RSI:
ffffffff8711571f RDI:
0000000000000003
RBP:
ffff8880175aa780 R08:
00000000ffffffa6 R09:
ffff88823bd5c04f
R10:
ffffffff87115413 R11:
0000000000000001 R12:
ffff8880175aab74
R13:
ffff8880175aab40 R14:
00000000ffffffa6 R15:
0000000000000006
FS:
0000000001ff9300(0000) GS:
ffff8880b9c00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00000000005cfd58 CR3:
000000002cd43000 CR4:
00000000001506f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5562
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:674
____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
___sys_sendmsg+0xf3/0x170 net/socket.c:2404
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4440d9
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 5 Oct 2021 01:05:08 +0000 (18:05 -0700)]
net: bridge: fix under estimation in br_get_linkxstats_size()
Commit
de1799667b00 ("net: bridge: add STP xstats")
added an additional nla_reserve_64bit() in br_fill_linkxstats(),
but forgot to update br_get_linkxstats_size() accordingly.
This can trigger the following in rtnl_stats_get()
WARN_ON(err == -EMSGSIZE);
Fixes: de1799667b00 ("net: bridge: add STP xstats")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 5 Oct 2021 01:05:07 +0000 (18:05 -0700)]
net: bridge: use nla_total_size_64bit() in br_get_linkxstats_size()
bridge_fill_linkxstats() is using nla_reserve_64bit().
We must use nla_total_size_64bit() instead of nla_total_size()
for corresponding data structure.
Fixes: 1080ab95e3c7 ("net: bridge: add support for IGMP/MLD stats and export them via netlink")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang [Mon, 4 Oct 2021 06:28:58 +0000 (14:28 +0800)]
r8152: avoid to resubmit rx immediately
For the situation that the disconnect event comes very late when the
device is unplugged, the driver would resubmit the RX bulk transfer
after getting the callback with -EPROTO immediately and continually.
Finally, soft lockup occurs.
This patch avoids to resubmit RX immediately. It uses a workqueue to
schedule the RX NAPI. And the NAPI would resubmit the RX. It let the
disconnect event have opportunity to stop the submission before soft
lockup.
Reported-by: Jason-ch Chen <jason-ch.chen@mediatek.com>
Tested-by: Jason-ch Chen <jason-ch.chen@mediatek.com>
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 4 Oct 2021 23:01:40 +0000 (16:01 -0700)]
etherdevice: use __dev_addr_set()
Andrew points out that eth_hw_addr_set() replaces memcpy()
calls so we can't use ether_addr_copy() which assumes
both arguments are 2-bytes aligned.
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 4 Oct 2021 21:33:30 +0000 (14:33 -0700)]
Merge tag 'linux-kselftest-fixes-5.15-rc5' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
"A fix to implicit declaration warns in drivers/dma-buf test"
* tag 'linux-kselftest-fixes-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: drivers/dma-buf: Fix implicit declaration warns
David Howells [Mon, 20 Sep 2021 09:33:55 +0000 (10:33 +0100)]
fscache: Remove an unused static variable
The fscache object CREATE_OBJECT work state isn't ever referred to, so
remove it and avoid the unused variable warning caused by W=1.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-fsdevel@vger.kernel.org
cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/163214005516.2945267.7000234432243167892.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/163281899704.2790286.9177774252843775348.stgit@warthog.procyon.org.uk/
David Howells [Mon, 4 Oct 2021 21:08:50 +0000 (22:08 +0100)]
fscache: Fix some kerneldoc warnings shown up by W=1
Fix some kerneldoc warnings in the fscache driver that are shown up by W=1.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Mauro Carvalho Chehab <mchehab@kernel.org>
cc: linux-fsdevel@vger.kernel.org
cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/163214005516.2945267.7000234432243167892.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/163281899704.2790286.9177774252843775348.stgit@warthog.procyon.org.uk/
David Howells [Mon, 4 Oct 2021 21:07:22 +0000 (22:07 +0100)]
9p: Fix a bunch of kerneldoc warnings shown up by W=1
Fix a bunch of kerneldoc warnings shown up by W=1 in the 9p filesystem:
(1) Add/remove/fix kerneldoc parameters descriptions.
(2) Move __add_fid() from between v9fs_fid_add() and its comment.
(3) 9p's caches_show() doesn't really make sense as an API function, so
remove the kerneldoc annotation. It's also not prefixed with 'v9fs_'.
Also remove the kerneldoc markers from the 9p fscache wrappers.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Dominique Martinet <asmadeus@codewreck.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Mauro Carvalho Chehab <mchehab@kernel.org>
cc: v9fs-developer@lists.sourceforge.net
cc: linux-fsdevel@vger.kernel.org
cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/163214005516.2945267.7000234432243167892.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/163281899704.2790286.9177774252843775348.stgit@warthog.procyon.org.uk/
David Howells [Mon, 4 Oct 2021 21:04:33 +0000 (22:04 +0100)]
afs: Fix kerneldoc warning shown up by W=1
Fix a kerneldoc warning in afs due to a partially documented internal
function by removing the kerneldoc marker.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/163214005516.2945267.7000234432243167892.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/163281899704.2790286.9177774252843775348.stgit@warthog.procyon.org.uk/
David Howells [Mon, 4 Oct 2021 21:02:01 +0000 (22:02 +0100)]
nfs: Fix kerneldoc warning shown up by W=1
Fix a kerneldoc warning in nfs due to documentation for a parameter that
isn't present.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: Anna Schumaker <anna.schumaker@netapp.com>
cc: Mauro Carvalho Chehab <mchehab@kernel.org>
cc: linux-nfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/163214005516.2945267.7000234432243167892.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/163281899704.2790286.9177774252843775348.stgit@warthog.procyon.org.uk/
Geert Uytterhoeven [Fri, 24 Sep 2021 12:35:12 +0000 (14:35 +0200)]
dt-bindings: drm/bridge: ti-sn65dsi86: Fix reg value
make dtbs_check:
arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dt.yaml: bridge@2c: reg:0:0: 45 was expected
According to the datasheet, the I2C address can be either 0x2c or 0x2d,
depending on the ADDR control input.
Fixes: e3896e6dddf0b821 ("dt-bindings: drm/bridge: Document sn65dsi86 bridge bindings")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Link: https://lore.kernel.org/r/08f73c2aa0d4e580303357dfae107d084d962835.1632486753.git.geert+renesas@glider.be
Signed-off-by: Rob Herring <robh@kernel.org>
Linus Torvalds [Mon, 4 Oct 2021 16:53:40 +0000 (09:53 -0700)]
Merge tag 'media/v5.15-3' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fix from Mauro Carvalho Chehab:
"There's just one patch here, fixing a -Werror issue at
staging/atomisp"
* tag 'media/v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: atomisp: restore missing 'return' statement
Linus Torvalds [Mon, 4 Oct 2021 16:46:36 +0000 (09:46 -0700)]
Merge tag 'ovl-fixes-5.15-rc5' of git://git./linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
"Fix two bugs, both of them corner cases not affecting most users"
* tag 'ovl-fixes-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix IOCB_DIRECT if underlying fs doesn't support direct IO
ovl: fix missing negative dentry check in ovl_rename()
Linus Torvalds [Mon, 4 Oct 2021 16:38:55 +0000 (09:38 -0700)]
Merge tag 'mips-fixes_5.15_1' of git://git./linux/kernel/git/mips/linux
Pull MIPS fix from Thomas Bogendoerfer:
"Revert workaround for buggy cpu detection because regressions"
* tag 'mips-fixes_5.15_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Revert "add support for buggy MT7621S core detection"
Arnd Bergmann [Mon, 2 Aug 2021 14:38:14 +0000 (16:38 +0200)]
media: atomisp: restore missing 'return' statement
The input_system_configure_channel_sensor() function lost its final
return code in a previous patch:
drivers/staging/media/atomisp/pci/hive_isp_css_common/host/input_system.c: In function 'input_system_configure_channel_sensor':
drivers/staging/media/atomisp/pci/hive_isp_css_common/host/input_system.c:1649:1: error: control reaches end of non-void function [-Werror=return-type]
Restore what was there originally.
Link: https://lore.kernel.org/linux-media/20210802143820.1150099-1-arnd@kernel.org
Fixes: 728a5c64ae5f ("media: atomisp: remove dublicate code")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Andrew Lunn [Sun, 3 Oct 2021 15:50:53 +0000 (17:50 +0200)]
dsa: tag_dsa: Fix mask for trunked packets
A packet received on a trunk will have bit 2 set in Forward DSA tagged
frame. Bit 1 can be either 0 or 1 and is otherwise undefined and bit 0
indicates the frame CFI. Masking with 7 thus results in frames as
being identified as being from a trunk when in fact they are not. Fix
the mask to just look at bit 2.
Fixes: 5b60dadb71db ("net: dsa: tag_dsa: Support reception of packets from LAG devices")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 3 Oct 2021 21:08:47 +0000 (14:08 -0700)]
Linux 5.15-rc4
Chen Jingwen [Tue, 28 Sep 2021 12:56:57 +0000 (20:56 +0800)]
elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings
In commit
b212921b13bd ("elf: don't use MAP_FIXED_NOREPLACE for elf
executable mappings") we still leave MAP_FIXED_NOREPLACE in place for
load_elf_interp.
Unfortunately, this will cause kernel to fail to start with:
1 (init): Uhuuh, elf segment at
00003ffff7ffd000 requested but the memory is mapped already
Failed to execute /init (error -17)
The reason is that the elf interpreter (ld.so) has overlapping segments.
readelf -l ld-2.31.so
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x000000000002c94c 0x000000000002c94c R E 0x10000
LOAD 0x000000000002dae0 0x000000000003dae0 0x000000000003dae0
0x00000000000021e8 0x0000000000002320 RW 0x10000
LOAD 0x000000000002fe00 0x000000000003fe00 0x000000000003fe00
0x00000000000011ac 0x0000000000001328 RW 0x10000
The reason for this problem is the same as described in commit
ad55eac74f20 ("elf: enforce MAP_FIXED on overlaying elf segments").
Not only executable binaries, elf interpreters (e.g. ld.so) can have
overlapping elf segments, so we better drop MAP_FIXED_NOREPLACE and go
back to MAP_FIXED in load_elf_interp.
Fixes: 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map")
Cc: <stable@vger.kernel.org> # v4.19
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 3 Oct 2021 20:56:53 +0000 (13:56 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix a number of ext4 bugs in fast_commit, inline data, and delayed
allocation.
Also fix error handling code paths in ext4_dx_readdir() and
ext4_fill_super().
Finally, avoid a grabbing a journal head in the delayed allocation
write in the common cases where we are overwriting a pre-existing
block or appending to an inode"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: recheck buffer uptodate bit under buffer lock
ext4: fix potential infinite loop in ext4_dx_readdir()
ext4: flush s_error_work before journal destroy in ext4_fill_super
ext4: fix loff_t overflow in ext4_max_bitmap_size()
ext4: fix reserved space counter leakage
ext4: limit the number of blocks in one ADD_RANGE TLV
ext4: enforce buffer head state assertion in ext4_da_map_blocks
ext4: remove extent cache entries when truncating inline data
ext4: drop unnecessary journal handle in delalloc write
ext4: factor out write end code of inline file
ext4: correct the error path of ext4_write_inline_data_end()
ext4: check and update i_disksize properly
ext4: add error checking to ext4_ext_replay_set_iblocks()
Linus Torvalds [Sun, 3 Oct 2021 20:45:48 +0000 (13:45 -0700)]
objtool: print out the symbol type when complaining about it
The objtool warning that the kvm instruction emulation code triggered
wasn't very useful:
arch/x86/kvm/emulate.o: warning: objtool: __ex_table+0x4: don't know how to handle reloc symbol type: kvm_fastop_exception
in that it helpfully tells you which symbol name it had trouble figuring
out the relocation for, but it doesn't actually say what the unknown
symbol type was that triggered it all.
In this case it was because of missing type information (type 0, aka
STT_NOTYPE), but on the whole it really should just have printed that
out as part of the message.
Because if this warning triggers, that's very much the first thing you
want to know - why did reloc2sec_off() return failure for that symbol?
So rather than just saying you can't handle some type of symbol without
saying what the type _was_, just print out the type number too.
Fixes: 24ff65257375 ("objtool: Teach get_alt_entry() about more relocation types")
Link: https://lore.kernel.org/lkml/CAHk-=wiZwq-0LknKhXN4M+T8jbxn_2i9mcKpO+OaBSSq_Eh7tg@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 3 Oct 2021 20:34:19 +0000 (13:34 -0700)]
kvm: fix objtool relocation warning
The recent change to make objtool aware of more symbol relocation types
(commit
24ff65257375: "objtool: Teach get_alt_entry() about more
relocation types") also added another check, and resulted in this
objtool warning when building kvm on x86:
arch/x86/kvm/emulate.o: warning: objtool: __ex_table+0x4: don't know how to handle reloc symbol type: kvm_fastop_exception
The reason seems to be that kvm_fastop_exception() is marked as a global
symbol, which causes the relocation to ke kept around for objtool. And
at the same time, the kvm_fastop_exception definition (which is done as
an inline asm statement) doesn't actually set the type of the global,
which then makes objtool unhappy.
The minimal fix is to just not mark kvm_fastop_exception as being a
global symbol. It's only used in that one compilation unit anyway, so
it was always pointless. That's how all the other local exception table
labels are done.
I'm not entirely happy about the kinds of games that the kvm code plays
with doing its own exception handling, and the fact that it confused
objtool is most definitely a symptom of the code being a bit too subtle
and ad-hoc. But at least this trivial one-liner makes objtool no longer
upset about what is going on.
Fixes: 24ff65257375 ("objtool: Teach get_alt_entry() about more relocation types")
Link: https://lore.kernel.org/lkml/CAHk-=wiZwq-0LknKhXN4M+T8jbxn_2i9mcKpO+OaBSSq_Eh7tg@mail.gmail.com/
Cc: Borislav Petkov <bp@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 3 Oct 2021 18:19:02 +0000 (11:19 -0700)]
Merge tag 'char-misc-5.15-rc4' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small misc driver fixes for 5.15-rc4. They are in two
"groups":
- ipack driver fixes for issues found by Johan Hovold
- interconnect driver fixes for reported problems
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
ipack: ipoctal: fix module reference leak
ipack: ipoctal: fix missing allocation-failure check
ipack: ipoctal: fix tty-registration error handling
ipack: ipoctal: fix tty registration race
ipack: ipoctal: fix stack information leak
interconnect: qcom: sdm660: Add missing a2noc qos clocks
dt-bindings: interconnect: sdm660: Add missing a2noc qos clocks
interconnect: qcom: sdm660: Correct NOC_QOS_PRIORITY shift and mask
interconnect: qcom: sdm660: Fix id of slv_cnoc_mnoc_cfg
Linus Torvalds [Sun, 3 Oct 2021 18:10:09 +0000 (11:10 -0700)]
Merge tag 'driver-core-5.15-rc4' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are some driver core and kernfs fixes for reported issues for
5.15-rc4. These fixes include:
- kernfs positive dentry bugfix
- debugfs_create_file_size error path fix
- cpumask sysfs file bugfix to preserve the user/kernel abi (has been
reported multiple times.)
- devlink fixes for mdiobus devices as reported by the subsystem
maintainers.
Also included in here are some devlink debugging changes to make it
easier for people to report problems when asked. They have already
helped with the mdiobus and other subsystems reporting issues.
All of these have been linux-next for a while with no reported issues"
* tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
kernfs: also call kernfs_set_rev() for positive dentry
driver core: Add debug logs when fwnode links are added/deleted
driver core: Create __fwnode_link_del() helper function
driver core: Set deferred probe reason when deferred by driver core
net: mdiobus: Set FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for mdiobus parents
driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
driver core: fw_devlink: Improve handling of cyclic dependencies
cpumask: Omit terminating null byte in cpumap_print_{list,bitmask}_to_buf
debugfs: debugfs_create_file_size(): use IS_ERR to check for error
Linus Torvalds [Sun, 3 Oct 2021 17:49:00 +0000 (10:49 -0700)]
Merge tag 'sched_urgent_for_v5.15_rc4' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Tell the compiler to always inline is_percpu_thread()
- Make sure tunable_scaling buffer is null-terminated after an update
in sysfs
- Fix LTP named regression due to cgroup list ordering
* tag 'sched_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Always inline is_percpu_thread()
sched/fair: Null terminate buffer when updating tunable_scaling
sched/fair: Add ancestors of unthrottled undecayed cfs_rq
Linus Torvalds [Sun, 3 Oct 2021 17:32:27 +0000 (10:32 -0700)]
Merge tag 'perf_urgent_for_v5.15_rc4' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Make sure the destroy callback is reset when a event initialization
fails
- Update the event constraints for Icelake
- Make sure the active time of an event is updated even for inactive
events
* tag 'perf_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: fix userpage->time_enabled of inactive events
perf/x86/intel: Update event constraints for ICX
perf/x86: Reset destroy callback on event init failure
Linus Torvalds [Sun, 3 Oct 2021 17:23:54 +0000 (10:23 -0700)]
Merge tag 'objtool_urgent_for_v5.15_rc4' of git://git./linux/kernel/git/tip/tip
Pull objtool fix from Borislav Petkov:
- Handle symbol relocations properly due to changes in the toolchains
which remove section symbols now
* tag 'objtool_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Teach get_alt_entry() about more relocation types
Linus Torvalds [Sun, 3 Oct 2021 00:51:01 +0000 (17:51 -0700)]
Merge tag 'hwmon-for-v5.15-rc4' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fixed various potential NULL pointer accesses in w8379* drivers
- Improved error handling, fault reporting, and fixed rounding in
thmp421 driver
- Fixed error handling in ltc2947 driver
- Added missing attribute to pmbus/mp2975 driver
- Fixed attribute values in pbus/ibm-cffps, occ, and mlxreg-fan
drivers
- Removed unused residual code from k10temp driver
* tag 'hwmon-for-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (w83793) Fix NULL pointer dereference by removing unnecessary structure field
hwmon: (w83792d) Fix NULL pointer dereference by removing unnecessary structure field
hwmon: (w83791d) Fix NULL pointer dereference by removing unnecessary structure field
hwmon: (pmbus/mp2975) Add missed POUT attribute for page 1 mp2975 controller
hwmon: (pmbus/ibm-cffps) max_power_out swap changes
hwmon: (occ) Fix P10 VRM temp sensors
hwmon: (ltc2947) Properly handle errors when looking for the external clock
hwmon: (tmp421) fix rounding for negative values
hwmon: (tmp421) report /PVLD condition as fault
hwmon: (tmp421) handle I2C errors
hwmon: (mlxreg-fan) Return non-zero value when fan current state is enforced from sysfs
hwmon: (k10temp) Remove residues of current and voltage
Linus Torvalds [Sun, 3 Oct 2021 00:43:54 +0000 (17:43 -0700)]
Merge tag '5.15-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull ksmbd server fixes from Steve French:
"Eleven fixes for the ksmbd kernel server, mostly security related:
- an important fix for disabling weak NTLMv1 authentication
- seven security (improved buffer overflow checks) fixes
- fix for wrong infolevel struct used in some getattr/setattr paths
- two small documentation fixes"
* tag '5.15-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: missing check for NULL in convert_to_nt_pathname()
ksmbd: fix transform header validation
ksmbd: add buffer validation for SMB2_CREATE_CONTEXT
ksmbd: add validation in smb2 negotiate
ksmbd: add request buffer validation in smb2_set_info
ksmbd: use correct basic info level in set_file_basic_info()
ksmbd: remove NTLMv1 authentication
ksmbd: fix documentation for 2 functions
MAINTAINERS: rename cifs_common to smbfs_common in cifs and ksmbd entry
ksmbd: fix invalid request buffer access in compound
ksmbd: remove RFC1002 check in smb2 request
Linus Torvalds [Sat, 2 Oct 2021 19:56:03 +0000 (12:56 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Five fairly minor fixes and spelling updates, all in drivers. Even
though the ufs fix is in tracing, it's a potentially exploitable use
beyond end of array bug"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: csiostor: Add module softdep on cxgb4
scsi: qla2xxx: Fix excessive messages during device logout
scsi: virtio_scsi: Fix spelling mistake "Unsupport" -> "Unsupported"
scsi: ses: Fix unsigned comparison with less than zero
scsi: ufs: Fix illegal offset in UPIU event trace
Linus Torvalds [Sat, 2 Oct 2021 18:00:36 +0000 (11:00 -0700)]
Merge tag 'block-5.15-2021-10-01' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A few block fixes for this release:
- Revert a BFQ commit that causes breakage for people. Unfortunately
it was auto-selected for stable as well, so now 5.14.7 suffers from
it too. Hopefully stable will pick up this revert quickly too, so
we can remove the issue on that end as well.
- Add a quirk for Apple NVMe controllers, which due to their
non-compliance broke due to the introduction of command sequences
(Keith)
- Use shifts in nbd, fixing a __divdi3 issue (Nick)"
* tag 'block-5.15-2021-10-01' of git://git.kernel.dk/linux-block:
nbd: use shifts rather than multiplies
Revert "block, bfq: honor already-setup queue merges"
nvme: add command id quirk for apple controllers
Linus Torvalds [Sat, 2 Oct 2021 17:26:19 +0000 (10:26 -0700)]
Merge tag 'io_uring-5.15-2021-10-01' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Two fixes in here:
- The signal issue that was discussed start of this week (me).
- Kill dead fasync support in io_uring. Looks like it was broken
since io_uring was initially merged, and given that nobody has ever
complained about it, let's just kill it (Pavel)"
* tag 'io_uring-5.15-2021-10-01' of git://git.kernel.dk/linux-block:
io_uring: kill fasync
io-wq: exclusively gate signal based exit on get_signal() return
Linus Torvalds [Sat, 2 Oct 2021 17:08:35 +0000 (10:08 -0700)]
Merge tag 'libnvdimm-fixes-5.15-rc4' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A fix for a regression added this cycle in the pmem driver, and for a
long standing bug for failed NUMA node lookups on ARM64.
This has appeared in -next for several days with no reported issues.
Summary:
- Fix a regression that caused the sysfs ABI for pmem block devices
to not be registered. This fails the nvdimm unit tests and dax
xfstests.
- Fix numa node lookups for dax-kmem memory (device-dax memory
assigned to the page allocator) on ARM64"
* tag 'libnvdimm-fixes-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nvdimm/pmem: fix creating the dax group
ACPI: NFIT: Use fallback node id when numa info in NFIT table is incorrect
Dave Wysochanski [Fri, 1 Oct 2021 14:37:31 +0000 (15:37 +0100)]
cachefiles: Fix oops in trace_cachefiles_mark_buried due to NULL object
In cachefiles_mark_object_buried, the dentry in question may not have an
owner, and thus our cachefiles_object pointer may be NULL when calling
the tracepoint, in which case we will also not have a valid debug_id to
print in the tracepoint.
Check for NULL object in the tracepoint and if so, just set debug_id to
MAX_UINT as was done in
2908f5e101e3 ("fscache: Add a cookie debug ID
and use that in traces").
This fixes the following oops:
FS-Cache: Cache "mycache" added (type cachefiles)
CacheFiles: File cache on vdc registered
...
Workqueue: fscache_object fscache_object_work_func [fscache]
RIP: 0010:trace_event_raw_event_cachefiles_mark_buried+0x4e/0xa0 [cachefiles]
....
Call Trace:
cachefiles_mark_object_buried+0xa5/0xb0 [cachefiles]
cachefiles_bury_object+0x270/0x430 [cachefiles]
cachefiles_walk_to_object+0x195/0x9c0 [cachefiles]
cachefiles_lookup_object+0x5a/0xc0 [cachefiles]
fscache_look_up_object+0xd7/0x160 [fscache]
fscache_object_work_func+0xb2/0x340 [fscache]
process_one_work+0x1f1/0x390
worker_thread+0x53/0x3e0
kthread+0x127/0x150
Fixes: 2908f5e101e3 ("fscache: Add a cookie debug ID and use that in traces")
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 2 Oct 2021 10:17:29 +0000 (03:17 -0700)]
drm/i915: fix blank screen booting crashes
5.15-rc1 crashes with blank screen when booting up on two ThinkPads
using i915. Bisections converge convincingly, but arrive at different
and suprising "culprits", none of them the actual culprit.
netconsole (with init_netconsole() hacked to call i915_init() when
logging has started, instead of by module_init()) tells the story:
kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
with RSI:
ffffffff814d408b pointing to sw_fence_dummy_notify().
I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
function needs to be 4-byte aligned.
Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Fri, 1 Oct 2021 16:20:33 +0000 (19:20 +0300)]
ptp_pch: Load module automatically if ID matches
The driver can't be loaded automatically because it misses
module alias to be provided. Add corresponding MODULE_DEVICE_TABLE()
call to the driver.
Fixes: 863d08ece9bf ("supports eg20t ptp clock")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pali Rohár [Sat, 2 Oct 2021 09:04:09 +0000 (11:04 +0200)]
powerpc/fsl/dts: Fix phy-connection-type for fm1mac3
Property phy-connection-type contains invalid value "sgmii-2500" per scheme
defined in file ethernet-controller.yaml.
Correct phy-connection-type value should be "2500base-x".
Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: 84e0f1c13806 ("powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)")
Acked-by: Scott Wood <oss@buserror.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 2 Oct 2021 12:55:02 +0000 (13:55 +0100)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net (v2)
The following patchset contains Netfilter fixes for net:
1) Move back the defrag users fields to the global netns_nf area.
Kernel fails to boot if conntrack is builtin and kernel is booted
with: nf_conntrack.enable_hooks=1. From Florian Westphal.
2) Rule event notification is missing relevant context such as
the position handle and the NLM_F_APPEND flag.
3) Rule replacement is expanded to add + delete using the existing
rule handle, reverse order of this operation so it makes sense
from rule notification standpoint.
4) Propagate to userspace the NLM_F_CREATE and NLM_F_EXCL flags
from the rule notification path.
Patches #2, #3 and #4 are used by 'nft monitor' and 'iptables-monitor'
userspace utilities which are not correctly representing the following
operations through netlink notifications:
- rule insertions
- rule addition/insertion from position handle
- create table/chain/set/map/flowtable/...
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nadezda Lutovinova [Tue, 21 Sep 2021 15:51:53 +0000 (18:51 +0300)]
hwmon: (w83793) Fix NULL pointer dereference by removing unnecessary structure field
If driver read tmp value sufficient for
(tmp & 0x08) && (!(tmp & 0x80)) && ((tmp & 0x7) == ((tmp >> 4) & 0x7))
from device then Null pointer dereference occurs.
(It is possible if tmp = 0b0xyz1xyz, where same literals mean same numbers)
Also lm75[] does not serve a purpose anymore after switching to
devm_i2c_new_dummy_device() in w83791d_detect_subclients().
The patch fixes possible NULL pointer dereference by removing lm75[].
Found by Linux Driver Verification project (linuxtesting.org).
Cc: stable@vger.kernel.org
Signed-off-by: Nadezda Lutovinova <lutovinova@ispras.ru>
Link: https://lore.kernel.org/r/20210921155153.28098-3-lutovinova@ispras.ru
[groeck: Dropped unnecessary continuation lines, fixed multi-line alignments]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Nadezda Lutovinova [Tue, 21 Sep 2021 15:51:52 +0000 (18:51 +0300)]
hwmon: (w83792d) Fix NULL pointer dereference by removing unnecessary structure field
If driver read val value sufficient for
(val & 0x08) && (!(val & 0x80)) && ((val & 0x7) == ((val >> 4) & 0x7))
from device then Null pointer dereference occurs.
(It is possible if tmp = 0b0xyz1xyz, where same literals mean same numbers)
Also lm75[] does not serve a purpose anymore after switching to
devm_i2c_new_dummy_device() in w83791d_detect_subclients().
The patch fixes possible NULL pointer dereference by removing lm75[].
Found by Linux Driver Verification project (linuxtesting.org).
Cc: stable@vger.kernel.org
Signed-off-by: Nadezda Lutovinova <lutovinova@ispras.ru>
Link: https://lore.kernel.org/r/20210921155153.28098-2-lutovinova@ispras.ru
[groeck: Dropped unnecessary continuation lines, fixed multipline alignment]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Nadezda Lutovinova [Tue, 21 Sep 2021 15:51:51 +0000 (18:51 +0300)]
hwmon: (w83791d) Fix NULL pointer dereference by removing unnecessary structure field
If driver read val value sufficient for
(val & 0x08) && (!(val & 0x80)) && ((val & 0x7) == ((val >> 4) & 0x7))
from device then Null pointer dereference occurs.
(It is possible if tmp = 0b0xyz1xyz, where same literals mean same numbers)
Also lm75[] does not serve a purpose anymore after switching to
devm_i2c_new_dummy_device() in w83791d_detect_subclients().
The patch fixes possible NULL pointer dereference by removing lm75[].
Found by Linux Driver Verification project (linuxtesting.org).
Cc: stable@vger.kernel.org
Signed-off-by: Nadezda Lutovinova <lutovinova@ispras.ru>
Link: https://lore.kernel.org/r/20210921155153.28098-1-lutovinova@ispras.ru
[groeck: Dropped unnecessary continuation lines, fixed multi-line alignment]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Vadim Pasternak [Mon, 27 Sep 2021 07:07:40 +0000 (10:07 +0300)]
hwmon: (pmbus/mp2975) Add missed POUT attribute for page 1 mp2975 controller
Add missed attribute for reading POUT from page 1.
It is supported by device, but has been missed in initial commit.
Fixes: 2c6fcbb21149 ("hwmon: (pmbus) Add support for MPS Multi-phase mp2975 controller")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20210927070740.2149290-1-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Brandon Wyman [Tue, 28 Sep 2021 20:50:51 +0000 (20:50 +0000)]
hwmon: (pmbus/ibm-cffps) max_power_out swap changes
The bytes for max_power_out from the ibm-cffps devices differ in byte
order for some power supplies.
The Witherspoon power supply returns the bytes in MSB/LSB order.
The Rainier power supply returns the bytes in LSB/MSB order.
The Witherspoon power supply uses version cffps1. The Rainier power
supply should use version cffps2. If version is cffps1, swap the bytes
before output to max_power_out.
Tested:
Witherspoon before: 3148. Witherspoon after: 3148.
Rainier before: 53255. Rainier after: 2000.
Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
Reviewed-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20210928205051.1222815-1-bjwyman@gmail.com
[groeck: Replaced yoda programming]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Eddie James [Wed, 29 Sep 2021 15:36:04 +0000 (10:36 -0500)]
hwmon: (occ) Fix P10 VRM temp sensors
The P10 (temp sensor version 0x10) doesn't do the same VRM status
reporting that was used on P9. It just reports the temperature, so
drop the check for VRM fru type in the sysfs show function, and don't
set the name to "alarm".
Fixes: db4919ec86 ("hwmon: (occ) Add new temperature sensor type")
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20210929153604.14968-1-eajames@linux.ibm.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Pablo Neira Ayuso [Sun, 26 Sep 2021 07:59:35 +0000 (09:59 +0200)]
netfilter: nf_tables: honor NLM_F_CREATE and NLM_F_EXCL in event notification
Include the NLM_F_CREATE and NLM_F_EXCL flags in netlink event
notifications, otherwise userspace cannot distiguish between create and
add commands.
Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ilya Lipnitskiy [Thu, 30 Sep 2021 16:57:41 +0000 (09:57 -0700)]
MIPS: Revert "add support for buggy MT7621S core detection"
This reverts commit
6decd1aad15f56b169217789630a0098b496de0e. CPULAUNCH
register is not set properly by some bootloaders, causing a regression
until a bootloader change is made, which is hard if not impossible on
some embedded devices. Revert the change until a more robust core
detection mechanism that works on MT7621S routers such as Netgear R6220
as well as platforms like Digi EX15 can be made.
Link: https://lore.kernel.org/lkml/4d9e3b39-7caa-d372-5d7b-42dcec36fec7@kernel.org
Fixes: 6decd1aad15f ("MIPS: add support for buggy MT7621S core detection")
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Acked-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Acked-by: Greg Ungerer <gerg@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Rob Herring [Tue, 28 Sep 2021 22:29:20 +0000 (17:29 -0500)]
dt-bindings: Drop more redundant 'maxItems/minItems'
Another round of removing redundant minItems/maxItems from new schema in
the recent merge window.
If a property has an 'items' list, then a 'minItems' or 'maxItems' with the
same size as the list is redundant and can be dropped. Note that is DT
schema specific behavior and not standard json-schema behavior. The tooling
will fixup the final schema adding any unspecified minItems/maxItems.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Marek Vasut <marex@denx.de>
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: dri-devel@lists.freedesktop.org
Cc: netdev@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210928222920.2204761-1-robh@kernel.org
Rob Herring [Fri, 20 Aug 2021 00:14:57 +0000 (19:14 -0500)]
dt-bindings: media: Fix more graph 'unevaluatedProperties' related warnings
The graph schema doesn't allow custom properties on endpoint nodes for
'#/properties/port' and '#/$defs/port-base' should be used instead. This
doesn't matter until 'unevaluatedProperties' support is implemented.
Cc: Dave Stevenson <dave.stevenson@raspberrypi.com>
Cc: Jacopo Mondi <jacopo@jmondi.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "Paul J. Murphy" <paul.j.murphy@intel.com>
Cc: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Cc: linux-media@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Link: https://lore.kernel.org/r/20210820001457.1705142-1-robh@kernel.org
Leon Romanovsky [Thu, 30 Sep 2021 05:12:43 +0000 (08:12 +0300)]
MAINTAINERS: Remove Bin Luo as his email bounces
The emails sent to luobin9@huawei.com bounce with error:
"Recipient address rejected: Failed recipient validation check."
So let's remove his entry and change the status of hinic driver till
someone in Huawei will step-in to maintain it again.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/045a32ccf394de66b7899c8b732f44dc5f4a1154.1632978665.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Punit Agrawal [Wed, 29 Sep 2021 13:50:49 +0000 (22:50 +0900)]
net: stmmac: dwmac-rk: Fix ethernet on rk3399 based devices
Commit
2d26f6e39afb ("net: stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings")
while getting rid of a runtime PM warning ended up breaking ethernet
on rk3399 based devices. By dropping an extra reference to the device,
the commit ends up enabling suspend / resume of the ethernet device -
which appears to be broken.
While the issue with runtime pm is being investigated, partially
revert commit
2d26f6e39afb to restore the network on rk3399.
Fixes: 2d26f6e39afb ("net: stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings")
Suggested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Punit Agrawal <punitagrawal@gmail.com>
Cc: Michael Riesch <michael.riesch@wolfvision.net>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20210929135049.3426058-1-punitagrawal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 30 Sep 2021 12:53:30 +0000 (15:53 +0300)]
net: mscc: ocelot: fix VCAP filters remaining active after being deleted
When ocelot_flower.c calls ocelot_vcap_filter_add(), the filter has a
given filter->id.cookie. This filter is added to the block->rules list.
However, when ocelot_flower.c calls ocelot_vcap_block_find_filter_by_id()
which passes the cookie as argument, the filter is never found by
filter->id.cookie when searching through the block->rules list.
This is unsurprising, since the filter->id.cookie is an unsigned long,
but the cookie argument provided to ocelot_vcap_block_find_filter_by_id()
is a signed int, and the comparison fails.
Fixes: 50c6cc5b9283 ("net: mscc: ocelot: store a namespaced VCAP filter ID")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210930125330.2078625-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Thu, 30 Sep 2021 21:22:39 +0000 (14:22 -0700)]
net_sched: fix NULL deref in fifo_set_limit()
syzbot reported another NULL deref in fifo_set_limit() [1]
I could repro the issue with :
unshare -n
tc qd add dev lo root handle 1:0 tbf limit 200000 burst 70000 rate 100Mbit
tc qd replace dev lo parent 1:0 pfifo_fast
tc qd change dev lo root handle 1:0 tbf limit 300000 burst 70000 rate 100Mbit
pfifo_fast does not have a change() operation.
Make fifo_set_limit() more robust about this.
[1]
BUG: kernel NULL pointer dereference, address:
0000000000000000
PGD
1cf99067 P4D
1cf99067 PUD
7ca49067 PMD 0
Oops: 0010 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 14443 Comm: syz-executor959 Not tainted 5.15.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:0x0
Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
RSP: 0018:
ffffc9000e2f7310 EFLAGS:
00010246
RAX:
dffffc0000000000 RBX:
ffffffff8d6ecc00 RCX:
0000000000000000
RDX:
0000000000000000 RSI:
ffff888024c27910 RDI:
ffff888071e34000
RBP:
ffff888071e34000 R08:
0000000000000001 R09:
ffffffff8fcfb947
R10:
0000000000000001 R11:
0000000000000000 R12:
ffff888024c27910
R13:
ffff888071e34018 R14:
0000000000000000 R15:
ffff88801ef74800
FS:
00007f321d897700(0000) GS:
ffff8880b9d00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffffffffffffffd6 CR3:
00000000722c3000 CR4:
00000000003506e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
fifo_set_limit net/sched/sch_fifo.c:242 [inline]
fifo_set_limit+0x198/0x210 net/sched/sch_fifo.c:227
tbf_change+0x6ec/0x16d0 net/sched/sch_tbf.c:418
qdisc_change net/sched/sch_api.c:1332 [inline]
tc_modify_qdisc+0xd9a/0x1a60 net/sched/sch_api.c:1634
rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
___sys_sendmsg+0xf3/0x170 net/socket.c:2463
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: fb0305ce1b03 ("net-sched: consolidate default fifo qdisc setup")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20210930212239.3430364-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Fri, 1 Oct 2021 21:45:23 +0000 (14:45 -0700)]
Merge tag 's390-5.15-4' of git://git./linux/kernel/git/s390/linux
Pull s390 fix from Vasily Gorbik:
"One fix for 5.15-rc4: Avoid CIO excessive path-verification requests,
which might cause unwanted delays"
* tag 's390-5.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cio: avoid excessive path-verification requests
Andrii Nakryiko [Fri, 1 Oct 2021 18:59:10 +0000 (11:59 -0700)]
libbpf: Fix memory leak in strset
Free struct strset itself, not just its internal parts.
Fixes: 90d76d3ececc ("libbpf: Extract internal set-of-strings datastructure APIs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20211001185910.86492-1-andrii@kernel.org
Eric Dumazet [Fri, 1 Oct 2021 16:46:22 +0000 (09:46 -0700)]
net: add kerneldoc comment for sk_peer_lock
Fixes following warning:
include/net/sock.h:533: warning: Function parameter or member 'sk_peer_lock' not described in 'sock'
Fixes: 35306eb23814 ("af_unix: fix races in sk_peer_pid and sk_peer_cred accesses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20211001164622.58520-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rafael J. Wysocki [Fri, 1 Oct 2021 17:14:28 +0000 (19:14 +0200)]
thermal: Update information in MAINTAINERS
Because Rui is now going to focus on work that is not related to the
maintenance of the thermal subsystem in the kernel, Rafael will start
to help Daniel with handling the development process as a new member
of the thermal maintainers team. Rui will continue to review patches
in that area.
The thermal development process flow will change so that the material
from the thermal git tree will be merged into the thermal branch of
the linux-pm.git tree before going into the mainline.
Update the information in MAINTAINERS accordingly.
Signed-off-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 1 Oct 2021 18:08:07 +0000 (11:08 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull more kvm fixes from Paolo Bonzini:
"Small x86 fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Ensure all migrations are performed when test is affined
KVM: x86: Swap order of CPUID entry "index" vs. "significant flag" checks
ptp: Fix ptp_kvm_getcrosststamp issue for x86 ptp_kvm
x86/kvmclock: Move this_cpu_pvti into kvmclock.h
selftests: KVM: Don't clobber XMM register when read
KVM: VMX: Fix a TSX_CTRL_CPUID_CLEAR field mask issue
Linus Torvalds [Fri, 1 Oct 2021 17:27:44 +0000 (10:27 -0700)]
Merge tag 'drm-fixes-2021-10-01' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Daniel Vetter:
"Dave is out on a long w/e, should be back next week.
Nothing nefarious, just a bunch of driver fixes: amdgpu, i915, tegra,
and one exynos driver fix"
* tag 'drm-fixes-2021-10-01' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: force exit gfxoff on sdma resume for rmb s0ix
drm/amdgpu: check tiling flags when creating FB on GFX8-
drm/amd/display: Pass PCI deviceid into DC
drm/amd/display: initialize backlight_ramping_override to false
drm/amdgpu: correct initial cp_hqd_quantum for gfx9
drm/amd/display: Fix Display Flicker on embedded panels
drm/amdgpu: fix gart.bo pin_count leak
drm/i915: Remove warning from the rps worker
drm/i915/request: fix early tracepoints
drm/i915/guc, docs: Fix pdfdocs build error by removing nested grid
gpu: host1x: Plug potential memory leak
gpu/host1x: fence: Make spinlock static
drm/tegra: uapi: Fix wrong mapping end address in case of disabled IOMMU
drm/tegra: dc: Remove unused variables
drm/exynos: Make use of the helper function devm_platform_ioremap_resource()
drm/i915/gvt: fix the usage of ww lock in gvt scheduler.
Pavel Begunkov [Fri, 1 Oct 2021 09:39:33 +0000 (10:39 +0100)]
io_uring: kill fasync
We have never supported fasync properly, it would only fire when there
is something polling io_uring making it useless. The original support came
in through the initial io_uring merge for 5.1. Since it's broken and
nobody has reported it, get rid of the fasync bits.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2f7ca3d344d406d34fa6713824198915c41cea86.1633080236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 1 Oct 2021 17:14:29 +0000 (10:14 -0700)]
Merge tag 'iommu-fixes-v5.15-rc3' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Two fixes for the new Apple DART driver to fix a kernel panic and a
stale data usage issue
- Intel VT-d fix for how PCI device ids are printed
* tag 'iommu-fixes-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/dart: Clear sid2group entry when a group is freed
iommu/vt-d: Drop "0x" prefix from PCI bus & device addresses
iommu/dart: Remove iommu_flush_ops
Daniel Vetter [Fri, 1 Oct 2021 16:14:38 +0000 (18:14 +0200)]
Merge tag 'exynos-drm-fixes-for-v5.15-rc4' of git://git./linux/kernel/git/daeinki/drm-exynos into drm-fixes
One cleanup
- Use devm_platform_ioremap_resource() helper function instead of old
one.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Inki Dae <inki.dae@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210928074158.2942-1-inki.dae@samsung.com
Daniel Vetter [Fri, 1 Oct 2021 14:59:21 +0000 (16:59 +0200)]
Merge tag 'amd-drm-fixes-5.15-2021-09-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.15-2021-09-29:
amdgpu:
- gart pin count fix
- eDP flicker fix
- GFX9 MQD fix
- Display fixes
- Tiling flags fix for pre-GFX9
- SDMA resume fix for S0ix
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210930023013.5207-1-alexander.deucher@amd.com
Daniel Vetter [Fri, 1 Oct 2021 14:47:18 +0000 (16:47 +0200)]
Merge tag 'drm-intel-fixes-2021-09-30' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.15-rc4:
- Fix GVT scheduler ww lock usage
- Fix pdfdocs documentation build
- Fix request early tracepoints
- Fix an invalid warning from rps worker
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87lf3ev44z.fsf@intel.com
David S. Miller [Fri, 1 Oct 2021 13:16:52 +0000 (14:16 +0100)]
Merge tag 'mlx5-fixes-2021-09-30' of git://git./linux/kernel/git/saeed/linux
mlx5-fixes-2021-09-30
David S. Miller [Fri, 1 Oct 2021 13:16:29 +0000 (14:16 +0100)]
Merge tag 'wireless-drivers-2021-10-01' of git://git./linux/kernel/git/kvalo/wireless-drivers
wireless-drivers fixes for v5.15
Second set of fixes for v5.15, nothing major this time. Most important
here are reverting a brcmfmac regression and a fix for an old rare
ath5k build error.
iwlwifi
* fixes to NULL dereference, off by one and missing unlock
* add support for Killer AX1650 on Dell XPS 15 (9510) laptop
ath5k
* build fix with LEDS=m
brcmfmac
* revert a regression causing BCM4359/9 devices stop working as access point
mwifiex
* fix clang warning about null pointer arithmetic
Peter Zijlstra [Mon, 20 Sep 2021 13:31:11 +0000 (15:31 +0200)]
sched: Always inline is_percpu_thread()
vmlinux.o: warning: objtool: check_preemption_disabled()+0x81: call to is_percpu_thread() leaves .noinstr.text section
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210928084218.063371959@infradead.org
Mel Gorman [Mon, 27 Sep 2021 11:46:35 +0000 (12:46 +0100)]
sched/fair: Null terminate buffer when updating tunable_scaling
This patch null-terminates the temporary buffer in sched_scaling_write()
so kstrtouint() does not return failure and checks the value is valid.
Before:
$ cat /sys/kernel/debug/sched/tunable_scaling
1
$ echo 0 > /sys/kernel/debug/sched/tunable_scaling
-bash: echo: write error: Invalid argument
$ cat /sys/kernel/debug/sched/tunable_scaling
1
After:
$ cat /sys/kernel/debug/sched/tunable_scaling
1
$ echo 0 > /sys/kernel/debug/sched/tunable_scaling
$ cat /sys/kernel/debug/sched/tunable_scaling
0
$ echo 3 > /sys/kernel/debug/sched/tunable_scaling
-bash: echo: write error: Invalid argument
Fixes: 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to debugfs")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20210927114635.GH3959@techsingularity.net
Michal Koutný [Fri, 17 Sep 2021 15:30:37 +0000 (17:30 +0200)]
sched/fair: Add ancestors of unthrottled undecayed cfs_rq
Since commit
a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to
list on unthrottle") we add cfs_rqs with no runnable tasks but not fully
decayed into the load (leaf) list. We may ignore adding some ancestors
and therefore breaking tmp_alone_branch invariant. This broke LTP test
cfs_bandwidth01 and it was partially fixed in commit
fdaba61ef8a2
("sched/fair: Ensure that the CFS parent is added after unthrottling").
I noticed the named test still fails even with the fix (but with low
probability, 1 in ~1000 executions of the test). The reason is when
bailing out of unthrottle_cfs_rq early, we may miss adding ancestors of
the unthrottled cfs_rq, thus, not joining tmp_alone_branch properly.
Fix this by adding ancestors if we notice the unthrottled cfs_rq was
added to the load list.
Fixes: a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Odin Ugedal <odin@uged.al>
Link: https://lore.kernel.org/r/20210917153037.11176-1-mkoutny@suse.com
Song Liu [Wed, 29 Sep 2021 19:43:13 +0000 (12:43 -0700)]
perf/core: fix userpage->time_enabled of inactive events
Users of rdpmc rely on the mmapped user page to calculate accurate
time_enabled. Currently, userpage->time_enabled is only updated when the
event is added to the pmu. As a result, inactive event (due to counter
multiplexing) does not have accurate userpage->time_enabled. This can
be reproduced with something like:
/* open 20 task perf_event "cycles", to create multiplexing */
fd = perf_event_open(); /* open task perf_event "cycles" */
userpage = mmap(fd); /* use mmap and rdmpc */
while (true) {
time_enabled_mmap = xxx; /* use logic in perf_event_mmap_page */
time_enabled_read = read(fd).time_enabled;
if (time_enabled_mmap > time_enabled_read)
BUG();
}
Fix this by updating userpage for inactive events in merge_sched_in.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reported-and-tested-by: Lucian Grijincu <lucian@fb.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210929194313.2398474-1-songliubraving@fb.com
Kan Liang [Tue, 28 Sep 2021 15:19:03 +0000 (08:19 -0700)]
perf/x86/intel: Update event constraints for ICX
According to the latest event list, the event encoding 0xEF is only
available on the first 4 counters. Add it into the event constraints
table.
Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1632842343-25862-1-git-send-email-kan.liang@linux.intel.com
Anand K Mistry [Wed, 29 Sep 2021 07:04:21 +0000 (17:04 +1000)]
perf/x86: Reset destroy callback on event init failure
perf_init_event tries multiple init callbacks and does not reset the
event state between tries. When x86_pmu_event_init runs, it
unconditionally sets the destroy callback to hw_perf_event_destroy. On
the next init attempt after x86_pmu_event_init, in perf_try_init_event,
if the pmu's capabilities includes PERF_PMU_CAP_NO_EXCLUDE, the destroy
callback will be run. However, if the next init didn't set the destroy
callback, hw_perf_event_destroy will be run (since the callback wasn't
reset).
Looking at other pmu init functions, the common pattern is to only set
the destroy callback on a successful init. Resetting the callback on
failure tries to replicate that pattern.
This was discovered after commit
f11dd0d80555 ("perf/x86/amd/ibs: Extend
PERF_PMU_CAP_NO_EXCLUDE to IBS Op") when the second (and only second)
run of the perf tool after a reboot results in 0 samples being
generated. The extra run of hw_perf_event_destroy results in
active_events having an extra decrement on each perf run. The second run
has active_events == 0 and every subsequent run has active_events < 0.
When active_events == 0, the NMI handler will early-out and not record
any samples.
Signed-off-by: Anand K Mistry <amistry@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210929170405.1.I078b98ee7727f9ae9d6df8262bad7e325e40faf0@changeid