Len Brown [Sun, 28 Apr 2024 02:15:48 +0000 (22:15 -0400)]
tools/power turbostat: version 2024.05.10
New since 2024.04.08:
Len Brown (6):
tools/power turbostat: Add "snapshot:" Makefile target
tools/power turbostat: Harden probe_intel_uncore_frequency()
tools/power turbostat: Remember global max_die_id
tools/power turbostat: Survive sparse die_id
tools/power turbostat: Add columns for clustered uncore frequency
tools/power turbostat: version 2024.05.10
Patryk Wlazlyn (7):
tools/power turbostat: Replace _Static_assert with BUILD_BUG_ON
tools/power turbostat: Enable non-privileged users to read sysfs counters
tools/power turbostat: Avoid possible memory corruption due to sparse topology IDs
tools/power turbostat: Read Core-cstates via perf
tools/power turbostat: Read Package-cstates via perf
tools/power turbostat: Fix order of strings in pkg_cstate_limit_strings
tools/power turbostat: Ignore pkg_cstate_limit when it is not available
Zhang Rui (2):
tools/power turbostat: Enhance ARL/LNL support
tools/power turbostat: Add ARL-H support
Signed-off-by: Len Brown <len.brown@intel.com>
Patryk Wlazlyn [Thu, 9 May 2024 10:39:47 +0000 (12:39 +0200)]
tools/power turbostat: Ignore pkg_cstate_limit when it is not available
When running in no-msr mode, the pkg_cstate_limit is not populated, thus
we use perf to determine if given pcstate counter is present on the
platform.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Patryk Wlazlyn [Thu, 9 May 2024 10:24:02 +0000 (12:24 +0200)]
tools/power turbostat: Fix order of strings in pkg_cstate_limit_strings
Change the order so that it matches the indexes defined in:
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Patryk Wlazlyn [Wed, 8 May 2024 13:00:14 +0000 (15:00 +0200)]
tools/power turbostat: Read Package-cstates via perf
Reading the counters via perf can be done in bulk with a single syscall,
making the counter values more accurate with respect to one another by
minimizing the time gap between individual counter reads.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Patryk Wlazlyn [Mon, 11 Mar 2024 17:06:16 +0000 (18:06 +0100)]
tools/power turbostat: Read Core-cstates via perf
Reading the counters via perf can be done in bulk with a single syscall,
making the counter values more accurate with respect to one another by
minimizing the time gap between individual counter reads.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Patryk Wlazlyn [Mon, 6 May 2024 13:39:08 +0000 (15:39 +0200)]
tools/power turbostat: Avoid possible memory corruption due to sparse topology IDs
Save the highest core and package id when parsing topology to
allocate enough memory when get_rapl_counters() is called with a core or
a package id as a domain.
Note that RAPL domains are per-package on Intel, but per-core on AMD.
Thus, the RAPL code effectively runs in different modes on those two
product lines.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Thu, 25 Apr 2024 01:12:18 +0000 (21:12 -0400)]
tools/power turbostat: Add columns for clustered uncore frequency
New machines have multiple uncore frequencies per package,
visible in /sys/devices/system/cpu/intel_uncore_frequency/uncore##/
turbostat now samples these frequencies each measurement interval.
For each package, turbostat now prints "UMHzX.Y" columns,
where X = domain_id, and Y = fabric_cluster_id.
The system summary for each UMHzX.Y column is the average value
for across all of the packages in the system.
Signed-off-by: Len Brown <len.brown@intel.com>
Patryk Wlazlyn [Thu, 25 Apr 2024 15:54:18 +0000 (17:54 +0200)]
tools/power turbostat: Enable non-privileged users to read sysfs counters
A group of counters called "sysfs" displays software
C-state request counts and resulting perceived C-state residency.
They are not built-in counters that turbostat knows about ahead of time,
rather they are discovered in sysfs when turbostat starts.
Thus, they are added dynamically, using the same interface
as user-added MSR counters.
When turbostat enters "no-msr" mode, such as when running as a
non-privileged user, it clears all added counters.
Updating that to clear only actual MSR added counters
allows regular users to see the sysfs counters.
[lenb: commit message]
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Patryk Wlazlyn [Tue, 23 Apr 2024 09:59:49 +0000 (11:59 +0200)]
tools/power turbostat: Replace _Static_assert with BUILD_BUG_ON
So it compiles on GCC older than 9.0.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Wed, 27 Mar 2024 14:36:38 +0000 (22:36 +0800)]
tools/power turbostat: Add ARL-H support
Add turbostat support for ARL-H, which behaves the same as ARL.
[lenb: also add ARL-U]
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Wed, 27 Mar 2024 14:35:03 +0000 (22:35 +0800)]
tools/power turbostat: Enhance ARL/LNL support
ARL/LNL don't have PC8, other than that, it behaves the same as CNL.
Copy cnl_features for ARL/LNL, except that PC8 support is removed.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 21 Apr 2024 16:02:24 +0000 (12:02 -0400)]
tools/power turbostat: Survive sparse die_id
Turbostat assumed that every package had a die_id = 0.
When this assumption was violated, it exited
when looking for the package uncore frequency:
turbostat: /sys/.../intel_uncore_frequency/package_01_die_00/current_freq_khz: open failed: No such file or directory
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 21 Apr 2024 15:56:48 +0000 (11:56 -0400)]
tools/power turbostat: Remember global max_die_id
This is necessary to gracefully handle sparse die_id's.
no functional change
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 21 Apr 2024 18:45:10 +0000 (14:45 -0400)]
tools/power turbostat: Harden probe_intel_uncore_frequency()
If sysfs directory "intel_uncore_frequency/cluster00/" exists,
then use uncore cluster code (now its own routine).
The previous check for
"intel_uncore_frequency/package_00_die_00/current_freq_khz",
could be unreliable in the face of sparse die id's.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 21 Apr 2024 16:53:53 +0000 (12:53 -0400)]
tools/power turbostat: Add "snapshot:" Makefile target
Kernel developers often need to diagnose remote customer systems
with the latest turbostat, yet customers are running binary distros
with out-dated turbostat and the customer has no experience
cloning linux kernel trees.
Add a turbostat "snapshot" makefile target to create a standalone
source snapshot from the developer's git tree, appropriately hacked
so that the customer can build turbostat without a kernel tree.
Include the turbostat binary in the snapshot, for convenience in
those situations where the source and destination are trusted,
(and have new enough glibc to execute).
The snapshot is named with the date it was taken rather than
the turbostat VERSION, as it could occur between VERSIONS...
Signed-off-by: Len Brown <len.brown@intel.com>
Linus Torvalds [Wed, 10 Apr 2024 20:13:27 +0000 (13:13 -0700)]
Merge tag 'turbostat-2024.04.10' of git://git./linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
- Use of the CPU MSR driver is now optional
- Perf is now preferred for many counters
- Non-root users can now execute turbostat, though with limited
functionality
- Add counters for some new GFX hardware
- Minor fixes
* tag 'turbostat-2024.04.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (26 commits)
tools/power turbostat: v2024.04.10
tools/power/turbostat: Add support for Xe sysfs knobs
tools/power/turbostat: Add support for new i915 sysfs knobs
tools/power/turbostat: Introduce BIC_SAM_mc6/BIC_SAMMHz/BIC_SAMACTMHz
tools/power/turbostat: Fix uncore frequency file string
tools/power/turbostat: Unify graphics sysfs snapshots
tools/power/turbostat: Cache graphics sysfs path
tools/power/turbostat: Enable MSR_CORE_C1_RES support for ICX
tools/power turbostat: Add selftests
tools/power turbostat: read RAPL counters via perf
tools/power turbostat: Add proper re-initialization for perf file descriptors
tools/power turbostat: Clear added counters when in no-msr mode
tools/power turbostat: add early exits for permission checks
tools/power turbostat: detect and disable unavailable BICs at runtime
tools/power turbostat: Add reading aperf and mperf via perf API
tools/power turbostat: Add --no-perf option
tools/power turbostat: Add --no-msr option
tools/power turbostat: enhance -D (debug counter dump) output
tools/power turbostat: Fix warning upon failed /dev/cpu_dma_latency read
tools/power turbostat: Read base_hz and bclk from CPUID.16H if available
...
Linus Torvalds [Wed, 10 Apr 2024 20:10:22 +0000 (13:10 -0700)]
Merge tag 'platform-drivers-x86-v6.9-2' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
"Fixes:
- intel/hid: Solve spurious hibernation aborts (power button release)
- toshiba_acpi: Ignore 2 keys to avoid log noise during
suspend/resume
- intel-vbtn: Fix probe by restoring VBDL and VGBS evalutation order
- lg-laptop: Fix W=1 %s null argument warning
New HW Support:
- acer-wmi: PH18-71 mode button and fan speed sensor
- intel/hid: Lunar Lake and Arrow Lake HID IDs"
* tag 'platform-drivers-x86-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: lg-laptop: fix %s null argument warning
platform/x86: intel-vbtn: Update tablet mode switch at end of probe
platform/x86: intel-vbtn: Use acpi_has_method to check for switch
platform/x86: toshiba_acpi: Silence logging for some events
platform/x86/intel/hid: Add Lunar Lake and Arrow Lake support
platform/x86/intel/hid: Don't wake on 5-button releases
platform/x86: acer-wmi: Add support for Acer PH18-71
Len Brown [Mon, 8 Apr 2024 23:32:58 +0000 (19:32 -0400)]
tools/power turbostat: v2024.04.10
Much of turbostat can now run with perf, rather than using the MSR driver
Some of turbostat can now run as a regular non-root user.
Add some new output columns for some new GFX hardware.
[This patch updates the version, but otherwise changes no function;
it touches up some checkpatch issues from previous patches]
Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Tue, 12 Mar 2024 15:56:02 +0000 (23:56 +0800)]
tools/power/turbostat: Add support for Xe sysfs knobs
Xe graphics driver uses different graphics sysfs knobs including
/sys/class/drm/card0/device/tile0/gt0/gtidle/idle_residency_ms
/sys/class/drm/card0/device/tile0/gt0/freq0/cur_freq
/sys/class/drm/card0/device/tile0/gt0/freq0/act_freq
/sys/class/drm/card0/device/tile0/gt1/gtidle/idle_residency_ms
/sys/class/drm/card0/device/tile0/gt1/freq0/cur_freq
/sys/class/drm/card0/device/tile0/gt1/freq0/act_freq
Plus that,
/sys/class/drm/card0/device/tile0/gt<n>/gtidle/name
returns either gt<n>-rc or gt<n>-mc. rc is for GFX and mc is SA Media.
Enhance turbostat to prefer the Xe sysfs knobs when they are available.
Export gt<n>-rc via BIC_GFX_rc6/BIC_GFXMHz/BIC_GFXACTMHz.
Export gt<n>-mc via BIC_SMA_mc6/BIC_SMAMHz/BIC_SMAACTMHz.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Zhang Rui [Fri, 22 Mar 2024 01:52:24 +0000 (09:52 +0800)]
tools/power/turbostat: Add support for new i915 sysfs knobs
On Meteorlake platform, i915 driver supports the traditional graphics
sysfs knobs including
/sys/class/drm/card0/power/rc6_residency_ms
/sys/class/drm/card0/gt_cur_freq_mhz
/sys/class/drm/card0/gt_act_freq_mhz
At the same time, it also supports
/sys/class/drm/card0/gt/gt0/rc6_residency_ms
/sys/class/drm/card0/gt/gt0/rps_cur_freq_mhz
/sys/class/drm/card0/gt/gt0/rps_act_freq_mhz
/sys/class/drm/card0/gt/gt1/rc6_residency_ms
/sys/class/drm/card0/gt/gt1/rps_cur_freq_mhz
/sys/class/drm/card0/gt/gt1/rps_act_freq_mhz
gt0 is for GFX and gt1 is for SA Media.
Enhance turbostat to prefer the i915 new sysfs knobs.
Export gt0 via BIC_GFX_rc6/BIC_GFXMHz/BIC_GFXACTMHz.
Export gt1 via BIC_SMA_mc6/BIC_SMAMHz/BIC_SMAACTMHz.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Zhang Rui [Wed, 13 Mar 2024 02:30:04 +0000 (10:30 +0800)]
tools/power/turbostat: Introduce BIC_SAM_mc6/BIC_SAMMHz/BIC_SAMACTMHz
Graphics driver (i915/Xe) on mordern platforms splits GFX and SA Media
information via different sysfs knobs.
Existing BIC_GFX_rc6/BIC_GFXMHz/BIC_GFXACTMHz columns can be reused for
GFX.
Introduce BIC_SAM_mc6/BIC_SAMMHz/BIC_SAMACTMHz columns for SA Media.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Justin Ernst [Tue, 2 Apr 2024 17:40:29 +0000 (13:40 -0400)]
tools/power/turbostat: Fix uncore frequency file string
Running turbostat on a 16 socket HPE Scale-up Compute 3200 (SapphireRapids) fails with:
turbostat: /sys/devices/system/cpu/intel_uncore_frequency/package_010_die_00/current_freq_khz: open failed: No such file or directory
We observe the sysfs uncore frequency directories named:
...
package_09_die_00/
package_10_die_00/
package_11_die_00/
...
package_15_die_00/
The culprit is an incorrect sprintf format string "package_0%d_die_0%d" used
with each instance of reading uncore frequency files. uncore-frequency-common.c
creates the sysfs directory with the format "package_%02d_die_%02d". Once the
package value reaches double digits, the formats diverge.
Change each instance of "package_0%d_die_0%d" to "package_%02d_die_%02d".
[lenb: deleted the probe part of this patch, as it was already fixed]
Signed-off-by: Justin Ernst <justin.ernst@hpe.com>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Wed, 13 Mar 2024 02:12:19 +0000 (10:12 +0800)]
tools/power/turbostat: Unify graphics sysfs snapshots
Graphics sysfs snapshots share similar logic.
Combine them into one function to avoid code duplication.
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Tue, 12 Mar 2024 06:23:37 +0000 (14:23 +0800)]
tools/power/turbostat: Cache graphics sysfs path
Graphics drivers (i915/Xe) have different sysfs knobs on different
platforms, and it is possible that different sysfs knobs fit into the
same turbostat columns.
Instead of specifying different sysfs knobs every time, detect them
once and cache the path for future use.
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Tue, 12 Mar 2024 03:19:15 +0000 (11:19 +0800)]
tools/power/turbostat: Enable MSR_CORE_C1_RES support for ICX
Enable Core C1 hardware residency counter (MSR_CORE_C1_RES) on ICX.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Patryk Wlazlyn [Thu, 7 Mar 2024 16:00:35 +0000 (17:00 +0100)]
tools/power turbostat: Add selftests
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Patryk Wlazlyn [Tue, 5 Mar 2024 11:27:27 +0000 (12:27 +0100)]
tools/power turbostat: read RAPL counters via perf
Some of the future Intel platforms will require reading the RAPL
counters via perf and not MSR. On current platforms we can still read
them using both ways.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Linus Torvalds [Tue, 9 Apr 2024 16:24:37 +0000 (09:24 -0700)]
Merge tag 'drm-fixes-2024-04-09' of https://gitlab.freedesktop.org/drm/kernel
Pull drm nouveau fix from Dave Airlie:
"A previous fix to nouveau devinit on the GSP paths fixed the Turing
but broke Ampere, I did some more digging and found the proper fix.
Sending it early as I want to make sure it makes the next 6.8 stable
kernels to fix the regression.
Regular fixes will be at end of week as usual.
nouveau:
- regression fix for GSP display enable"
* tag 'drm-fixes-2024-04-09' of https://gitlab.freedesktop.org/drm/kernel:
nouveau: fix devinit paths to only handle display on GSP.
Thorsten Blum [Tue, 9 Apr 2024 15:46:23 +0000 (17:46 +0200)]
compiler.h: Add missing quote in macro comment
Add a missing doublequote in the __is_constexpr() macro comment.
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Airlie [Mon, 8 Apr 2024 06:42:43 +0000 (16:42 +1000)]
nouveau: fix devinit paths to only handle display on GSP.
This reverts:
nouveau/gsp: don't check devinit disable on GSP.
and applies a further fix.
It turns out the open gpu driver, checks this register,
but only for display.
Match that behaviour and in the turing path only disable
the display block. (ampere already only does displays).
Fixes: 5d4e8ae6e57b ("nouveau/gsp: don't check devinit disable on GSP.")
Reviewed-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240408064243.2219527-1-airlied@gmail.com
Linus Torvalds [Tue, 9 Apr 2024 03:07:51 +0000 (20:07 -0700)]
Merge tag 'nativebhi' of git://git./linux/kernel/git/tip/tip
Pull x86 mitigations from Thomas Gleixner:
"Mitigations for the native BHI hardware vulnerabilty:
Branch History Injection (BHI) attacks may allow a malicious
application to influence indirect branch prediction in kernel by
poisoning the branch history. eIBRS isolates indirect branch targets
in ring0. The BHB can still influence the choice of indirect branch
predictor entry, and although branch predictor entries are isolated
between modes when eIBRS is enabled, the BHB itself is not isolated
between modes.
Add mitigations against it either with the help of microcode or with
software sequences for the affected CPUs"
[ This also ends up enabling the full mitigation by default despite the
system call hardening, because apparently there are other indirect
calls that are still sufficiently reachable, and the 'auto' case just
isn't hardened enough.
We'll have some more inevitable tweaking in the future - Linus ]
* tag 'nativebhi' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
KVM: x86: Add BHI_NO
x86/bhi: Mitigate KVM by default
x86/bhi: Add BHI mitigation knob
x86/bhi: Enumerate Branch History Injection (BHI) bug
x86/bhi: Define SPEC_CTRL_BHI_DIS_S
x86/bhi: Add support for clearing branch history at syscall entry
x86/syscall: Don't force use of indirect calls for system calls
x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file
Linus Torvalds [Mon, 8 Apr 2024 20:11:11 +0000 (13:11 -0700)]
Merge tag 'for-6.9-rc2-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Several fixes to qgroups that have been recently identified by test
generic/475:
- fix prealloc reserve leak in subvolume operations
- various other fixes in reservation setup, conversion or cleanup"
* tag 'for-6.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: always clear PERTRANS metadata during commit
btrfs: make btrfs_clear_delalloc_extent() free delalloc reserve
btrfs: qgroup: convert PREALLOC to PERTRANS after record_root_in_trans
btrfs: record delayed inode root in transaction
btrfs: qgroup: fix qgroup prealloc rsv leak in subvolume operations
btrfs: qgroup: correctly model root qgroup rsv in convert
Daniel Sneddon [Wed, 13 Mar 2024 16:49:17 +0000 (09:49 -0700)]
KVM: x86: Add BHI_NO
Intel processors that aren't vulnerable to BHI will set
MSR_IA32_ARCH_CAPABILITIES[BHI_NO] = 1;. Guests may use this BHI_NO bit to
determine if they need to implement BHI mitigations or not. Allow this bit
to be passed to the guests.
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Pawan Gupta [Mon, 11 Mar 2024 15:57:09 +0000 (08:57 -0700)]
x86/bhi: Mitigate KVM by default
BHI mitigation mode spectre_bhi=auto does not deploy the software
mitigation by default. In a cloud environment, it is a likely scenario
where userspace is trusted but the guests are not trusted. Deploying
system wide mitigation in such cases is not desirable.
Update the auto mode to unconditionally mitigate against malicious
guests. Deploy the software sequence at VMexit in auto mode also, when
hardware mitigation is not available. Unlike the force =on mode,
software sequence is not deployed at syscalls in auto mode.
Suggested-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Pawan Gupta [Mon, 11 Mar 2024 15:57:05 +0000 (08:57 -0700)]
x86/bhi: Add BHI mitigation knob
Branch history clearing software sequences and hardware control
BHI_DIS_S were defined to mitigate Branch History Injection (BHI).
Add cmdline spectre_bhi={on|off|auto} to control BHI mitigation:
auto - Deploy the hardware mitigation BHI_DIS_S, if available.
on - Deploy the hardware mitigation BHI_DIS_S, if available,
otherwise deploy the software sequence at syscall entry and
VMexit.
off - Turn off BHI mitigation.
The default is auto mode which does not deploy the software sequence
mitigation. This is because of the hardening done in the syscall
dispatch path, which is the likely target of BHI.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Pawan Gupta [Mon, 11 Mar 2024 15:57:03 +0000 (08:57 -0700)]
x86/bhi: Enumerate Branch History Injection (BHI) bug
Mitigation for BHI is selected based on the bug enumeration. Add bits
needed to enumerate BHI bug.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Daniel Sneddon [Wed, 13 Mar 2024 16:47:57 +0000 (09:47 -0700)]
x86/bhi: Define SPEC_CTRL_BHI_DIS_S
Newer processors supports a hardware control BHI_DIS_S to mitigate
Branch History Injection (BHI). Setting BHI_DIS_S protects the kernel
from userspace BHI attacks without having to manually overwrite the
branch history.
Define MSR_SPEC_CTRL bit BHI_DIS_S and its enumeration CPUID.BHI_CTRL.
Mitigation is enabled later.
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Pawan Gupta [Mon, 11 Mar 2024 15:56:58 +0000 (08:56 -0700)]
x86/bhi: Add support for clearing branch history at syscall entry
Branch History Injection (BHI) attacks may allow a malicious application to
influence indirect branch prediction in kernel by poisoning the branch
history. eIBRS isolates indirect branch targets in ring0. The BHB can
still influence the choice of indirect branch predictor entry, and although
branch predictor entries are isolated between modes when eIBRS is enabled,
the BHB itself is not isolated between modes.
Alder Lake and new processors supports a hardware control BHI_DIS_S to
mitigate BHI. For older processors Intel has released a software sequence
to clear the branch history on parts that don't support BHI_DIS_S. Add
support to execute the software sequence at syscall entry and VMexit to
overwrite the branch history.
For now, branch history is not cleared at interrupt entry, as malicious
applications are not believed to have sufficient control over the
registers, since previous register state is cleared at interrupt
entry. Researchers continue to poke at this area and it may become
necessary to clear at interrupt entry as well in the future.
This mitigation is only defined here. It is enabled later.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Co-developed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Linus Torvalds [Wed, 3 Apr 2024 23:36:44 +0000 (16:36 -0700)]
x86/syscall: Don't force use of indirect calls for system calls
Make <asm/syscall.h> build a switch statement instead, and the compiler can
either decide to generate an indirect jump, or - more likely these days due
to mitigations - just a series of conditional branches.
Yes, the conditional branches also have branch prediction, but the branch
prediction is much more controlled, in that it just causes speculatively
running the wrong system call (harmless), rather than speculatively running
possibly wrong random less controlled code gadgets.
This doesn't mitigate other indirect calls, but the system call indirection
is the first and most easily triggered case.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Josh Poimboeuf [Fri, 5 Apr 2024 18:14:13 +0000 (11:14 -0700)]
x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file
Change the format of the 'spectre_v2' vulnerabilities sysfs file
slightly by converting the commas to semicolons, so that mitigations for
future variants can be grouped together and separated by commas.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Linus Torvalds [Mon, 8 Apr 2024 17:11:37 +0000 (10:11 -0700)]
Merge tag 'fixes-2024-04-08' of git://git./linux/kernel/git/rppt/memblock
Pull memblock fixes from Mike Rapoport:
"Fix build errors in memblock tests:
- add stubs to functions that calls to them were recently added to
memblock but they were missing in tests
- update gfp_types.h to include bits.h so that BIT() definitions
won't depend on other includes"
* tag 'fixes-2024-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock tests: fix undefined reference to `BIT'
memblock tests: fix undefined reference to `panic'
memblock tests: fix undefined reference to `early_pfn_to_nid'
Gergo Koteles [Wed, 3 Apr 2024 14:34:27 +0000 (16:34 +0200)]
platform/x86: lg-laptop: fix %s null argument warning
W=1 warns about null argument to kprintf:
warning: ‘%s’ directive argument is null [-Wformat-overflow=]
pr_info("product: %s year: %d\n", product, year);
Use "unknown" instead of NULL.
Signed-off-by: Gergo Koteles <soyer@irl.hu>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lore.kernel.org/r/33d40e976f08f82b9227d0ecae38c787fcc0c0b2.1712154684.git.soyer@irl.hu
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Gwendal Grignou [Fri, 29 Mar 2024 14:32:06 +0000 (07:32 -0700)]
platform/x86: intel-vbtn: Update tablet mode switch at end of probe
ACER Vivobook Flip (TP401NAS) virtual intel switch is implemented as
follow:
Device (VGBI)
{
Name (_HID, EisaId ("INT33D6") ...
Name (VBDS, Zero)
Method (_STA, 0, Serialized) // _STA: Status ...
Method (VBDL, 0, Serialized)
{
PB1E |= 0x20
VBDS |= 0x40
}
Method (VGBS, 0, Serialized)
{
Return (VBDS) /* \_SB_.PCI0.SBRG.EC0_.VGBI.VBDS */
}
...
}
By default VBDS is set to 0. At boot it is set to clamshell (bit 6 set)
only after method VBDL is executed.
Since VBDL is now evaluated in the probe routine later, after the device
is registered, the retrieved value of VBDS was still 0 ("tablet mode")
when setting up the virtual switch.
Make sure to evaluate VGBS after VBDL, to ensure the
convertible boots in clamshell mode, the expected default.
Fixes: 26173179fae1 ("platform/x86: intel-vbtn: Eval VBDL after registering our notifier")
Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240329143206.2977734-3-gwendal@chromium.org
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Gwendal Grignou [Fri, 29 Mar 2024 14:32:05 +0000 (07:32 -0700)]
platform/x86: intel-vbtn: Use acpi_has_method to check for switch
The check for a device having virtual buttons is done using
acpi_has_method(..."VBDL"). Mimic that for checking virtual switch
presence.
Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240329143206.2977734-2-gwendal@chromium.org
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Hans de Goede [Tue, 2 Apr 2024 12:43:51 +0000 (14:43 +0200)]
platform/x86: toshiba_acpi: Silence logging for some events
Stop logging unknown event / unknown keycode messages on suspend /
resume on a Toshiba Portege Z830:
1. The Toshiba Portege Z830 sends a 0x8e event when the power button
is pressed, ignore this.
2. The Toshiba Portege Z830 sends a 0xe00 hotkey event on resume from
suspend, ignore this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240402124351.167152-1-hdegoede@redhat.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Sumeet Pawnikar [Fri, 5 Apr 2024 12:26:30 +0000 (17:56 +0530)]
platform/x86/intel/hid: Add Lunar Lake and Arrow Lake support
Add INTC107B for Lunar Lake and INTC10CB for Arrow Lake ACPI devices IDs.
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Link: https://lore.kernel.org/r/20240405122630.32154-1-sumeet.r.pawnikar@intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
David McFarland [Thu, 4 Apr 2024 11:41:45 +0000 (08:41 -0300)]
platform/x86/intel/hid: Don't wake on 5-button releases
If, for example, the power button is configured to suspend, holding it
and releasing it after the machine has suspended, will wake the machine.
Also on some machines, power button release events are sent during
hibernation, even if the button wasn't used to hibernate the machine.
This causes hibernation to be aborted.
Fixes: 0c4cae1bc00d ("PM: hibernate: Avoid missing wakeup events during hibernation")
Signed-off-by: David McFarland <corngood@gmail.com>
Tested-by: Enrik Berkhan <Enrik.Berkhan@inka.de>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/878r1tpd6u.fsf_-_@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Bernhard Rosenkränzer [Fri, 29 Mar 2024 15:28:00 +0000 (16:28 +0100)]
platform/x86: acer-wmi: Add support for Acer PH18-71
Add Acer Predator PH18-71 to acer_quirks with predator_v4
to support mode button and fan speed sensor.
Signed-off-by: Bernhard Rosenkränzer <bero@baylibre.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20240329152800.29393-1-bero@baylibre.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Linus Torvalds [Sun, 7 Apr 2024 20:22:46 +0000 (13:22 -0700)]
Linux 6.9-rc3
Linus Torvalds [Sun, 7 Apr 2024 16:33:21 +0000 (09:33 -0700)]
Merge tag 'x86-urgent-2024-04-07' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix MCE timer reinit locking
- Fix/improve CoCo guest random entropy pool init
- Fix SEV-SNP late disable bugs
- Fix false positive objtool build warning
- Fix header dependency bug
- Fix resctrl CPU offlining bug
* tag 'x86-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk
x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()
x86/CPU/AMD: Track SNP host status with cc_platform_*()
x86/cc: Add cc_platform_set/_clear() helpers
x86/kvm/Kconfig: Have KVM_AMD_SEV select ARCH_HAS_CC_PLATFORM
x86/coco: Require seeding RNG with RDRAND on CoCo systems
x86/numa/32: Include missing <asm/pgtable_areas.h>
x86/resctrl: Fix uninitialized memory read when last CPU of domain goes offline
Linus Torvalds [Sun, 7 Apr 2024 16:20:50 +0000 (09:20 -0700)]
Merge tag 'timers-urgent-2024-04-07' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Fix various timer bugs:
- Fix a timer migration bug that may result in missed events
- Fix timer migration group hierarchy event updates
- Fix a PowerPC64 build warning
- Fix a handful of DocBook annotation bugs"
* tag 'timers-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers/migration: Return early on deactivation
timers/migration: Fix ignored event due to missing CPU update
vdso: Use CONFIG_PAGE_SHIFT in vdso/datapage.h
timers: Fix text inconsistencies and spelling
tick/sched: Fix struct tick_sched doc warnings
tick/sched: Fix various kernel-doc warnings
timers: Fix kernel-doc format and add Return values
time/timekeeping: Fix kernel-doc warnings and typos
time/timecounter: Fix inline documentation
Linus Torvalds [Sun, 7 Apr 2024 16:14:46 +0000 (09:14 -0700)]
Merge tag 'perf-urgent-2024-04-07' of git://git./linux/kernel/git/tip/tip
Pull x86 perf fix from Ingo Molnar:
"Fix a combined PEBS events bug on x86 Intel CPUs"
* tag 'perf-urgent-2024-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/ds: Don't clear ->pebs_data_cfg for the last PEBS event
Linus Torvalds [Sat, 6 Apr 2024 16:37:50 +0000 (09:37 -0700)]
Merge tag 'nfsd-6.9-2' of git://git./linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Address a slow memory leak with RPC-over-TCP
- Prevent another NFS4ERR_DELAY loop during CREATE_SESSION
* tag 'nfsd-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: hold a lighter-weight client reference over CB_RECALL_ANY
SUNRPC: Fix a slow server-side memory leak with RPC-over-TCP
Linus Torvalds [Sat, 6 Apr 2024 16:27:36 +0000 (09:27 -0700)]
Merge tag 'i2c-for-6.9-rc3' of git://git./linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"A host driver build fix"
* tag 'i2c-for-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: pxa: hide unused icr_bits[] variable
Linus Torvalds [Sat, 6 Apr 2024 16:14:18 +0000 (09:14 -0700)]
Merge tag 'xfs-6.9-fixes-2' of git://git./fs/xfs/xfs-linux
Pull xfs fix from Chandan Babu:
- Allow creating new links to special files which were not associated
with a project quota
* tag 'xfs-6.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: allow cross-linking special files without project quota
Linus Torvalds [Sat, 6 Apr 2024 16:06:17 +0000 (09:06 -0700)]
Merge tag '6.9-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- fix to retry close to avoid potential handle leaks when server
returns EBUSY
- DFS fixes including a fix for potential use after free
- fscache fix
- minor strncpy cleanup
- reconnect race fix
- deal with various possible UAF race conditions tearing sessions down
* tag '6.9-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix potential UAF in cifs_signal_cifsd_for_reconnect()
smb: client: fix potential UAF in smb2_is_network_name_deleted()
smb: client: fix potential UAF in is_valid_oplock_break()
smb: client: fix potential UAF in smb2_is_valid_oplock_break()
smb: client: fix potential UAF in smb2_is_valid_lease_break()
smb: client: fix potential UAF in cifs_stats_proc_show()
smb: client: fix potential UAF in cifs_stats_proc_write()
smb: client: fix potential UAF in cifs_dump_full_key()
smb: client: fix potential UAF in cifs_debug_files_proc_show()
smb3: retrying on failed server close
smb: client: serialise cifs_construct_tcon() with cifs_mount_mutex
smb: client: handle DFS tcons in cifs_construct_tcon()
smb: client: refresh referral without acquiring refpath_lock
smb: client: guarantee refcounted children from parent session
cifs: Fix caching to try to do open O_WRONLY as rdwr on server
smb: client: fix UAF in smb2_reconnect_server()
smb: client: replace deprecated strncpy with strscpy
Borislav Petkov (AMD) [Fri, 5 Apr 2024 14:46:37 +0000 (16:46 +0200)]
x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk
srso_alias_untrain_ret() is special code, even if it is a dummy
which is called in the !SRSO case, so annotate it like its real
counterpart, to address the following objtool splat:
vmlinux.o: warning: objtool: .export_symbol+0x2b290: data relocation to !ENDBR: srso_alias_untrain_ret+0x0
Fixes: 4535e1a4174c ("x86/bugs: Fix the SRSO mitigation on Zen3/4")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240405144637.17908-1-bp@kernel.org
Ingo Molnar [Sat, 6 Apr 2024 11:00:32 +0000 (13:00 +0200)]
Merge branch 'linus' into x86/urgent, to pick up dependent commit
We want to fix:
0e110732473e ("x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO")
So merge in Linus's latest into x86/urgent to have it available.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Wolfram Sang [Sat, 6 Apr 2024 09:29:15 +0000 (11:29 +0200)]
Merge tag 'i2c-host-fixes-6.9-rc3' of git://git./linux/kernel/git/andi.shyti/linux into i2c/for-current
An unused const variable kind of error has been fixed by placing
the definition of icr_bits[] inside the ifdef block where it is
used.
Linus Torvalds [Sat, 6 Apr 2024 04:25:31 +0000 (21:25 -0700)]
Merge tag 'firewire-fixes-6.9-rc2' of git://git./linux/kernel/git/ieee1394/linux1394
Pull firewire fixes from Takashi Sakamoto:
"The firewire-ohci kernel module has a parameter for verbose kernel
logging. It is well-known that it logs the spurious IRQ for bus-reset
event due to the unmasked register for IRQ event. This update fixes
the issue"
* tag 'firewire-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: ohci: mask bus reset interrupts between ISR and bottom half
Adam Goldman [Sun, 24 Mar 2024 22:38:41 +0000 (07:38 +0900)]
firewire: ohci: mask bus reset interrupts between ISR and bottom half
In the FireWire OHCI interrupt handler, if a bus reset interrupt has
occurred, mask bus reset interrupts until bus_reset_work has serviced and
cleared the interrupt.
Normally, we always leave bus reset interrupts masked. We infer the bus
reset from the self-ID interrupt that happens shortly thereafter. A
scenario where we unmask bus reset interrupts was introduced in 2008 in
a007bb857e0b26f5d8b73c2ff90782d9c0972620: If
OHCI_PARAM_DEBUG_BUSRESETS (8) is set in the debug parameter bitmask, we
will unmask bus reset interrupts so we can log them.
irq_handler logs the bus reset interrupt. However, we can't clear the bus
reset event flag in irq_handler, because we won't service the event until
later. irq_handler exits with the event flag still set. If the
corresponding interrupt is still unmasked, the first bus reset will
usually freeze the system due to irq_handler being called again each
time it exits. This freeze can be reproduced by loading firewire_ohci
with "modprobe firewire_ohci debug=-1" (to enable all debugging output).
Apparently there are also some cases where bus_reset_work will get called
soon enough to clear the event, and operation will continue normally.
This freeze was first reported a few months after
a007bb85 was committed,
but until now it was never fixed. The debug level could safely be set
to -1 through sysfs after the module was loaded, but this would be
ineffectual in logging bus reset interrupts since they were only
unmasked during initialization.
irq_handler will now leave the event flag set but mask bus reset
interrupts, so irq_handler won't be called again and there will be no
freeze. If OHCI_PARAM_DEBUG_BUSRESETS is enabled, bus_reset_work will
unmask the interrupt after servicing the event, so future interrupts
will be caught as desired.
As a side effect to this change, OHCI_PARAM_DEBUG_BUSRESETS can now be
enabled through sysfs in addition to during initial module loading.
However, when enabled through sysfs, logging of bus reset interrupts will
be effective only starting with the second bus reset, after
bus_reset_work has executed.
Signed-off-by: Adam Goldman <adamg@pobox.com>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Linus Torvalds [Sat, 6 Apr 2024 00:26:43 +0000 (17:26 -0700)]
Merge tag 'spi-fix-v6.9-rc2' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few small driver specific fixes, the most important being the
s3c64xx change which is likely to be hit during normal operation"
* tag 'spi-fix-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: mchp-pci1xxx: Fix a possible null pointer dereference in pci1xxx_spi_probe
spi: spi-fsl-lpspi: remove redundant spi_controller_put call
spi: s3c64xx: Use DMA mode from fifo size
Linus Torvalds [Sat, 6 Apr 2024 00:24:04 +0000 (17:24 -0700)]
Merge tag 'regulator-fix-v6.9-rc2' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"One simple regualtor fix, fixing module autoloading on tps65132"
* tag 'regulator-fix-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: tps65132: Add of_match table
Linus Torvalds [Sat, 6 Apr 2024 00:21:16 +0000 (17:21 -0700)]
Merge tag 'regmap-fix-v6.9-rc2' of git://git./linux/kernel/git/broonie/regmap
Pull regmap fixes from Mark Brown:
"Richard found a nasty corner case in the maple tree code which he
fixed, and also fixed a compiler warning which was showing up with the
toolchain he uses and helpfully identified a possible incorrect error
code which could have runtime impacts"
* tag 'regmap-fix-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: maple: Fix uninitialized symbol 'ret' warnings
regmap: maple: Fix cache corruption in regcache_maple_drop()
Linus Torvalds [Sat, 6 Apr 2024 00:04:11 +0000 (17:04 -0700)]
Merge tag 'block-6.9-
20240405' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Atomic queue limits fixes (Christoph)
- Fabrics fixes (Hannes, Daniel)
- Discard overflow fix (Li)
- Cleanup fix for null_blk (Damien)
* tag 'block-6.9-
20240405' of git://git.kernel.dk/linux:
nvme-fc: rename free_ctrl callback to match name pattern
nvmet-fc: move RCU read lock to nvmet_fc_assoc_exists
nvmet: implement unique discovery NQN
nvme: don't create a multipath node for zero capacity devices
nvme: split nvme_update_zone_info
nvme-multipath: don't inherit LBA-related fields for the multipath node
block: fix overflow in blk_ioctl_discard()
nullblk: Fix cleanup order in null_add_dev() error path
Linus Torvalds [Fri, 5 Apr 2024 23:58:52 +0000 (16:58 -0700)]
Merge tag 'io_uring-6.9-
20240405' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Backport of some fixes that came up during development of the 6.10
io_uring patches. This includes some kbuf cleanups and reference
fixes.
- Disable multishot read if we don't have NOWAIT support on the target
- Fix for a dependency issue with workqueue flushing
* tag 'io_uring-6.9-
20240405' of git://git.kernel.dk/linux:
io_uring/kbuf: hold io_buffer_list reference over mmap
io_uring/kbuf: protect io_buffer_list teardown with a reference
io_uring/kbuf: get rid of bl->is_ready
io_uring/kbuf: get rid of lower BGID lists
io_uring: use private workqueue for exit work
io_uring: disable io-wq execution of multishot NOWAIT requests
io_uring/rw: don't allow multishot reads without NOWAIT support
Linus Torvalds [Fri, 5 Apr 2024 23:54:54 +0000 (16:54 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"The most important is the libsas fix, which is a problem for DMA to a
kmalloc'd structure too small causing cache line interference. The
other fixes (all in drivers) are mostly for allocation length fixes,
error leg unwinding, suspend races and a missing retry"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Fix MCQ mode dev command timeout
scsi: libsas: Align SMP request allocation to ARCH_DMA_MINALIGN
scsi: sd: Unregister device if device_add_disk() failed in sd_probe()
scsi: ufs: core: WLUN suspend dev/link state error recovery
scsi: mylex: Fix sysfs buffer lengths
Linus Torvalds [Fri, 5 Apr 2024 21:07:22 +0000 (14:07 -0700)]
Merge tag 'devicetree-fixes-for-6.9-1' of git://git./linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Fix NIOS2 boot with external DTB
- Add missing synchronization needed between fw_devlink and DT overlay
removals
- Fix some unit-address regex's to be hex only
- Drop some 10+ year old "unstable binding" statements
- Add new SoCs to QCom UFS binding
- Add TPM bindings to TPM maintainers
* tag 'devicetree-fixes-for-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
nios2: Only use built-in devicetree blob if configured to do so
dt-bindings: timer: narrow regex for unit address to hex numbers
dt-bindings: soc: fsl: narrow regex for unit address to hex numbers
dt-bindings: remoteproc: ti,davinci: remove unstable remark
dt-bindings: clock: ti: remove unstable remark
dt-bindings: clock: keystone: remove unstable remark
of: module: prevent NULL pointer dereference in vsnprintf()
dt-bindings: ufs: qcom: document SM6125 UFS
dt-bindings: ufs: qcom: document SC7180 UFS
dt-bindings: ufs: qcom: document SC8180X UFS
of: dynamic: Synchronize of_changeset_destroy() with the devlink removals
driver core: Introduce device_link_wait_removal()
docs: dt-bindings: add missing address/size-cells to example
MAINTAINERS: Add TPM DT bindings to TPM maintainers
Linus Torvalds [Fri, 5 Apr 2024 20:30:01 +0000 (13:30 -0700)]
Merge tag 'mm-hotfixes-stable-2024-04-05-11-30' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"8 hotfixes, 3 are cc:stable
There are a couple of fixups for this cycle's vmalloc changes and one
for the stackdepot changes. And a fix for a very old x86 PAT issue
which can cause a warning splat"
* tag 'mm-hotfixes-stable-2024-04-05-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
stackdepot: rename pool_index to pool_index_plus_1
x86/mm/pat: fix VM_PAT handling in COW mappings
MAINTAINERS: change vmware.com addresses to broadcom.com
selftests/mm: include strings.h for ffsl
mm: vmalloc: fix lockdep warning
mm: vmalloc: bail out early in find_vmap_area() if vmap is not init
init: open output files from cpio unpacking with O_LARGEFILE
mm/secretmem: fix GUP-fast succeeding on secretmem folios
Linus Torvalds [Fri, 5 Apr 2024 20:12:35 +0000 (13:12 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"arm64/ptrace fix to use the correct SVE layout based on the saved
floating point state rather than the TIF_SVE flag. The latter may be
left on during syscalls even if the SVE state is discarded"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/ptrace: Use saved floating point state type to determine SVE layout
Linus Torvalds [Fri, 5 Apr 2024 20:09:48 +0000 (13:09 -0700)]
Merge tag 'riscv-for-linus-6.9-rc3' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for an __{get,put}_kernel_nofault to avoid an uninitialized
value causing spurious failures
- compat_vdso.so.dbg is now installed to the standard install location
- A fix to avoid initializing PERF_SAMPLE_BRANCH_*-related events, as
they aren't supported and will just later fail
- A fix to make AT_VECTOR_SIZE_ARCH correct now that we're providing
AT_MINSIGSTKSZ
- pgprot_nx() is now implemented, which fixes vmap W^X protection
- A fix for the vector save/restore code, which at least manifests as
corrupted vector state when a signal is taken
- A fix for a race condition in instruction patching
- A fix to avoid leaking the kernel-mode GP to userspace, which is a
kernel pointer leak that can be used to defeat KASLR in various ways
- A handful of smaller fixes to build warnings, an overzealous printk,
and some missing tracing annotations
* tag 'riscv-for-linus-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: process: Fix kernel gp leakage
riscv: Disable preemption when using patch_map()
riscv: Fix warning by declaring arch_cpu_idle() as noinstr
riscv: use KERN_INFO in do_trap
riscv: Fix vector state restore in rt_sigreturn()
riscv: mm: implement pgprot_nx
riscv: compat_vdso: align VDSOAS build log
RISC-V: Update AT_VECTOR_SIZE_ARCH for new AT_MINSIGSTKSZ
riscv: Mark __se_sys_* functions __used
drivers/perf: riscv: Disable PERF_SAMPLE_BRANCH_* while not supported
riscv: compat_vdso: install compat_vdso.so.dbg to /lib/modules/*/vdso/
riscv: hwprobe: do not produce frtace relocation
riscv: Fix spurious errors from __get/put_kernel_nofault
riscv: mm: Fix prototype to avoid discarding const
Linus Torvalds [Fri, 5 Apr 2024 20:07:25 +0000 (13:07 -0700)]
Merge tag 's390-6.9-3' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- Fix missing NULL pointer check when determining guest/host fault
- Mark all functions in asm/atomic_ops.h, asm/atomic.h and
asm/preempt.h as __always_inline to avoid unwanted instrumentation
- Fix removal of a Processor Activity Instrumentation (PAI) sampling
event in PMU device driver
- Align system call table on 8 bytes
* tag 's390-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/entry: align system call table on 8 bytes
s390/pai: fix sampling event removal for PMU device driver
s390/preempt: mark all functions __always_inline
s390/atomic: mark all functions __always_inline
s390/mm: fix NULL pointer dereference
Linus Torvalds [Fri, 5 Apr 2024 19:55:40 +0000 (12:55 -0700)]
Merge tag 'pm-6.9-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix a recent Energy Model change that went against a recent scheduler
change made independently (Vincent Guittot)"
* tag 'pm-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: EM: fix wrong utilization estimation in em_cpu_energy()
Linus Torvalds [Fri, 5 Apr 2024 19:51:32 +0000 (12:51 -0700)]
Merge tag 'thermal-6.9-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These fix two power allocator thermal governor issues and an ACPI
thermal driver regression that all were introduced during the 6.8
development cycle.
Specifics:
- Allow the power allocator thermal governor to bind to a thermal
zone without cooling devices and/or without trip points (Nikita
Travkin)
- Make the ACPI thermal driver register a tripless thermal zone when
it cannot find any usable trip points instead of returning an error
from acpi_thermal_add() (Stephen Horvath)"
* tag 'thermal-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: gov_power_allocator: Allow binding without trip points
thermal: gov_power_allocator: Allow binding without cooling devices
ACPI: thermal: Register thermal zones without valid trip points
Linus Torvalds [Fri, 5 Apr 2024 19:12:19 +0000 (12:12 -0700)]
Merge tag 'gpio-fixes-for-v6.9-rc3' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- make sure GPIO devices are registered with the subsystem before
trying to return them to a caller of gpio_device_find()
- fix two issues with incorrect sanitization of the interrupt labels
* tag 'gpio-fixes-for-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: cdev: fix missed label sanitizing in debounce_setup()
gpio: cdev: check for NULL labels when sanitizing them for irqs
gpiolib: Fix triggering "kobject: 'gpiochipX' is not initialized, yet" kobject_get() errors
Linus Torvalds [Fri, 5 Apr 2024 19:09:16 +0000 (12:09 -0700)]
Merge tag 'ata-6.9-rc3' of git://git./linux/kernel/git/libata/linux
Pull ata fixes from Damien Le Moal:
- Compilation warning fixes from Arnd: one in the sata_sx4 driver due
to an incorrect calculation of the parameters passed to memcpy() and
another one in the sata_mv driver when CONFIG_PCI is not set
- Drop the owner driver field assignment in the pata_macio driver. That
is not needed as the PCI core code does that already (Krzysztof)
- Remove an unusued field in struct st_ahci_drv_data of the ahci_st
driver (Christophe)
- Add a missing clock probe error check in the sata_gemini driver
(Chen)
* tag 'ata-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: sata_gemini: Check clk_enable() result
ata: sata_mv: Fix PCI device ID table declaration compilation warning
ata: ahci_st: Remove an unused field in struct st_ahci_drv_data
ata: pata_macio: drop driver owner assignment
ata: sata_sx4: fix pdc20621_get_from_dimm() on 64-bit
Linus Torvalds [Fri, 5 Apr 2024 18:58:55 +0000 (11:58 -0700)]
Merge tag 'sound-6.9-rc3' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This became a bit bigger collection of patches, but almost all are
about device-specific fixes, and should be safe for 6.9:
- Lots of ASoC Intel SOF-related fixes/updates
- Locking fixes in SoundWire drivers
- ASoC AMD ACP/SOF updates
- ASoC ES8326 codec fixes
- HD-audio codec fixes and quirks
- A regression fix in emu10k1 synth code"
* tag 'sound-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (49 commits)
ASoC: SOF: Core: Add remove_late() to sof_init_environment failure path
ASoC: SOF: amd: fix for false dsp interrupts
ASoC: SOF: Intel: lnl: Disable DMIC/SSP offload on remove
ASoC: Intel: avs: boards: Add modules description
ASoC: codecs: ES8326: Removing the control of ADC_SCALE
ASoC: codecs: ES8326: Solve a headphone detection issue after suspend and resume
ASoC: codecs: ES8326: modify clock table
ASoC: codecs: ES8326: Solve error interruption issue
ALSA: line6: Zero-initialize message buffers
ALSA: hda/realtek: cs35l41: Support ASUS ROG G634JYR
ALSA: hda/realtek: Update Panasonic CF-SZ6 quirk to support headset with microphone
ALSA: hda/realtek: Add sound quirks for Lenovo Legion slim 7 16ARHA7 models
Revert "ALSA: emu10k1: fix synthesizer sample playback position and caching"
OSS: dmasound/paula: Mark driver struct with __refdata to prevent section mismatch
ALSA: hda/realtek: Add quirks for ASUS Laptops using CS35L56
ASoC: amd: acp: fix for acp_init function error handling
ASoC: tas2781: mark dvc_tlv with __maybe_unused
ASoC: ops: Fix wraparound for mask in snd_soc_get_volsw
ASoC: rt-sdw*: add __func__ to all error logs
ASoC: rt722-sdca-sdw: fix locking sequence
...
Linus Torvalds [Fri, 5 Apr 2024 18:53:46 +0000 (11:53 -0700)]
Merge tag 'drm-fixes-2024-04-05' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly fixes, mostly xe and i915, amdgpu on a week off, otherwise a
nouveau fix for a crash with new vulkan cts tests, and a couple of
cleanups and misc fixes.
display:
- fix typos in kerneldoc
prime:
- unbreak dma-buf export for virt-gpu
nouveau:
- uvmm: fix remap address calculation
- minor cleanups
panfrost:
- fix power-transition timeouts
xe:
- Stop using system_unbound_wq for preempt fences
- Fix saving unordered rebinding fences by attaching them as kernel
feces to the vm's resv
- Fix TLB invalidation fences completing out of order
- Move rebind TLB invalidation to the ring ops to reduce the latency
i915:
- A few DisplayPort related fixes
- eDP PSR fixes
- Remove some VM space restrictions on older platforms
- Disable automatic load CCS load balancing"
* tag 'drm-fixes-2024-04-05' of https://gitlab.freedesktop.org/drm/kernel: (22 commits)
drm/xe: Use ordered wq for preempt fence waiting
drm/xe: Move vma rebinding to the drm_exec locking loop
drm/xe: Make TLB invalidation fences unordered
drm/xe: Rework rebinding
drm/xe: Use ring ops TLB invalidation for rebinds
drm/i915/mst: Reject FEC+MST on ICL
drm/i915/mst: Limit MST+DSC to TGL+
drm/i915/dp: Fix the computation for compressed_bpp for DISPLAY < 13
drm/i915/gt: Enable only one CCS for compute workload
drm/i915/gt: Do not generate the command streamer for all the CCS
drm/i915/gt: Disable HW load balancing for CCS
drm/i915/gt: Limit the reserved VM space to only the platforms that need it
drm/i915/psr: Fix intel_psr2_sel_fetch_et_alignment usage
drm/i915/psr: Move writing early transport pipe src
drm/i915/psr: Calculate PIPE_SRCSZ_ERLY_TPT value
drm/i915/dp: Remove support for UHBR13.5
drm/i915/dp: Fix DSC state HW readout for SST connectors
drm/display: fix typo
drm/prime: Unbreak virtgpu dma-buf export
nouveau/uvmm: fix addr/range calcs for remap operations
...
Peter Collingbourne [Tue, 2 Apr 2024 00:14:58 +0000 (17:14 -0700)]
stackdepot: rename pool_index to pool_index_plus_1
Commit
3ee34eabac2a ("lib/stackdepot: fix first entry having a 0-handle")
changed the meaning of the pool_index field to mean "the pool index plus
1". This made the code accessing this field less self-documenting, as
well as causing debuggers such as drgn to not be able to easily remain
compatible with both old and new kernels, because they typically do that
by testing for presence of the new field. Because stackdepot is a
debugging tool, we should make sure that it is debugger friendly.
Therefore, give the field a different name to improve readability as well
as enabling debugger backwards compatibility.
This is needed in 6.9, which would otherwise become an odd release with
the new semantics and old name so debuggers wouldn't recognize the new
semantics there.
Fixes: 3ee34eabac2a ("lib/stackdepot: fix first entry having a 0-handle")
Link: https://lkml.kernel.org/r/20240402001500.53533-1-pcc@google.com
Link: https://linux-review.googlesource.com/id/Ib3e70c36c1d230dd0a118dc22649b33e768b9f88
Signed-off-by: Peter Collingbourne <pcc@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Alexander Potapenko <glider@google.com>
Acked-by: Marco Elver <elver@google.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Wed, 3 Apr 2024 21:21:30 +0000 (23:21 +0200)]
x86/mm/pat: fix VM_PAT handling in COW mappings
PAT handling won't do the right thing in COW mappings: the first PTE (or,
in fact, all PTEs) can be replaced during write faults to point at anon
folios. Reliably recovering the correct PFN and cachemode using
follow_phys() from PTEs will not work in COW mappings.
Using follow_phys(), we might just get the address+protection of the anon
folio (which is very wrong), or fail on swap/nonswap entries, failing
follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and
track_pfn_copy(), not properly calling free_pfn_range().
In free_pfn_range(), we either wouldn't call memtype_free() or would call
it with the wrong range, possibly leaking memory.
To fix that, let's update follow_phys() to refuse returning anon folios,
and fallback to using the stored PFN inside vma->vm_pgoff for COW mappings
if we run into that.
We will now properly handle untrack_pfn() with COW mappings, where we
don't need the cachemode. We'll have to fail fork()->track_pfn_copy() if
the first page was replaced by an anon folio, though: we'd have to store
the cachemode in the VMA to make this work, likely growing the VMA size.
For now, lets keep it simple and let track_pfn_copy() just fail in that
case: it would have failed in the past with swap/nonswap entries already,
and it would have done the wrong thing with anon folios.
Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():
<--- C reproducer --->
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <liburing.h>
int main(void)
{
struct io_uring_params p = {};
int ring_fd;
size_t size;
char *map;
ring_fd = io_uring_setup(1, &p);
if (ring_fd < 0) {
perror("io_uring_setup");
return 1;
}
size = p.sq_off.array + p.sq_entries * sizeof(unsigned);
/* Map the submission queue ring MAP_PRIVATE */
map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
ring_fd, IORING_OFF_SQ_RING);
if (map == MAP_FAILED) {
perror("mmap");
return 1;
}
/* We have at least one page. Let's COW it. */
*map = 0;
pause();
return 0;
}
<--- C reproducer --->
On a system with 16 GiB RAM and swap configured:
# ./iouring &
# memhog 16G
# killall iouring
[ 301.552930] ------------[ cut here ]------------
[ 301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
[ 301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
[ 301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
[ 301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.3-0-ga6ed6b701f0a-prebu4
[ 301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
[ 301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
[ 301.561189] RSP: 0018:
ffffba2c0377fab8 EFLAGS:
00010282
[ 301.561590] RAX:
00000000ffffffea RBX:
ffff9208c8ce9cc0 RCX:
000000010455e047
[ 301.562105] RDX:
07fffffff0eb1e0a RSI:
0000000000000000 RDI:
ffff9208c391d200
[ 301.562628] RBP:
0000000000000000 R08:
ffffba2c0377fab8 R09:
0000000000000000
[ 301.563145] R10:
ffff9208d2292d50 R11:
0000000000000002 R12:
00007fea890e0000
[ 301.563669] R13:
0000000000000000 R14:
ffffba2c0377fc08 R15:
0000000000000000
[ 301.564186] FS:
0000000000000000(0000) GS:
ffff920c2fbc0000(0000) knlGS:
0000000000000000
[ 301.564773] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 301.565197] CR2:
00007fea88ee8a20 CR3:
00000001033a8000 CR4:
0000000000750ef0
[ 301.565725] PKRU:
55555554
[ 301.565944] Call Trace:
[ 301.566148] <TASK>
[ 301.566325] ? untrack_pfn+0xf4/0x100
[ 301.566618] ? __warn+0x81/0x130
[ 301.566876] ? untrack_pfn+0xf4/0x100
[ 301.567163] ? report_bug+0x171/0x1a0
[ 301.567466] ? handle_bug+0x3c/0x80
[ 301.567743] ? exc_invalid_op+0x17/0x70
[ 301.568038] ? asm_exc_invalid_op+0x1a/0x20
[ 301.568363] ? untrack_pfn+0xf4/0x100
[ 301.568660] ? untrack_pfn+0x65/0x100
[ 301.568947] unmap_single_vma+0xa6/0xe0
[ 301.569247] unmap_vmas+0xb5/0x190
[ 301.569532] exit_mmap+0xec/0x340
[ 301.569801] __mmput+0x3e/0x130
[ 301.570051] do_exit+0x305/0xaf0
...
Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Wupeng Ma <mawupeng1@huawei.com>
Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com
Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines")
Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3")
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexey Makhalov [Tue, 2 Apr 2024 23:23:34 +0000 (16:23 -0700)]
MAINTAINERS: change vmware.com addresses to broadcom.com
Update all remaining vmware.com email addresses to actual broadcom.com.
Add corresponding .mailmap entries for maintainers who contributed in the
past as the vmware.com address will start bouncing soon.
Maintainership update. Jeff Sipek has left VMware, Nick Shi will be
maintaining VMware PTP.
Link: https://lkml.kernel.org/r/20240402232334.33167-1-alexey.makhalov@broadcom.com
Signed-off-by: Alexey Makhalov <alexey.makhalov@broadcom.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Acked-by: Ajay Kaher <ajay.kaher@broadcom.com>
Acked-by: Ronak Doshi <ronak.doshi@broadcom.com>
Acked-by: Nick Shi <nick.shi@broadcom.com>
Acked-by: Bryan Tan <bryan-bt.tan@broadcom.com>
Acked-by: Vishnu Dasa <vishnu.dasa@broadcom.com>
Acked-by: Vishal Bhakta <vishal.bhakta@broadcom.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Edward Liaw [Fri, 29 Mar 2024 18:58:10 +0000 (18:58 +0000)]
selftests/mm: include strings.h for ffsl
Got a compilation error on Android for ffsl after
91b80cc5b39f
("selftests: mm: fix map_hugetlb failure on 64K page size systems")
included vm_util.h.
Link: https://lkml.kernel.org/r/20240329185814.16304-1-edliaw@google.com
Fixes: af605d26a8f2 ("selftests/mm: merge util.h into vm_util.h")
Signed-off-by: Edward Liaw <edliaw@google.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Uladzislau Rezki (Sony) [Thu, 28 Mar 2024 14:03:30 +0000 (15:03 +0100)]
mm: vmalloc: fix lockdep warning
A lockdep reports a possible deadlock in the find_vmap_area_exceed_addr_lock()
function:
============================================
WARNING: possible recursive locking detected
6.9.0-rc1-00060-ged3ccc57b108-dirty #6140 Not tainted
--------------------------------------------
drgn/455 is trying to acquire lock:
ffff0000c00131d0 (&vn->busy.lock/1){+.+.}-{2:2}, at: find_vmap_area_exceed_addr_lock+0x64/0x124
but task is already holding lock:
ffff0000c0011878 (&vn->busy.lock/1){+.+.}-{2:2}, at: find_vmap_area_exceed_addr_lock+0x64/0x124
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&vn->busy.lock/1);
lock(&vn->busy.lock/1);
*** DEADLOCK ***
indeed it can happen if the find_vmap_area_exceed_addr_lock() gets called
concurrently because it tries to acquire two nodes locks. It was done to
prevent removing a lowest VA found on a previous step.
To address this a lowest VA is found first without holding a node lock
where it resides. As a last step we check if a VA still there because it
can go away, if removed, proceed with next lowest.
[akpm@linux-foundation.org: fix comment typos, per Baoquan]
Link: https://lkml.kernel.org/r/20240328140330.4747-1-urezki@gmail.com
Fixes: 53becf32aec1 ("mm: vmalloc: support multiple nodes in vread_iter")
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Omar Sandoval <osandov@fb.com>
Reported-by: Jens Axboe <axboe@kernel.dk>
Cc: Baoquan He <bhe@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Uladzislau Rezki (Sony) [Sat, 23 Mar 2024 14:15:44 +0000 (15:15 +0100)]
mm: vmalloc: bail out early in find_vmap_area() if vmap is not init
During the boot the s390 system triggers "spinlock bad magic" messages
if the spinlock debugging is enabled:
[ 0.465445] BUG: spinlock bad magic on CPU#0, swapper/0
[ 0.465490] lock: single+0x1860/0x1958, .magic:
00000000, .owner: <none>/-1, .owner_cpu: 0
[ 0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted
6.8.0-12955-g8e938e398669 #1
[ 0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
[ 0.466270] Call Trace:
[ 0.466470] [<
00000000011f26c8>] dump_stack_lvl+0x98/0xd8
[ 0.466516] [<
00000000001dcc6a>] do_raw_spin_lock+0x8a/0x108
[ 0.466545] [<
000000000042146c>] find_vmap_area+0x6c/0x108
[ 0.466572] [<
000000000042175a>] find_vm_area+0x22/0x40
[ 0.466597] [<
000000000012f152>] __set_memory+0x132/0x150
[ 0.466624] [<
0000000001cc0398>] vmem_map_init+0x40/0x118
[ 0.466651] [<
0000000001cc0092>] paging_init+0x22/0x68
[ 0.466677] [<
0000000001cbbed2>] setup_arch+0x52a/0x708
[ 0.466702] [<
0000000001cb6140>] start_kernel+0x80/0x5c8
[ 0.466727] [<
0000000000100036>] startup_continue+0x36/0x40
it happens because such system tries to access some vmap areas
whereas the vmalloc initialization is not even yet done:
[ 0.465490] lock: single+0x1860/0x1958, .magic:
00000000, .owner: <none>/-1, .owner_cpu: 0
[ 0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted
6.8.0-12955-g8e938e398669 #1
[ 0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
[ 0.466270] Call Trace:
[ 0.466470] dump_stack_lvl (lib/dump_stack.c:117)
[ 0.466516] do_raw_spin_lock (kernel/locking/spinlock_debug.c:87 kernel/locking/spinlock_debug.c:115)
[ 0.466545] find_vmap_area (mm/vmalloc.c:1059 mm/vmalloc.c:2364)
[ 0.466572] find_vm_area (mm/vmalloc.c:3150)
[ 0.466597] __set_memory (arch/s390/mm/pageattr.c:360 arch/s390/mm/pageattr.c:393)
[ 0.466624] vmem_map_init (./arch/s390/include/asm/set_memory.h:55 arch/s390/mm/vmem.c:660)
[ 0.466651] paging_init (arch/s390/mm/init.c:97)
[ 0.466677] setup_arch (arch/s390/kernel/setup.c:972)
[ 0.466702] start_kernel (init/main.c:899)
[ 0.466727] startup_continue (arch/s390/kernel/head64.S:35)
[ 0.466811] INFO: lockdep is turned off.
...
[ 0.718250] vmalloc init - busy lock init
0000000002871860
[ 0.718328] vmalloc init - busy lock init
00000000028731b8
Some background. It worked before because the lock that is in question
was statically defined and initialized. As of now, the locks and data
structures are initialized in the vmalloc_init() function.
To address that issue add the check whether the "vmap_initialized"
variable is set, if not find_vmap_area() bails out on entry returning NULL.
Link: https://lkml.kernel.org/r/20240323141544.4150-1-urezki@gmail.com
Fixes: 72210662c5a2 ("mm: vmalloc: offload free_vmap_area_lock lock")
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
John Sperbeck [Sat, 23 Mar 2024 15:29:34 +0000 (08:29 -0700)]
init: open output files from cpio unpacking with O_LARGEFILE
If a member of a cpio archive for an initrd or initrams is larger than
2Gb, we'll eventually fail to write to that file when we get to that
limit, unless O_LARGEFILE is set.
The problem can be seen with this recipe, assuming that BLK_DEV_RAM
is not configured:
cd /tmp
dd if=/dev/zero of=BIGFILE bs=
1048576 count=2200
echo BIGFILE | cpio -o -H newc -R root:root > initrd.img
kexec -l /boot/vmlinuz-$(uname -r) --initrd=initrd.img --reuse-cmdline
kexec -e
The console will show 'Initramfs unpacking failed: write error'. With
the patch, the error is gone.
Link: https://lkml.kernel.org/r/20240323152934.3307391-1-jsperbeck@google.com
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Tue, 26 Mar 2024 14:32:08 +0000 (15:32 +0100)]
mm/secretmem: fix GUP-fast succeeding on secretmem folios
folio_is_secretmem() currently relies on secretmem folios being LRU
folios, to save some cycles.
However, folios might reside in a folio batch without the LRU flag set, or
temporarily have their LRU flag cleared. Consequently, the LRU flag is
unreliable for this purpose.
In particular, this is the case when secretmem_fault() allocates a fresh
page and calls filemap_add_folio()->folio_add_lru(). The folio might be
added to the per-cpu folio batch and won't get the LRU flag set until the
batch was drained using e.g., lru_add_drain().
Consequently, folio_is_secretmem() might not detect secretmem folios and
GUP-fast can succeed in grabbing a secretmem folio, crashing the kernel
when we would later try reading/writing to the folio, because the folio
has been unmapped from the directmap.
Fix it by removing that unreliable check.
Link: https://lkml.kernel.org/r/20240326143210.291116-2-david@redhat.com
Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: xingwei lee <xrivendell7@gmail.com>
Reported-by: yue sun <samsun1006219@gmail.com>
Closes: https://lore.kernel.org/lkml/CABOYnLyevJeravW=QrH0JUPYEcDN160aZFb7kwndm-J2rmz0HQ@mail.gmail.com/
Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
Tested-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rafael J. Wysocki [Fri, 5 Apr 2024 18:17:48 +0000 (20:17 +0200)]
Merge branch 'acpi-thermal'
* acpi-thermal:
ACPI: thermal: Register thermal zones without valid trip points
Jeff Layton [Fri, 5 Apr 2024 17:56:18 +0000 (13:56 -0400)]
nfsd: hold a lighter-weight client reference over CB_RECALL_ANY
Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the
client. While a callback job is technically an RPC that counter is
really more for client-driven RPCs, and this has the effect of
preventing the client from being unhashed until the callback completes.
If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we
can end up in a situation where the callback can't complete on the (now
dead) callback channel, but the new client can't connect because the old
client can't be unhashed. This usually manifests as a NFS4ERR_DELAY
return on the CREATE_SESSION operation.
The job is only holding a reference to the client so it can clear a flag
after the RPC completes. Fix this by having CB_RECALL_ANY instead hold a
reference to the cl_nfsdfs.cl_ref. Typically we only take that sort of
reference when dealing with the nfsdfs info files, but it should work
appropriately here to ensure that the nfs4_client doesn't disappear.
Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
Reported-by: Vladimir Benes <vbenes@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Linus Torvalds [Fri, 5 Apr 2024 17:05:42 +0000 (10:05 -0700)]
Merge tag '9p-for-6.9-rc3' of https://github.com/martinetd/linux
Pull minor 9p cleanups from Dominique Martinet:
- kernel doc fix & removal of unused flag
- fix some bogus debug statement for read/write
* tag '9p-for-6.9-rc3' of https://github.com/martinetd/linux:
9p: remove SLAB_MEM_SPREAD flag usage
9p: Fix read/write debug statements to report server reply
9p/trans_fd: remove Excess kernel-doc comment
Linus Torvalds [Fri, 5 Apr 2024 17:02:09 +0000 (10:02 -0700)]
Merge tag '6.9-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
"Three fixes, all also for stable:
- encryption fix
- memory overrun fix
- oplock break fix"
* tag '6.9-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: do not set SMB2_GLOBAL_CAP_ENCRYPTION for SMB 3.1.1
ksmbd: validate payload size in ipc response
ksmbd: don't send oplock break if rename fails
Linus Torvalds [Fri, 5 Apr 2024 16:47:26 +0000 (09:47 -0700)]
Merge tag 'vfs-6.9-rc3.fixes' of git://git./linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains a few small fixes. This comes with some delay because I
wanted to wait on people running their reproducers and the Easter
Holidays meant that those replies came in a little later than usual:
- Fix handling of preventing writes to mounted block devices.
Since last kernel we allow to prevent writing to mounted block
devices provided CONFIG_BLK_DEV_WRITE_MOUNTED isn't set and the
block device is opened with restricted writes. When we switched to
opening block devices as files we altered the mechanism by which we
recognize when a block device has been opened with write
restrictions.
The detection logic assumed that only read-write mounted
filesystems would apply write restrictions to their block devices
from other openers. That of course is not true since it also makes
sense to apply write restrictions for filesystems that are
read-only.
Fix the detection logic using an FMODE_* bit. We still have a few
left since we freed up a couple a while ago. I also picked up a
patch to free up four additional FMODE_* bits scheduled for the
next merge window.
- Fix counting the number of writers to a block device. This just
changes the logic to be consistent.
- Fix a bug in aio causing a NULL pointer derefernce after we
implemented batched processing in aio.
- Finally, add the changes we discussed that allows to yield block
devices early even though file closing itself is deferred.
This also allows us to remove two holder operations to get and
release the holder to align lifetime of file and holder of the
block device"
* tag 'vfs-6.9-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
aio: Fix null ptr deref in aio_complete() wakeup
fs,block: yield devices early
block: count BLK_OPEN_RESTRICT_WRITES openers
block: handle BLK_OPEN_RESTRICT_WRITES correctly
Kent Overstreet [Sun, 31 Mar 2024 21:52:12 +0000 (17:52 -0400)]
aio: Fix null ptr deref in aio_complete() wakeup
list_del_init_careful() needs to be the last access to the wait queue
entry - it effectively unlocks access.
Previously, finish_wait() would see the empty list head and skip taking
the lock, and then we'd return - but the completion path would still
attempt to do the wakeup after the task_struct pointer had been
overwritten.
Fixes: 71eb6b6b0ba9 ("fs/aio: obey min_nr when doing wakeups")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-fsdevel/CAHTA-ubfwwB51A5Wg5M6H_rPEQK9pNf8FkAGH=vr=FEkyRrtqw@mail.gmail.com/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/stable/20240331215212.522544-1-kent.overstreet%40linux.dev
Link: https://lore.kernel.org/r/20240331215212.522544-1-kent.overstreet@linux.dev
Signed-off-by: Christian Brauner <brauner@kernel.org>
Anna-Maria Behnsen [Fri, 5 Apr 2024 08:53:21 +0000 (10:53 +0200)]
timers/migration: Return early on deactivation
Commit
4b6f4c5a67c0 ("timer/migration: Remove buggy early return on
deactivation") removed the logic to return early in tmigr_update_events()
on deactivation. With this the problem with a not properly updated first
global event in a hierarchy containing only a single group was fixed.
But when having a look at this code path with a hierarchy with more than a
single level, now unnecessary work is done (example is partially copied
from the message of the commit mentioned above):
[GRP1:0]
migrator = GRP0:0
active = GRP0:0
nextevt = T0:0i, T0:1
/ \
[GRP0:0] [GRP0:1]
migrator = 0 migrator = NONE
active = 0 active = NONE
nextevt = T0i, T1 nextevt = T2
/ \ / \
0 (T0i) 1 (T1) 2 (T2) 3
active idle idle idle
0) CPU 0 is active thus its event is ignored (the letter 'i') and so are
upper levels' events. CPU 1 is idle and has the timer T1 enqueued.
CPU 2 also has a timer. The expiry order is T0 (ignored) < T1 < T2
[GRP1:0]
migrator = GRP0:0
active = GRP0:0
nextevt = T0:0i, T0:1
/ \
[GRP0:0] [GRP0:1]
migrator = NONE migrator = NONE
active = NONE active = NONE
nextevt = T1 nextevt = T2
/ \ / \
0 (T0i) 1 (T1) 2 (T2) 3
idle idle idle idle
1) CPU 0 goes idle without global event queued. Therefore KTIME_MAX is
pushed as its next expiry and its own event kept as "ignore". Without this
early return the following steps happen in tmigr_update_events() when
child = null and group = GRP0:0 :
lock(GRP0:0->lock);
timerqueue_del(GRP0:0, T0i);
unlock(GRP0:0->lock);
[GRP1:0]
migrator = NONE
active = NONE
nextevt = T0:0, T0:1
/ \
[GRP0:0] [GRP0:1]
migrator = NONE migrator = NONE
active = NONE active = NONE
nextevt = T1 nextevt = T2
/ \ / \
0 (T0i) 1 (T1) 2 (T2) 3
idle idle idle idle
2) The change now propagates up to the top. Then tmigr_update_events()
updates the group event of GRP0:0 and executes the following steps
(child = GRP0:0 and group = GRP0:0):
lock(GRP0:0->lock);
lock(GRP1:0->lock);
evt = tmigr_next_groupevt(GRP0:0); -> this removes the ignored events
in GRP0:0
... update GRP1:0 group event and timerqueue ...
unlock(GRP1:0->lock);
unlock(GRP0:0->lock);
So the dance in 1) with locking the GRP0:0->lock and removing the T0i from
the timerqueue is redundand as this is done nevertheless in 2) when
tmigr_next_groupevt(GRP0:0) is executed.
Revert commit
4b6f4c5a67c0 ("timer/migration: Remove buggy early return on
deactivation") and add a condition into return path to skip the return
only, when hierarchy contains a single group. Adapt comments accordingly.
Fixes: 4b6f4c5a67c0 ("timer/migration: Remove buggy early return on deactivation")
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/87cyr49on2.fsf@somnus
Frederic Weisbecker [Mon, 1 Apr 2024 21:48:59 +0000 (23:48 +0200)]
timers/migration: Fix ignored event due to missing CPU update
When a group event is updated with its expiry unchanged but a different
CPU, that target change may go unnoticed and the event may be propagated
up with a stale CPU value. The following depicts a scenario that has
been actually observed:
[GRP2:0]
migrator = GRP1:1
active = GRP1:1
nextevt = TGRP1:0 (T0)
/ \
[GRP1:0] [GRP1:1]
migrator = NONE [...]
active = NONE
nextevt = TGRP0:0 (T0)
/ \
[GRP0:0] [...]
migrator = NONE
active = NONE
nextevt = T0
/ \
0 (T0) 1 (T1)
idle idle
0) The hierarchy has 3 levels. The left part (GRP1:0) is all idle,
including CPU 0 and CPU 1 which have a timer each: T0 and T1. They have
the same expiry value.
[GRP2:0]
migrator = GRP1:1
active = GRP1:1
nextevt = KTIME_MAX
/ \
[GRP1:0] [GRP1:1]
migrator = NONE [...]
active = NONE
nextevt = TGRP0:0 (T0)
/ \
[GRP0:0] [...]
migrator = NONE
active = NONE
nextevt = T0
/ \
0 (T0) 1 (T1)
idle idle
1) The migrator in GRP1:1 handles remotely T0. The event is dequeued
from the top and T0 executed.
[GRP2:0]
migrator = GRP1:1
active = GRP1:1
nextevt = KTIME_MAX
/ \
[GRP1:0] [GRP1:1]
migrator = NONE [...]
active = NONE
nextevt = TGRP0:0 (T0)
/ \
[GRP0:0] [...]
migrator = NONE
active = NONE
nextevt = T1
/ \
0 1 (T1)
idle idle
2) The migrator in GRP1:1 fetches the next timer for CPU 0 and finds
none. But it updates the events from its groups, starting with GRP0:0
which now has T1 as its next event. So far so good.
[GRP2:0]
migrator = GRP1:1
active = GRP1:1
nextevt = KTIME_MAX
/ \
[GRP1:0] [GRP1:1]
migrator = NONE [...]
active = NONE
nextevt = TGRP0:0 (T0)
/ \
[GRP0:0] [...]
migrator = NONE
active = NONE
nextevt = T1
/ \
0 1 (T1)
idle idle
3) The migrator in GRP1:1 proceeds upward and updates the events in
GRP1:0. The child event TGRP0:0 is found queued with the same expiry
as before. And therefore it is left unchanged. However the target CPU
is not the same but that fact is ignored so TGRP0:0 still points to
CPU 0 when it should point to CPU 1.
[GRP2:0]
migrator = GRP1:1
active = GRP1:1
nextevt = TGRP1:0 (T0)
/ \
[GRP1:0] [GRP1:1]
migrator = NONE [...]
active = NONE
nextevt = TGRP0:0 (T0)
/ \
[GRP0:0] [...]
migrator = NONE
active = NONE
nextevt = T1
/ \
0 1 (T1)
idle idle
4) The propagation has reached the top level and TGRP1:0, having TGRP0:0
as its first event, also wrongly points to CPU 0. TGRP1:0 is added to
the top level group.
[GRP2:0]
migrator = GRP1:1
active = GRP1:1
nextevt = KTIME_MAX
/ \
[GRP1:0] [GRP1:1]
migrator = NONE [...]
active = NONE
nextevt = TGRP0:0 (T0)
/ \
[GRP0:0] [...]
migrator = NONE
active = NONE
nextevt = T1
/ \
0 1 (T1)
idle idle
5) The migrator in GRP1:1 dequeues the next event in top level pointing
to CPU 0. But since it actually doesn't see any real event in CPU 0, it
early returns.
6) T1 is left unhandled until either CPU 0 or CPU 1 wake up.
Some other bad scenario may involve trees with just two levels.
Fix this with unconditionally updating the CPU of the child event before
considering to early return while updating a queued event with an
unchanged expiry value.
Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Link: https://lore.kernel.org/r/Zg2Ct6M2RJAYHgCB@localhost.localdomain
Takashi Iwai [Fri, 5 Apr 2024 06:48:12 +0000 (08:48 +0200)]
Merge tag 'asoc-fix-v6.9-rc2' of https://git./linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.9
A relatively large set of fixes here, the biggest piece of it is a
series correcting some problems with the delay reporting for Intel SOF
cards but there's a bunch of other things. Everything here is driver
specific except for a fix in the core for an issue with sign extension
handling volume controls.
Dave Airlie [Fri, 5 Apr 2024 02:32:08 +0000 (12:32 +1000)]
Merge tag 'drm-intel-fixes-2024-04-04' of https://anongit.freedesktop.org/git/drm/drm-intel into drm-fixes
Display fixes:
- A few DisplayPort related fixes (Imre, Arun, Ankit, Ville)
- eDP PSR fixes (Jouni)
Core/GT fixes:
- Remove some VM space restrictions on older platforms (Andi)
- Disable automatic load CCS load balancing (Andi)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Zg7nSK5oTmWfKPPI@intel.com
Dave Airlie [Fri, 5 Apr 2024 02:25:28 +0000 (12:25 +1000)]
Merge tag 'drm-xe-fixes-2024-04-04' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
- Stop using system_unbound_wq for preempt fences,
as this can cause starvation when reaching more
than max_active defined by workqueue
- Fix saving unordered rebinding fences by attaching
them as kernel feces to the vm's resv
- Fix TLB invalidation fences completing out of order
- Move rebind TLB invalidation to the ring ops to reduce
the latency
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/tizan6wdpxu4ayudeikjglxdgzmnhdzj3li3z2pgkierjtozzw@lbfddeg43a7h
Dave Airlie [Fri, 5 Apr 2024 01:59:02 +0000 (11:59 +1000)]
Merge tag 'drm-misc-fixes-2024-04-04' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:
display:
- fix typos in kerneldoc
nouveau:
- uvmm: fix remap address calculation
- minor cleanups
panfrost:
- fix power-transition timeouts
prime:
- unbreak dma-buf export for virt-gpu
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240404104813.GA27376@localhost.localdomain
Sean Christopherson [Fri, 5 Apr 2024 00:16:14 +0000 (17:16 -0700)]
x86/cpufeatures: Add CPUID_LNX_5 to track recently added Linux-defined word
Add CPUID_LNX_5 to track cpufeatures' word 21, and add the appropriate
compile-time assert in KVM to prevent direct lookups on the features in
CPUID_LNX_5. KVM uses X86_FEATURE_* flags to manage guest CPUID, and so
must translate features that are scattered by Linux from the Linux-defined
bit to the hardware-defined bit, i.e. should never try to directly access
scattered features in guest CPUID.
Opportunistically add NR_CPUID_WORDS to enum cpuid_leafs, along with a
compile-time assert in KVM's CPUID infrastructure to ensure that future
additions update cpuid_leafs along with NCAPINTS.
No functional change intended.
Fixes: 7f274e609f3d ("x86/cpufeatures: Add new word for scattered features")
Cc: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 4 Apr 2024 21:49:10 +0000 (14:49 -0700)]
Merge tag 'net-6.9-rc3' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, bluetooth and bpf.
Fairly usual collection of driver and core fixes. The large selftest
accompanying one of the fixes is also becoming a common occurrence.
Current release - regressions:
- ipv6: fix infinite recursion in fib6_dump_done()
- net/rds: fix possible null-deref in newly added error path
Current release - new code bugs:
- net: do not consume a full cacheline for system_page_pool
- bpf: fix bpf_arena-related file descriptor leaks in the verifier
- drv: ice: fix freeing uninitialized pointers, fixing misuse of the
newfangled __free() auto-cleanup
Previous releases - regressions:
- x86/bpf: fixes the BPF JIT with retbleed=stuff
- xen-netfront: add missing skb_mark_for_recycle, fix page pool
accounting leaks, revealed by recently added explicit warning
- tcp: fix bind() regression for v6-only wildcard and v4-mapped-v6
non-wildcard addresses
- Bluetooth:
- replace "hci_qca: Set BDA quirk bit if fwnode exists in DT" with
better workarounds to un-break some buggy Qualcomm devices
- set conn encrypted before conn establishes, fix re-connecting to
some headsets which use slightly unusual sequence of msgs
- mptcp:
- prevent BPF accessing lowat from a subflow socket
- don't account accept() of non-MPC client as fallback to TCP
- drv: mana: fix Rx DMA datasize and skb_over_panic
- drv: i40e: fix VF MAC filter removal
Previous releases - always broken:
- gro: various fixes related to UDP tunnels - netns crossing
problems, incorrect checksum conversions, and incorrect packet
transformations which may lead to panics
- bpf: support deferring bpf_link dealloc to after RCU grace period
- nf_tables:
- release batch on table validation from abort path
- release mutex after nft_gc_seq_end from abort path
- flush pending destroy work before exit_net release
- drv: r8169: skip DASH fw status checks when DASH is disabled"
* tag 'net-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
netfilter: validate user input for expected length
net/sched: act_skbmod: prevent kernel-infoleak
net: usb: ax88179_178a: avoid the interface always configured as random address
net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45()
net: ravb: Always update error counters
net: ravb: Always process TX descriptor ring
netfilter: nf_tables: discard table flag update with pending basechain deletion
netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
netfilter: nf_tables: reject new basechain after table flag update
netfilter: nf_tables: flush pending destroy work before exit_net release
netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
netfilter: nf_tables: release batch on table validation from abort path
Revert "tg3: Remove residual error handling in tg3_suspend"
tg3: Remove residual error handling in tg3_suspend
net: mana: Fix Rx DMA datasize and skb_over_panic
net/sched: fix lockdep splat in qdisc_tree_reduce_backlog()
net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping
net: stmmac: fix rx queue priority assignment
net: txgbe: fix i2c dev name cannot match clkdev
net: fec: Set mac_managed_pm during probe
...