Rafael J. Wysocki [Tue, 2 Nov 2021 18:31:28 +0000 (19:31 +0100)]
Merge branches 'pm-em' and 'powercap'
Merge Energy Model and power capping updates for 5.16-rc1:
- Add support for inefficient operating performance points to the
Energy Model and modify cpufreq to use them properly (Vincent
Donnefort).
- Rearrange the DTPM framework code to simplify it and make it easier
to follow (Daniel Lezcano).
- Fix power intialization in DTPM (Daniel Lezcano).
- Add CPU load consideration when estimating the instaneous power
consumption in DTPM (Daniel Lezcano).
* pm-em:
cpufreq: mediatek-hw: Fix cpufreq_table_find_index_dl() call
PM: EM: Mark inefficiencies in CPUFreq
cpufreq: Use CPUFREQ_RELATION_E in DVFS governors
cpufreq: Introducing CPUFREQ_RELATION_E
cpufreq: Add an interface to mark inefficient frequencies
cpufreq: Make policy min/max hard requirements
PM: EM: Allow skipping inefficient states
PM: EM: Extend em_perf_domain with a flag field
PM: EM: Mark inefficient states
PM: EM: Fix inefficient states detection
* powercap:
powercap/drivers/dtpm: Fix power limit initialization
powercap/drivers/dtpm: Scale the power with the load
powercap/drivers/dtpm: Use container_of instead of a private data field
powercap/drivers/dtpm: Simplify the dtpm table
powercap/drivers/dtpm: Encapsulate even more the code
Rafael J. Wysocki [Tue, 2 Nov 2021 18:23:45 +0000 (19:23 +0100)]
Merge branches 'pm-cpufreq' and 'pm-cpuidle'
Merge cpufreq and cpuidle updates for 5.16-rc1:
- Fix cpu->pstate.turbo_freq initialization in intel_pstate (Zhang
Rui).
- Make intel_pstate process HWP Guaranteed change notifications from
the processor (Srinivas Pandruvada).
- Fix typo in cpufreq.h (Rafael Wysocki).
- Fix tegra driver to handle BPMP errors properly (Mikko Perttunen).
- Fix the parameter usage of the newly added perf-domain API (Hector
Yuan).
- Minor cleanups to cppc, vexpress and s3c244x drivers (Han Wang,
Guenter Roeck, and Arnd Bergmann).
- Fix kobject memory leaks in cpuidle error paths (Anel Orazgaliyeva).
- Make intel_idle enable interrupts before entering C1 on some Xeon
processor models (Artem Bityutskiy).
* pm-cpufreq:
cpufreq: Fix parameter in parse_perf_domain()
cpufreq: intel_pstate: Fix cpu->pstate.turbo_freq initialization
cpufreq: Fix typo in cpufreq.h
cpufreq: intel_pstate: Process HWP Guaranteed change notification
cpufreq: tegra186/tegra194: Handle errors in BPMP response
cpufreq: remove useless INIT_LIST_HEAD()
cpufreq: s3c244x: add fallthrough comments for switch
cpufreq: vexpress: Drop unused variable
* pm-cpuidle:
cpuidle: Fix kobject memory leaks in error paths
intel_idle: enable interrupts before C1 on Xeons
Rafael J. Wysocki [Tue, 2 Nov 2021 18:11:49 +0000 (19:11 +0100)]
Merge branch 'pm-sleep'
Merge updates related to system sleep for 5.16-rc1:
- Clean up hib_wait_io() (Falla Coulibaly).
- Fix sparse warnings in hibernation-related code (Anders Roxell).
- Use vzalloc() and kzalloc() instead of their open-coded
equivalents in hibernation-related code (Cai Huoqing).
- Prevent user space from crashing the kernel by attempting to
restore the system state from a swap partition in use (Ye Bin).
- Do not let "syscore" devices runtime-suspend during system PM
transitions (Rafael Wysocki).
- Do not pause cpuidle in the suspend-to-idle path (Rafael Wysocki).
- Pause cpuidle later and resume it earlier during system PM
transitions (Rafael Wysocki).
- Make system suspend code use valid_state() consistently (Rafael
Wysocki).
- Add support for enabling wakeup IRQs after invoking the
->runtime_suspend() callback and make two drivers use it (Chunfeng
Yun).
* pm-sleep:
usb: mtu3: enable wake-up interrupt after runtime_suspend called
usb: xhci-mtk: enable wake-up interrupt after runtime_suspend called
PM / wakeirq: support enabling wake-up irq after runtime_suspend called
PM: suspend: Use valid_state() consistently
PM: sleep: Pause cpuidle later and resume it earlier during system transitions
PM: suspend: Do not pause cpuidle in the suspend-to-idle path
PM: sleep: Do not let "syscore" devices runtime-suspend during system transitions
PM: hibernate: Get block device exclusively in swsusp_check()
PM: hibernate: swap: Use vzalloc() and kzalloc()
PM: hibernate: fix sparse warnings
Revert "PM: sleep: Do not assume that "mem" is always present"
PM: hibernate: Remove blk_status_to_errno in hib_wait_io
PM: sleep: Do not assume that "mem" is always present
Rafael J. Wysocki [Tue, 2 Nov 2021 18:06:30 +0000 (19:06 +0100)]
Merge branch 'pm-pci'
Merge PCI device power management updates for 5.16-rc1:
- Make the association of ACPI device objects with PCI devices more
straightforward and simplify the code doing that for all devices
in general (Rafael Wysocki).
- Eliminate struct pci_platform_pm_ops and handle the both of its
users (PCI and Intel MID) directly in the PCI bus code (Rafael
Wysocki).
- Simplify and clarify ACPI PCI device PM helpers (Rafael Wysocki).
- Fix ordering of operations in pci_back_from_sleep() (Rafael
Wysocki).
* pm-pci:
PCI: PM: Fix ordering of operations in pci_back_from_sleep()
PCI: PM: Do not call platform_pci_power_manageable() unnecessarily
PCI: PM: Make pci_choose_state() call pci_target_state()
PCI: PM: Rearrange pci_target_state()
PCI: PM: Simplify acpi_pci_power_manageable()
PCI: PM: Drop struct pci_platform_pm_ops
PCI: ACPI: PM: Do not use pci_platform_pm_ops for ACPI
PCI: PM: Do not use pci_platform_pm_ops for Intel MID PM
ACPI: glue: Look for ACPI bus type only if ACPI companion is not known
ACPI: glue: Drop cleanup callback from struct acpi_bus_type
PCI: ACPI: Drop acpi_pci_bus
Rafael J. Wysocki [Tue, 2 Nov 2021 16:55:31 +0000 (17:55 +0100)]
Merge branch 'cpufreq/arm/linux-next' of git://git./linux/kernel/git/vireshk/pm
Pull ARM cpufreq updates for 5.16-rc1 from Viresh Kumar:
"- Fix tegra driver to handle BPMP errors properly (Mikko Perttunen).
- Fix the parameter usage of the newly added perf-domain API (Hector
Yuan).
- Minor cleanups to cppc, vexpress and s3c244x drivers (Han Wang,
Guenter Roeck, and Arnd Bergmann)."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: Fix parameter in parse_perf_domain()
cpufreq: tegra186/tegra194: Handle errors in BPMP response
cpufreq: remove useless INIT_LIST_HEAD()
cpufreq: s3c244x: add fallthrough comments for switch
cpufreq: vexpress: Drop unused variable
Hector.Yuan [Mon, 4 Oct 2021 14:42:33 +0000 (22:42 +0800)]
cpufreq: Fix parameter in parse_perf_domain()
Pass cpu to parse_perf_domain() instead of pcpu.
Fixes: 8486a32dd484 ("cpufreq: Add of_perf_domain_get_sharing_cpumask")
Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com>
[ Viresh: Massaged changelog ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Chunfeng Yun [Mon, 25 Oct 2021 07:01:55 +0000 (15:01 +0800)]
usb: mtu3: enable wake-up interrupt after runtime_suspend called
Use the new API dev_pm_set_dedicated_wake_irq_reverse() to request
dedicated wake-up interrupt, due to we want to enable the wake IRQ
after running ->runtime_suspend().
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Chunfeng Yun [Mon, 25 Oct 2021 07:01:54 +0000 (15:01 +0800)]
usb: xhci-mtk: enable wake-up interrupt after runtime_suspend called
Use new function dev_pm_set_dedicated_wake_irq_reverse() to request
dedicated wake-up interrupt, due to we want to enable the wake IRQ
after running ->runtime_suspend().
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Chunfeng Yun [Mon, 25 Oct 2021 07:01:53 +0000 (15:01 +0800)]
PM / wakeirq: support enabling wake-up irq after runtime_suspend called
When the dedicated wake IRQ is level trigger, and it uses the
device's low-power status as the wakeup source, that means if the
device is not in low-power state, the wake IRQ will be triggered
if enabled; For this case, need enable the wake IRQ after running
the device's ->runtime_suspend() which make it enter low-power state.
e.g.
Assume the wake IRQ is a low level trigger type, and the wakeup
signal comes from the low-power status of the device.
The wakeup signal is low level at running time (0), and becomes
high level when the device enters low-power state (runtime_suspend
(1) is called), a wakeup event at (2) make the device exit low-power
state, then the wakeup signal also becomes low level.
------------------
| ^ ^|
---------------- | | --------------
|<---(0)--->|<--(1)--| (3) (2) (4)
if enable the wake IRQ before running runtime_suspend during (0),
a wake IRQ will arise, it causes resume immediately;
it works if enable wake IRQ ( e.g. at (3) or (4)) after running
->runtime_suspend().
This patch introduces a new status WAKE_IRQ_DEDICATED_REVERSE to
optionally support enabling wake IRQ after running ->runtime_suspend().
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Wed, 27 Oct 2021 18:40:17 +0000 (20:40 +0200)]
Merge tag 'devfreq-next-for-5.16' of git://git./linux/kernel/git/chanwoo/linux
Pull devfreq updates for v5.16 from Chanwoo Choi:
"1. Minor update for exynos-ppmu devfreq-event driver
- Devicetree naming convention requires the device node names
to use hyphens instead of underlines. In order to support
this requirement, changes the code with hyphens.
- Simplify parsing event-type from devicetree without behavior
changes.
2. Strengthen check for freq_table in devfreq core
- Check whether both freq_table is not NULL and size of freq_table
is not zero in order to prevent the error by mistake of devfreq
driver developer.
* tag 'devfreq-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux:
PM / devfreq: Strengthen check for freq_table
devfreq: exynos-ppmu: simplify parsing event-type from DT
devfreq: exynos-ppmu: use node names with hyphens
Samuel Holland [Wed, 29 Sep 2021 04:42:45 +0000 (23:42 -0500)]
PM / devfreq: Strengthen check for freq_table
Since commit
ea572f816032 ("PM / devfreq: Change return type of
devfreq_set_freq_table()"), all devfreq devices are expected to have a
valid freq_table. The devfreq core unconditionally dereferences
freq_table in the sysfs code and in get_freq_range().
Therefore, we need to ensure that freq_table is both non-null and
non-empty (length is > 0). If either check fails, replace the table
using set_freq_table() or return the error.
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Krzysztof Kozlowski [Mon, 20 Sep 2021 07:17:52 +0000 (09:17 +0200)]
devfreq: exynos-ppmu: simplify parsing event-type from DT
When parsing devicetree, the function of_get_devfreq_events(), for each
device child node, iterates over array of possible events "ppmu_events"
till it finds one matching by node name. When match is found the
ppmu_events[i] points to element having both the name of the event and
the counters ID.
Each PPMU device child node might have an "event-name" property with the
name of the event, however due to the design of devfreq it must be the
same as the device node name. If it is not the same, the devfreq client
won't be able to use it via devfreq_event_get_edev_by_phandle().
Since PPMU device child node name must be equal to the "event-name"
property (event-name == ppmu_events[i].name), there is no need to find
the counters ID by the "event-name". Instead use ppmu_events[i].id
which must be equal to it.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Krzysztof Kozlowski [Mon, 20 Sep 2021 07:17:51 +0000 (09:17 +0200)]
devfreq: exynos-ppmu: use node names with hyphens
Devicetree naming convention requires device node names to use hyphens
instead of underscore, so Exynos5422 devfreq event name
"ppmu-event3-dmc0_0" should be "ppmu-event3-dmc0-0". Newly introduced
dtschema enforces this, however the driver still expects old name with
an underscore.
Add new events for Exynos5422 while still accepting old name for
backwards compatibility.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Zhang Rui [Tue, 26 Oct 2021 08:32:42 +0000 (16:32 +0800)]
cpufreq: intel_pstate: Fix cpu->pstate.turbo_freq initialization
Fix a problem in active mode that cpu->pstate.turbo_freq is initialized
only if HWP-to-frequency scaling factor is refined.
In passive mode, this problem is not exposed, because
cpu->pstate.turbo_freq is set again, later in
intel_cpufreq_cpu_init()->intel_pstate_get_hwp_cap().
Fixes: eb3693f0521e ("cpufreq: intel_pstate: hybrid: CPU-specific scaling factor")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Fri, 22 Oct 2021 15:53:43 +0000 (17:53 +0200)]
PM: suspend: Use valid_state() consistently
Make valid_state() check if the ->enter callback is present in
suspend_ops (only PM_SUSPEND_TO_IDLE can be valid otherwise) and
make sleep_state_supported() call valid_state() consistently to
validate the states other than PM_SUSPEND_TO_IDLE.
While at it, clean up the comment in valid_state().
No expected functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Fri, 22 Oct 2021 16:07:47 +0000 (18:07 +0200)]
PM: sleep: Pause cpuidle later and resume it earlier during system transitions
Commit
8651f97bd951 ("PM / cpuidle: System resume hang fix with
cpuidle") that introduced cpuidle pausing during system suspend
did that to work around a platform firmware issue causing systems
to hang during resume if CPUs were allowed to enter idle states
in the system suspend and resume code paths.
However, pausing cpuidle before the last phase of suspending
devices is the source of an otherwise arbitrary difference between
the suspend-to-idle path and other system suspend variants, so it is
cleaner to do that later, before taking secondary CPUs offline (it
is still safer to take secondary CPUs offline with cpuidle paused,
though).
Modify the code accordingly, but in order to avoid code duplication,
introduce new wrapper functions, pm_sleep_disable_secondary_cpus()
and pm_sleep_enable_secondary_cpus(), to combine cpuidle_pause()
and cpuidle_resume(), respectively, with the handling of secondary
CPUs during system-wide transitions to sleep states.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Rafael J. Wysocki [Fri, 22 Oct 2021 16:04:02 +0000 (18:04 +0200)]
PM: suspend: Do not pause cpuidle in the suspend-to-idle path
It is pointless to pause cpuidle in the suspend-to-idle path,
because it is going to be resumed in the same path later and
pausing it does not serve any particular purpose in that case.
Rework the code to avoid doing that.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Linus Torvalds [Mon, 25 Oct 2021 18:30:31 +0000 (11:30 -0700)]
Linux 5.15-rc7
Matthew Wilcox (Oracle) [Mon, 25 Oct 2021 18:16:34 +0000 (19:16 +0100)]
secretmem: Prevent secretmem_users from wrapping to zero
Commit
110860541f44 ("mm/secretmem: use refcount_t instead of atomic_t")
attempted to fix the problem of secretmem_users wrapping to zero and
allowing suspend once again.
But it was reverted in commit
87066fdd2e30 ("Revert 'mm/secretmem: use
refcount_t instead of atomic_t'") because of the problems it caused - a
refcount_t was not semantically the right type to use.
Instead prevent secretmem_users from wrapping to zero by forbidding new
users if the number of users has wrapped from positive to negative.
This stops a long way short of reaching the necessary 4 billion users
where it wraps to zero again, so there's no need to be clever with
special anti-wrap types or checking the return value from atomic_inc().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Jordy Zomer <jordy@pwning.systems>
Cc: Kees Cook <keescook@chromium.org>,
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 25 Oct 2021 17:46:41 +0000 (10:46 -0700)]
spi: Fix tegra20 build with CONFIG_PM=n once again
Commit
efafec27c565 ("spi: Fix tegra20 build with CONFIG_PM=n") already
fixed the build without PM support once. There was an alternative fix
by Guenter in commit
2bab94090b01 ("spi: tegra20-slink: Declare runtime
suspend and resume functions conditionally"), and Mark then merged the
two correctly in
ffb1e76f4f32 ("Merge tag 'v5.15-rc2' into spi-5.15").
But for some inexplicable reason, Mark then merged things _again_ in
commit
59c4e190b10c ("Merge tag 'v5.15-rc3' into spi-5.15"), and screwed
things up at that point, and the __maybe_unused attribute on
tegra_slink_runtime_resume() went missing.
Reinstate it, so that alpha (and other architectures without PM support)
builds cleanly again.
Btw, this is another prime example of how random back-merges are not
good. Just don't do them. Subsystem developers should not merge my
tree in any normal circumstances. Both of those merge commits pointed
to above are bad: even the one that got the merge result right doesn't
even mention _why_ it was done, and the one that got it wrong is
obviously broken.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 25 Oct 2021 17:28:52 +0000 (10:28 -0700)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
- Fix clang-related relocation warning in futex code
- Fix incorrect use of get_kernel_nofault()
- Fix bad code generation in __get_user_check() when kasan is enabled
- Ensure TLB function table is correctly aligned
- Remove duplicated string function definitions in decompressor
- Fix link-time orphan section warnings
- Fix old-style function prototype for arch_init_kprobes()
- Only warn about XIP address when not compile testing
- Handle BE32 big endian for keystone2 remapping
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9148/1: handle CONFIG_CPU_ENDIAN_BE32 in arch/arm/kernel/head.S
ARM: 9141/1: only warn about XIP address when not compile testing
ARM: 9139/1: kprobes: fix arch_init_kprobes() prototype
ARM: 9138/1: fix link warning with XIP + frame-pointer
ARM: 9134/1: remove duplicate memcpy() definition
ARM: 9133/1: mm: proc-macros: ensure *_tlb_fns are 4B aligned
ARM: 9132/1: Fix __get_user_check failure with ARM KASAN images
ARM: 9125/1: fix incorrect use of get_kernel_nofault()
ARM: 9122/1: select HAVE_FUTEX_CMPXCHG
Linus Torvalds [Mon, 25 Oct 2021 16:57:28 +0000 (09:57 -0700)]
Merge tag 'libata-5.15-rc7' of git://git./linux/kernel/git/dlemoal/libata
Pull libata fix from Damien Le Moal:
"A single fix in this pull request addressing an invalid error code
return in the sata_mv driver (from Zheyu)"
* tag 'libata-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: sata_mv: Fix the error handling of mv_chip_id()
Linus Torvalds [Mon, 25 Oct 2021 16:47:18 +0000 (09:47 -0700)]
Merge tag 'pinctrl-v5.15-3' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Some late pin control fixes, the most generally annoying will probably
be the AMD IRQ storm fix affecting the Microsoft surface.
Summary:
- Three fixes pertaining to Broadcom DT bindings. Some stuff didn't
work out as inteded, we need to back out
- A resume bug fix in the STM32 driver
- Disable and mask the interrupts on probe in the AMD pinctrl driver,
affecting Microsoft surface"
* tag 'pinctrl-v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: amd: disable and mask interrupts on probe
pinctrl: stm32: use valid pin identifier in stm32_pinctrl_resume()
Revert "pinctrl: bcm: ns: support updated DT binding as syscon subnode"
dt-bindings: pinctrl: brcm,ns-pinmux: drop unneeded CRU from example
Revert "dt-bindings: pinctrl: bcm4708-pinmux: rework binding to use syscon"
LABBE Corentin [Thu, 21 Oct 2021 09:26:57 +0000 (10:26 +0100)]
ARM: 9148/1: handle CONFIG_CPU_ENDIAN_BE32 in arch/arm/kernel/head.S
My intel-ixp42x-welltech-epbx100 no longer boot since 4.14.
This is due to commit
463dbba4d189 ("ARM: 9104/2: Fix Keystone 2 kernel
mapping regression")
which forgot to handle CONFIG_CPU_ENDIAN_BE32 as possible BE config.
Suggested-by: Krzysztof Hałasa <khalasa@piap.pl>
Fixes: 463dbba4d189 ("ARM: 9104/2: Fix Keystone 2 kernel mapping regression")
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Zheyu Ma [Fri, 22 Oct 2021 09:12:26 +0000 (09:12 +0000)]
ata: sata_mv: Fix the error handling of mv_chip_id()
mv_init_host() propagates the value returned by mv_chip_id() which in turn
gets propagated by mv_pci_init_one() and hits local_pci_probe().
During the process of driver probing, the probe function should return < 0
for failure, otherwise, the kernel will treat value > 0 as success.
Since this is a bug rather than a recoverable runtime error we should
use dev_alert() instead of dev_err().
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Linus Torvalds [Sun, 24 Oct 2021 19:48:33 +0000 (09:48 -1000)]
Revert "mm/secretmem: use refcount_t instead of atomic_t"
This reverts commit
110860541f443f950c1274f217a1a3e298670a33.
Converting the "secretmem_users" counter to a refcount is incorrect,
because a refcount is special in zero and can't just be incremented (but
a count of users is not, and "no users" is actually perfectly valid and
not a sign of a free'd resource).
Reported-by: syzbot+75639e6a0331cd61d3e2@syzkaller.appspotmail.com
Cc: Jordy Zomer <jordy@pwning.systems>
Cc: Kees Cook <keescook@chromium.org>,
Cc: Jordy Zomer <jordy@jordyzomer.github.io>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 24 Oct 2021 19:36:06 +0000 (09:36 -1000)]
Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs
Pull autofs fix from Al Viro:
"Fix for a braino of mine (in getting rid of open-coded
dentry_path_raw() in autofs a couple of cycles ago).
Mea culpa... Obvious -stable fodder"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
autofs: fix wait name hash calculation in autofs_wait()
Linus Torvalds [Sun, 24 Oct 2021 17:04:21 +0000 (07:04 -1000)]
Merge tag 'sched_urgent_for_v5.15_rc7' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
"Reset clang's Shadow Call Stack on hotplug to prevent it from
overflowing"
* tag 'sched_urgent_for_v5.15_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/scs: Reset the shadow stack when idle_task_exit
Linus Torvalds [Sun, 24 Oct 2021 17:00:15 +0000 (07:00 -1000)]
Merge tag 'x86_urgent_for_v5.15_rc7' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Borislav Petkov:
"A single change adding Dave Hansen to our maintainers team"
* tag 'x86_urgent_for_v5.15_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add Dave Hansen to the x86 maintainer team
Linus Torvalds [Sun, 24 Oct 2021 16:43:59 +0000 (06:43 -1000)]
Merge tag '5.15-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull ksmbd fixes from Steve French:
"Ten fixes for the ksmbd kernel server, for improved security and
additional buffer overflow checks:
- a security improvement to session establishment to reduce the
possibility of dictionary attacks
- fix to ensure that maximum i/o size negotiated in the protocol is
not less than 64K and not more than 8MB to better match expected
behavior
- fix for crediting (flow control) important to properly verify that
sufficient credits are available for the requested operation
- seven additional buffer overflow, buffer validation checks"
* tag '5.15-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: add buffer validation in session setup
ksmbd: throttle session setup failures to avoid dictionary attacks
ksmbd: validate OutputBufferLength of QUERY_DIR, QUERY_INFO, IOCTL requests
ksmbd: validate credit charge after validating SMB2 PDU body size
ksmbd: add buffer validation for smb direct
ksmbd: limit read/write/trans buffer size not to exceed 8MB
ksmbd: validate compound response buffer
ksmbd: fix potencial 32bit overflow from data area check in smb2_write
ksmbd: improve credits management
ksmbd: add validation in smb2_ioctl
Linus Torvalds [Sun, 24 Oct 2021 16:23:48 +0000 (06:23 -1000)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Ten fixes, seven of which are in drivers.
The core fixes are one to fix a potential crash on resume, one to sort
out our reference count releases to avoid releasing in-use modules and
one to adjust the cmd per lun calculation to avoid an overflow in
hyper-v"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: ufs-pci: Force a full restore after suspend-to-disk
scsi: qla2xxx: Fix unmap of already freed sgl
scsi: qla2xxx: Fix a memory leak in an error path of qla2x00_process_els()
scsi: qla2xxx: Return -ENOMEM if kzalloc() fails
scsi: sd: Fix crashes in sd_resume_runtime()
scsi: mpi3mr: Fix duplicate device entries when scanning through sysfs
scsi: core: Put LLD module refcnt after SCSI device is released
scsi: storvsc: Fix validation for unsolicited incoming packets
scsi: iscsi: Fix set_param() handling
scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
Linus Torvalds [Sat, 23 Oct 2021 03:42:13 +0000 (17:42 -1000)]
Merge tag 'block-5.15-2021-10-22' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Fix for the cgroup code not ussing irq safe stats updates, and one fix
for an error handling condition in add_partition()"
* tag 'block-5.15-2021-10-22' of git://git.kernel.dk/linux-block:
block: fix incorrect references to disk objects
blk-cgroup: blk_cgroup_bio_start() should use irq-safe operations on blkg->iostat_cpu
Linus Torvalds [Sat, 23 Oct 2021 03:34:31 +0000 (17:34 -1000)]
Merge tag 'io_uring-5.15-2021-10-22' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Two fixes for the max workers limit API that was introduced this
series: one fix for an issue with that code, and one fixing a linked
timeout regression in this series"
* tag 'io_uring-5.15-2021-10-22' of git://git.kernel.dk/linux-block:
io_uring: apply worker limits to previous users
io_uring: fix ltimeout unprep
io_uring: apply max_workers limit to all future users
io-wq: max_worker fixes
Linus Torvalds [Fri, 22 Oct 2021 20:39:47 +0000 (10:39 -1000)]
Merge tag 'fuse-fixes-5.15-rc7' of git://git./linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"Syzbot discovered a race in case of reusing the fuse sb (introduced in
this cycle).
Fix it by doing the s_fs_info initialization at the proper place"
* tag 'fuse-fixes-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: clean up error exits in fuse_fill_super()
fuse: always initialize sb->s_fs_info
fuse: clean up fuse_mount destruction
fuse: get rid of fuse_put_super()
fuse: check s_root when destroying sb
Linus Torvalds [Fri, 22 Oct 2021 20:31:32 +0000 (10:31 -1000)]
Merge tag 'hyperv-fixes-signed-
20211022' of git://git./linux/kernel/git/hyperv/linux
Pull hyper-v fix from Wei Liu:
- Fix vmbus ARM64 build (Arnd Bergmann)
* tag 'hyperv-fixes-signed-
20211022' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
hyperv/vmbus: include linux/bitops.h
Arnd Bergmann [Mon, 18 Oct 2021 13:19:08 +0000 (15:19 +0200)]
hyperv/vmbus: include linux/bitops.h
On arm64 randconfig builds, hyperv sometimes fails with this
error:
In file included from drivers/hv/hv_trace.c:3:
In file included from drivers/hv/hyperv_vmbus.h:16:
In file included from arch/arm64/include/asm/sync_bitops.h:5:
arch/arm64/include/asm/bitops.h:11:2: error: only <linux/bitops.h> can be included directly
In file included from include/asm-generic/bitops/hweight.h:5:
include/asm-generic/bitops/arch_hweight.h:9:9: error: implicit declaration of function '__sw_hweight32' [-Werror,-Wimplicit-function-declaration]
include/asm-generic/bitops/atomic.h:17:7: error: implicit declaration of function 'BIT_WORD' [-Werror,-Wimplicit-function-declaration]
Include the correct header first.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211018131929.2260087-1-arnd@kernel.org
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Linus Torvalds [Fri, 22 Oct 2021 19:08:08 +0000 (09:08 -1000)]
Merge tag 'acpi-5.15-rc7' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix two regressions, one related to ACPI power resources
management and one that broke ACPI tools compilation.
Specifics:
- Stop turning off unused ACPI power resources in an unknown state to
address a regression introduced during the 5.14 cycle (Rafael
Wysocki).
- Fix an ACPI tools build issue introduced recently when the minimal
stdarg.h was added (Miguel Bernal Marin)"
* tag 'acpi-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PM: Do not turn off power resources in unknown state
ACPI: tools: fix compilation error
Linus Torvalds [Fri, 22 Oct 2021 19:02:15 +0000 (09:02 -1000)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull more x86 kvm fixes from Paolo Bonzini:
- Cache coherency fix for SEV live migration
- Fix for instruction emulation with PKU
- fixes for rare delaying of interrupt delivery
- fix for SEV-ES buffer overflow
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SEV-ES: go over the sev_pio_data buffer in multiple passes if needed
KVM: SEV-ES: keep INS functions together
KVM: x86: remove unnecessary arguments from complete_emulator_pio_in
KVM: x86: split the two parts of emulator_pio_in
KVM: SEV-ES: clean up kvm_sev_es_ins/outs
KVM: x86: leave vcpu->arch.pio.count alone in emulator_pio_in_out
KVM: SEV-ES: rename guest_ins_data to sev_pio_data
KVM: SEV: Flush cache on non-coherent systems before RECEIVE_UPDATE_DATA
KVM: MMU: Reset mmu->pkru_mask to avoid stale data
KVM: nVMX: promptly process interrupts delivered while in guest mode
KVM: x86: check for interrupts before deciding whether to exit the fast path
Rafael J. Wysocki [Fri, 22 Oct 2021 18:45:10 +0000 (20:45 +0200)]
Merge branch 'acpi-tools'
Merge a fix for a recent ACPI tools bild regresson.
* acpi-tools:
ACPI: tools: fix compilation error
Rafael J. Wysocki [Fri, 22 Oct 2021 12:58:23 +0000 (14:58 +0200)]
PM: sleep: Do not let "syscore" devices runtime-suspend during system transitions
There is no reason to allow "syscore" devices to runtime-suspend
during system-wide PM transitions, because they are subject to the
same possible failure modes as any other devices in that respect.
Accordingly, change device_prepare() and device_complete() to call
pm_runtime_get_noresume() and pm_runtime_put(), respectively, for
"syscore" devices too.
Fixes: 057d51a1268f ("Merge branch 'pm-sleep'")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Paolo Bonzini [Tue, 12 Oct 2021 15:33:03 +0000 (11:33 -0400)]
KVM: SEV-ES: go over the sev_pio_data buffer in multiple passes if needed
The PIO scratch buffer is larger than a single page, and therefore
it is not possible to copy it in a single step to vcpu->arch/pio_data.
Bound each call to emulator_pio_in/out to a single page; keep
track of how many I/O operations are left in vcpu->arch.sev_pio_count,
so that the operation can be restarted in the complete_userspace_io
callback.
For OUT, this means that the previous kvm_sev_es_outs implementation
becomes an iterator of the loop, and we can consume the sev_pio_data
buffer before leaving to userspace.
For IN, instead, consuming the buffer and decreasing sev_pio_count
is always done in the complete_userspace_io callback, because that
is when the memcpy is done into sev_pio_data.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reported-by: Felix Wilhelm <fwilhelm@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 12 Oct 2021 15:25:45 +0000 (11:25 -0400)]
KVM: SEV-ES: keep INS functions together
Make the diff a little nicer when we actually get to fixing
the bug. No functional change intended.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 12 Oct 2021 16:35:20 +0000 (12:35 -0400)]
KVM: x86: remove unnecessary arguments from complete_emulator_pio_in
complete_emulator_pio_in can expect that vcpu->arch.pio has been filled in,
and therefore does not need the size and count arguments. This makes things
nicer when the function is called directly from a complete_userspace_io
callback.
No functional change intended.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 13 Oct 2021 16:32:02 +0000 (12:32 -0400)]
KVM: x86: split the two parts of emulator_pio_in
emulator_pio_in handles both the case where the data is pending in
vcpu->arch.pio.count, and the case where I/O has to be done via either
an in-kernel device or a userspace exit. For SEV-ES we would like
to split these, to identify clearly the moment at which the
sev_pio_data is consumed. To this end, create two different
functions: __emulator_pio_in fills in vcpu->arch.pio.count, while
complete_emulator_pio_in clears it and releases vcpu->arch.pio.data.
Because this patch has to be backported, things are left a bit messy.
kernel_pio() operates on vcpu->arch.pio, which leads to emulator_pio_in()
having with two calls to complete_emulator_pio_in(). It will be fixed
in the next release.
While at it, remove the unused void* val argument of emulator_pio_in_out.
The function currently hardcodes vcpu->arch.pio_data as the
source/destination buffer, which sucks but will be fixed after the more
severe SEV-ES buffer overflow.
No functional change intended.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 12 Oct 2021 14:51:55 +0000 (10:51 -0400)]
KVM: SEV-ES: clean up kvm_sev_es_ins/outs
A few very small cleanups to the functions, smushed together because
the patch is already very small like this:
- inline emulator_pio_in_emulated and emulator_pio_out_emulated,
since we already have the vCPU
- remove the data argument and pull setting vcpu->arch.sev_pio_data into
the caller
- remove unnecessary clearing of vcpu->arch.pio.count when
emulation is done by the kernel (and therefore vcpu->arch.pio.count
is already clear on exit from emulator_pio_in and emulator_pio_out).
No functional change intended.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 13 Oct 2021 16:29:42 +0000 (12:29 -0400)]
KVM: x86: leave vcpu->arch.pio.count alone in emulator_pio_in_out
Currently emulator_pio_in clears vcpu->arch.pio.count twice if
emulator_pio_in_out performs kernel PIO. Move the clear into
emulator_pio_out where it is actually necessary.
No functional change intended.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 12 Oct 2021 14:22:34 +0000 (10:22 -0400)]
KVM: SEV-ES: rename guest_ins_data to sev_pio_data
We will be using this field for OUTS emulation as well, in case the
data that is pushed via OUTS spans more than one page. In that case,
there will be a need to save the data pointer across exits to userspace.
So, change the name to something that refers to any kind of PIO.
Also spell out what it is used for, namely SEV-ES.
No functional change intended.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Fri, 22 Oct 2021 05:06:08 +0000 (19:06 -1000)]
Merge tag 'drm-fixes-2021-10-22' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Nothing too crazy at the end of the cycle, the kmb modesetting fixes
are probably a bit large but it's not a major driver, and its fixing
monitor doesn't turn on type problems.
Otherwise it's just a few minor patches, one ast regression revert, an
msm power stability fix.
ast:
- fix regression with connector detect
msm:
- fix power stability issue
msxfb:
- fix crash on unload
panel:
- sync fix
kmb:
- modesetting fixes"
* tag 'drm-fixes-2021-10-22' of git://anongit.freedesktop.org/drm/drm:
Revert "drm/ast: Add detect function support"
drm/kmb: Enable ADV bridge after modeset
drm/kmb: Corrected typo in handle_lcd_irq
drm/kmb: Disable change of plane parameters
drm/kmb: Remove clearing DPHY regs
drm/kmb: Limit supported mode to 1080p
drm/kmb: Work around for higher system clock
drm/panel: ilitek-ili9881c: Fix sync for Feixin K101-IM2BYL02 panel
drm: mxsfb: Fix NULL pointer dereference crash on unload
drm/msm/devfreq: Restrict idle clamping to a618 for now
Mike Rapoport [Thu, 21 Oct 2021 07:09:29 +0000 (10:09 +0300)]
memblock: exclude MEMBLOCK_NOMAP regions from kmemleak
Vladimir Zapolskiy reports:
Commit
a7259df76702 ("memblock: make memblock_find_in_range method
private") invokes a kernel panic while running kmemleak on OF platforms
with nomaped regions:
Unable to handle kernel paging request at virtual address
fff000021e00000
[...]
scan_block+0x64/0x170
scan_gray_list+0xe8/0x17c
kmemleak_scan+0x270/0x514
kmemleak_write+0x34c/0x4ac
The memory allocated from memblock is registered with kmemleak, but if
it is marked MEMBLOCK_NOMAP it won't have linear map entries so an
attempt to scan such areas will fault.
Ideally, memblock_mark_nomap() would inform kmemleak to ignore
MEMBLOCK_NOMAP memory, but it can be called before kmemleak interfaces
operating on physical addresses can use __va() conversion.
Make sure that functions that mark allocated memory as MEMBLOCK_NOMAP
take care of informing kmemleak to ignore such memory.
Link: https://lore.kernel.org/all/8ade5174-b143-d621-8c8e-dc6a1898c6fb@linaro.org
Link: https://lore.kernel.org/all/c30ff0a2-d196-c50d-22f0-bd50696b1205@quicinc.com
Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
Reported-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Thu, 21 Oct 2021 07:09:28 +0000 (10:09 +0300)]
Revert "memblock: exclude NOMAP regions from kmemleak"
Commit
6e44bd6d34d6 ("memblock: exclude NOMAP regions from kmemleak")
breaks boot on EFI systems with kmemleak and VM_DEBUG enabled:
efi: Processing EFI memory map:
efi: 0x000090000000-0x000091ffffff [Conventional| | | | | | | | | | |WB|WT|WC|UC]
efi: 0x000092000000-0x0000928fffff [Runtime Data|RUN| | | | | | | | | |WB|WT|WC|UC]
------------[ cut here ]------------
kernel BUG at mm/kmemleak.c:1140!
Internal error: Oops - BUG: 0 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0-rc6-next-
20211019+ #104
pstate:
600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : kmemleak_free_part_phys+0x64/0x8c
lr : kmemleak_free_part_phys+0x38/0x8c
sp :
ffff800011eafbc0
x29:
ffff800011eafbc0 x28:
1fffff7fffb41c0d x27:
fffffbfffda0e068
x26:
0000000092000000 x25:
1ffff000023d5f94 x24:
ffff800011ed84d0
x23:
ffff800011ed84c0 x22:
ffff800011ed83d8 x21:
0000000000900000
x20:
ffff800011782000 x19:
0000000092000000 x18:
ffff800011ee0730
x17:
0000000000000000 x16:
0000000000000000 x15:
1ffff0000233252c
x14:
ffff800019a905a0 x13:
0000000000000001 x12:
ffff7000023d5ed7
x11:
1ffff000023d5ed6 x10:
ffff7000023d5ed6 x9 :
dfff800000000000
x8 :
ffff800011eaf6b7 x7 :
0000000000000001 x6 :
ffff800011eaf6b0
x5 :
00008ffffdc2a12a x4 :
ffff7000023d5ed7 x3 :
1ffff000023dbf99
x2 :
1ffff000022f0463 x1 :
0000000000000000 x0 :
ffffffffffffffff
Call trace:
kmemleak_free_part_phys+0x64/0x8c
memblock_mark_nomap+0x5c/0x78
reserve_regions+0x294/0x33c
efi_init+0x2d0/0x490
setup_arch+0x80/0x138
start_kernel+0xa0/0x3ec
__primary_switched+0xc0/0xc8
Code:
34000041 97d526e7 f9418e80 36000040 (
d4210000)
random: get_random_bytes called from print_oops_end_marker+0x34/0x80 with crng_init=0
---[ end trace
0000000000000000 ]---
The crash happens because kmemleak_free_part_phys() tries to use __va()
before memstart_addr is initialized and this triggers a VM_BUG_ON() in
arch/arm64/include/asm/memory.h:
Revert
6e44bd6d34d6 ("memblock: exclude NOMAP regions from kmemleak"),
the issue it is fixing will be fixed differently.
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 22 Oct 2021 03:27:17 +0000 (17:27 -1000)]
Merge branch 'ucount-fixes-for-v5.15' of git://git./linux/kernel/git/ebiederm/user-namespace
Pull ucounts fixes from Eric Biederman:
"There has been one very hard to track down bug in the ucount code that
we have been tracking since roughly v5.14 was released. Alex managed
to find a reliable reproducer a few days ago and then I was able to
instrument the code and figure out what the issue was.
It turns out the sigqueue_alloc single atomic operation optimization
did not play nicely with ucounts multiple level rlimits. It turned out
that either sigqueue_alloc or sigqueue_free could be operating on
multiple levels and trigger the conditions for the optimization on
more than one level at the same time.
To deal with that situation I have introduced inc_rlimit_get_ucounts
and dec_rlimit_put_ucounts that just focuses on the optimization and
the rlimit and ucount changes.
While looking into the big bug I found I couple of other little issues
so I am including those fixes here as well.
When I have time I would very much like to dig into process ownership
of the shared signal queue and see if we could pick a single owner for
the entire queue so that all of the rlimits can count to that owner.
That should entirely remove the need to call get_ucounts and
put_ucounts in sigqueue_alloc and sigqueue_free. It is difficult
because Linux unlike POSIX supports setuid that works on a single
thread"
* 'ucount-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ucounts: Move get_ucounts from cred_alloc_blank to key_change_session_keyring
ucounts: Proper error handling in set_cred_ucounts
ucounts: Pair inc_rlimit_ucounts with dec_rlimit_ucoutns in commit_creds
ucounts: Fix signal ucount refcounting
Linus Torvalds [Fri, 22 Oct 2021 01:36:50 +0000 (15:36 -1000)]
Merge tag 'net-5.15-rc7' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, and can.
We'll have one more fix for a socket accounting regression, it's still
getting polished. Otherwise things look fine.
Current release - regressions:
- revert "vrf: reset skb conntrack connection on VRF rcv", there are
valid uses for previous behavior
- can: m_can: fix iomap_read_fifo() and iomap_write_fifo()
Current release - new code bugs:
- mlx5: e-switch, return correct error code on group creation failure
Previous releases - regressions:
- sctp: fix transport encap_port update in sctp_vtag_verify
- stmmac: fix E2E delay mechanism (in PTP timestamping)
Previous releases - always broken:
- netfilter: ip6t_rt: fix out-of-bounds read of ipv6_rt_hdr
- netfilter: xt_IDLETIMER: fix out-of-bound read caused by lack of
init
- netfilter: ipvs: make global sysctl read-only in non-init netns
- tcp: md5: fix selection between vrf and non-vrf keys
- ipv6: count rx stats on the orig netdev when forwarding
- bridge: mcast: use multicast_membership_interval for IGMPv3
- can:
- j1939: fix UAF for rx_kref of j1939_priv abort sessions on
receiving bad messages
- isotp: fix TX buffer concurrent access in isotp_sendmsg() fix
return error on FC timeout on TX path
- ice: fix re-init of RDMA Tx queues and crash if RDMA was not inited
- hns3: schedule the polling again when allocation fails, prevent
stalls
- drivers: add missing of_node_put() when aborting
for_each_available_child_of_node()
- ptp: fix possible memory leak and UAF in ptp_clock_register()
- e1000e: fix packet loss in burst mode on Tiger Lake and later
- mlx5e: ipsec: fix more checksum offload issues"
* tag 'net-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (75 commits)
usbnet: sanity check for maxpacket
net: enetc: make sure all traffic classes can send large frames
net: enetc: fix ethtool counter name for PM0_TERR
ptp: free 'vclock_index' in ptp_clock_release()
sfc: Don't use netif_info before net_device setup
sfc: Export fibre-specific supported link modes
net/mlx5e: IPsec: Fix work queue entry ethernet segment checksum flags
net/mlx5e: IPsec: Fix a misuse of the software parser's fields
net/mlx5e: Fix vlan data lost during suspend flow
net/mlx5: E-switch, Return correct error code on group creation failure
net/mlx5: Lag, change multipath and bonding to be mutually exclusive
ice: Add missing E810 device ids
igc: Update I226_K device ID
e1000e: Fix packet loss on Tiger Lake and later
e1000e: Separate TGP board type from SPT
ptp: Fix possible memory leak in ptp_clock_register()
net: stmmac: Fix E2E delay mechanism
nfc: st95hf: Make spi remove() callback return zero
net: hns3: disable sriov before unload hclge layer
net: hns3: fix vf reset workqueue cannot exit
...
Linus Torvalds [Fri, 22 Oct 2021 01:30:09 +0000 (15:30 -1000)]
Merge tag 'powerpc-5.15-5' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix a bug exposed by a previous fix, where running guests with
certain SMT topologies could crash the host on Power8.
- Fix atomic sleep warnings when re-onlining CPUs, when PREEMPT is
enabled.
Thanks to Nathan Lynch, Srikar Dronamraju, and Valentin Schneider.
* tag 'powerpc-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/smp: do not decrement idle task preempt count in CPU offline
powerpc/idle: Don't corrupt back chain when going idle
Kim Phillips [Thu, 21 Oct 2021 15:30:06 +0000 (10:30 -0500)]
Revert "drm/ast: Add detect function support"
This reverts commit
aae74ff9caa8de9a45ae2e46068c417817392a26,
since it prevents my AMD Milan system from booting, with:
[ 27.189558] BUG: kernel NULL pointer dereference, address:
0000000000000000
[ 27.197506] #PF: supervisor write access in kernel mode
[ 27.203333] #PF: error_code(0x0002) - not-present page
[ 27.209064] PGD 0 P4D 0
[ 27.211885] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 27.216744] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-rc6+ #15
[ 27.223928] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM1006B 08/20/2021
[ 27.232955] RIP: 0010:run_timer_softirq+0x38b/0x4a0
[ 27.238397] Code: 4c 89 f7 e8 37 27 ac 00 49 c7 46 08 00 00 00 00 49 8b 04 24 48 85 c0 74 71 4d 8b 3c 24 4d 89 7e 08 66 90 49 8b 07 49 8b 57 08 <48> 89 02 48 85 c0 74 04 48 89 50 08 49 8b 77 18 41 f6 47 22 20 4c
[ 27.259350] RSP: 0018:
ffffc42d00003ee8 EFLAGS:
00010086
[ 27.265176] RAX:
dead000000000122 RBX:
0000000000000000 RCX:
0000000000000101
[ 27.273134] RDX:
0000000000000000 RSI:
0000000000000087 RDI:
0000000000000001
[ 27.281084] RBP:
ffffc42d00003f70 R08:
0000000000000000 R09:
00000000000003eb
[ 27.289043] R10:
ffffa0860cb300d0 R11:
ffffa0c44de290b0 R12:
ffffc42d00003ef8
[ 27.297002] R13:
00000000fffef200 R14:
ffffa0c44de18dc0 R15:
ffffa0867a882350
[ 27.304961] FS:
0000000000000000(0000) GS:
ffffa0c44de00000(0000) knlGS:
0000000000000000
[ 27.313988] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 27.320396] CR2:
0000000000000000 CR3:
000000014569c001 CR4:
0000000000770ef0
[ 27.328346] PKRU:
55555554
[ 27.331359] Call Trace:
[ 27.334073] <IRQ>
[ 27.336314] ? __queue_work+0x420/0x420
[ 27.340589] ? lapic_next_event+0x21/0x30
[ 27.345060] ? clockevents_program_event+0x8f/0xe0
[ 27.350402] __do_softirq+0xfb/0x2db
[ 27.354388] irq_exit_rcu+0x98/0xd0
[ 27.358275] sysvec_apic_timer_interrupt+0xac/0xd0
[ 27.363620] </IRQ>
[ 27.365955] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 27.371685] RIP: 0010:cpuidle_enter_state+0xcc/0x390
[ 27.377292] Code: 3d 01 79 0a 50 e8 44 ed 77 ff 49 89 c6 0f 1f 44 00 00 31 ff e8 f5 f8 77 ff 80 7d d7 00 0f 85 e6 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 ff 0f 88 17 01 00 00 49 63 c7 4c 2b 75 c8 48 8d 14 40 48 8d
[ 27.398243] RSP: 0018:
ffffffffb0e03dc8 EFLAGS:
00000246
[ 27.404069] RAX:
ffffa0c44de00000 RBX:
0000000000000001 RCX:
000000000000001f
[ 27.412028] RDX:
0000000000000000 RSI:
ffffffffb0bafc1f RDI:
ffffffffb0bbdb81
[ 27.419986] RBP:
ffffffffb0e03e00 R08:
00000006549f8f3f R09:
ffffffffb1065200
[ 27.427935] R10:
ffffa0c44de27ae4 R11:
ffffa0c44de27ac4 R12:
ffffa0c5634cb000
[ 27.435894] R13:
ffffffffb1065200 R14:
00000006549f8f3f R15:
0000000000000001
[ 27.443854] ? cpuidle_enter_state+0xbb/0x390
[ 27.448712] cpuidle_enter+0x2e/0x40
[ 27.452695] call_cpuidle+0x23/0x40
[ 27.456584] do_idle+0x1f0/0x270
[ 27.460181] cpu_startup_entry+0x20/0x30
[ 27.464553] rest_init+0xd4/0xe0
[ 27.468149] arch_call_rest_init+0xe/0x1b
[ 27.472619] start_kernel+0x6bc/0x6e2
[ 27.476764] x86_64_start_reservations+0x24/0x26
[ 27.481912] x86_64_start_kernel+0x75/0x79
[ 27.486477] secondary_startup_64_no_verify+0xb0/0xbb
[ 27.492111] Modules linked in: kvm_amd(+) kvm ipmi_si(+) ipmi_devintf rapl wmi_bmof ipmi_msghandler input_leds ccp k10temp mac_hid sch_fq_codel msr ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul crc32_pclmul ghash_clmulni_intel syscopyarea aesni_intel sysfillrect crypto_simd sysimgblt fb_sys_fops cryptd hid_generic cec nvme ahci usbhid drm e1000e nvme_core hid libahci i2c_piix4 wmi
[ 27.551789] CR2:
0000000000000000
[ 27.555482] ---[ end trace
897987dfe93dccc6 ]---
[ 27.560630] RIP: 0010:run_timer_softirq+0x38b/0x4a0
[ 27.566069] Code: 4c 89 f7 e8 37 27 ac 00 49 c7 46 08 00 00 00 00 49 8b 04 24 48 85 c0 74 71 4d 8b 3c 24 4d 89 7e 08 66 90 49 8b 07 49 8b 57 08 <48> 89 02 48 85 c0 74 04 48 89 50 08 49 8b 77 18 41 f6 47 22 20 4c
[ 27.587021] RSP: 0018:
ffffc42d00003ee8 EFLAGS:
00010086
[ 27.592848] RAX:
dead000000000122 RBX:
0000000000000000 RCX:
0000000000000101
[ 27.600808] RDX:
0000000000000000 RSI:
0000000000000087 RDI:
0000000000000001
[ 27.608765] RBP:
ffffc42d00003f70 R08:
0000000000000000 R09:
00000000000003eb
[ 27.616716] R10:
ffffa0860cb300d0 R11:
ffffa0c44de290b0 R12:
ffffc42d00003ef8
[ 27.624673] R13:
00000000fffef200 R14:
ffffa0c44de18dc0 R15:
ffffa0867a882350
[ 27.632624] FS:
0000000000000000(0000) GS:
ffffa0c44de00000(0000) knlGS:
0000000000000000
[ 27.641650] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 27.648159] CR2:
0000000000000000 CR3:
000000014569c001 CR4:
0000000000770ef0
[ 27.656119] PKRU:
55555554
[ 27.659133] Kernel panic - not syncing: Fatal exception in interrupt
[ 29.030411] Shutting down cpus with NMI
[ 29.034699] Kernel Offset: 0x2e600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 29.046790] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Since unreliable, found by bisecting for KASAN's use-after-free in
enqueue_timer+0x4f/0x1e0, where the timer callback is called.
Reported-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Fixes: aae74ff9caa8 ("drm/ast: Add detect function support")
Link: https://lore.kernel.org/lkml/0f7871be-9ca6-5ae4-3a40-5db9a8fb2365@amd.com/
Cc: Ainux <ainux.wang@gmail.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: sterlingteng@gmail.com
Cc: chenhuacai@kernel.org
Cc: Chuck Lever III <chuck.lever@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jon Grimm <jon.grimm@amd.com>
Cc: dri-devel <dri-devel@lists.freedesktop.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211021153006.92983-1-kim.phillips@amd.com
Dave Airlie [Thu, 21 Oct 2021 19:34:54 +0000 (05:34 +1000)]
Merge tag 'drm-misc-fixes-2021-10-21-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.15-rc7:
- Rebased, to remove vc4 patches.
- Fix mxsfb crash on unload.
- Use correct sync parameters for Feixin K101-IM2BYL02.
- Assorted kmb modeset/atomic fixes.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/e66eaf89-b9b9-41f5-d0d2-dad7e59fabb5@linux.intel.com
Dave Airlie [Thu, 21 Oct 2021 19:22:10 +0000 (05:22 +1000)]
Merge tag 'drm-msm-fixes-2021-10-18' of https://gitlab.freedesktop.org/drm/msm into drm-fixes
One more fix for v5.15, to work around a power stability issue on a630
(and possibly others)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGs1WPLthmd=ToDcEHm=u-7O38RAVJ2XwRoS8xPmC520vg@mail.gmail.com
Rafael J. Wysocki [Thu, 21 Oct 2021 17:44:27 +0000 (19:44 +0200)]
Merge tag 'dtpm-v5.16' of https://git.linaro.org/people/daniel.lezcano/linux
Pull Dynamic Thermal Power Management (DTPM) framework changes for v5.16
from Daniel Lezcano:
- Simplify and make the code more self-encapsulate by dealing with the
dtpm structure only (Daniel Lezcano)
- Fix power intialization (Daniel Lezcano)
- Add the CPU load consideration when estimating the instaneous power
consumption (Daniel Lezcano)
* tag 'dtpm-v5.16' of https://git.linaro.org/people/daniel.lezcano/linux:
powercap/drivers/dtpm: Fix power limit initialization
powercap/drivers/dtpm: Scale the power with the load
powercap/drivers/dtpm: Use container_of instead of a private data field
powercap/drivers/dtpm: Simplify the dtpm table
powercap/drivers/dtpm: Encapsulate even more the code
Ye Bin [Wed, 13 Oct 2021 12:19:14 +0000 (20:19 +0800)]
PM: hibernate: Get block device exclusively in swsusp_check()
The following kernel crash can be triggered:
[ 89.266592] ------------[ cut here ]------------
[ 89.267427] kernel BUG at fs/buffer.c:3020!
[ 89.268264] invalid opcode: 0000 [#1] SMP KASAN PTI
[ 89.269116] CPU: 7 PID: 1750 Comm: kmmpd-loop0 Not tainted 5.10.0-862.14.0.6.x86_64
-08610-gc932cda3cef4-dirty #20
[ 89.273169] RIP: 0010:submit_bh_wbc.isra.0+0x538/0x6d0
[ 89.277157] RSP: 0018:
ffff888105ddfd08 EFLAGS:
00010246
[ 89.278093] RAX:
0000000000000005 RBX:
ffff888124231498 RCX:
ffffffffb2772612
[ 89.279332] RDX:
1ffff11024846293 RSI:
0000000000000008 RDI:
ffff888124231498
[ 89.280591] RBP:
ffff8881248cc000 R08:
0000000000000001 R09:
ffffed1024846294
[ 89.281851] R10:
ffff88812423149f R11:
ffffed1024846293 R12:
0000000000003800
[ 89.283095] R13:
0000000000000001 R14:
0000000000000000 R15:
ffff8881161f7000
[ 89.284342] FS:
0000000000000000(0000) GS:
ffff88839b5c0000(0000) knlGS:
0000000000000000
[ 89.285711] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 89.286701] CR2:
00007f166ebc01a0 CR3:
0000000435c0e000 CR4:
00000000000006e0
[ 89.287919] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 89.289138] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 89.290368] Call Trace:
[ 89.290842] write_mmp_block+0x2ca/0x510
[ 89.292218] kmmpd+0x433/0x9a0
[ 89.294902] kthread+0x2dd/0x3e0
[ 89.296268] ret_from_fork+0x22/0x30
[ 89.296906] Modules linked in:
by running the following commands:
1. mkfs.ext4 -O mmp /dev/sda -b 1024
2. mount /dev/sda /home/test
3. echo "/dev/sda" > /sys/power/resume
That happens because swsusp_check() calls set_blocksize() on the
target partition which confuses the file system:
Thread1 Thread2
mount /dev/sda /home/test
get s_mmp_bh --> has mapped flag
start kmmpd thread
echo "/dev/sda" > /sys/power/resume
resume_store
software_resume
swsusp_check
set_blocksize
truncate_inode_pages_range
truncate_cleanup_page
block_invalidatepage
discard_buffer --> clean mapped flag
write_mmp_block
submit_bh
submit_bh_wbc
BUG_ON(!buffer_mapped(bh))
To address this issue, modify swsusp_check() to open the target block
device with exclusive access.
Signed-off-by: Ye Bin <yebin10@huawei.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pavel Begunkov [Thu, 21 Oct 2021 12:20:29 +0000 (13:20 +0100)]
io_uring: apply worker limits to previous users
Another change to the API io-wq worker limitation API added in 5.15,
apply the limit to all prior users that already registered a tctx. It
may be confusing as it's now, in particular the change covers the
following 2 cases:
TASK1 | TASK2
_________________________________________________
ring = create() |
| limit_iowq_workers()
*not limited* |
TASK1 | TASK2
_________________________________________________
ring = create() |
| issue_requests()
limit_iowq_workers() |
| *not limited*
A note on locking, it's safe to traverse ->tctx_list as we hold
->uring_lock, but do that after dropping sqd->lock to avoid possible
problems. It's also safe to access tctx->io_wq there because tasks
kill it only after removing themselves from tctx_list, see
io_uring_cancel_generic() -> io_uring_clean_tctx()
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d6e09ecc3545e4dc56e43c906ee3d71b7ae21bed.1634818641.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Masahiro Kozuka [Tue, 14 Sep 2021 21:09:51 +0000 (14:09 -0700)]
KVM: SEV: Flush cache on non-coherent systems before RECEIVE_UPDATE_DATA
Flush the destination page before invoking RECEIVE_UPDATE_DATA, as the
PSP encrypts the data with the guest's key when writing to guest memory.
If the target memory was not previously encrypted, the cache may contain
dirty, unecrypted data that will persist on non-coherent systems.
Fixes: 15fb7de1a7f5 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command")
Cc: stable@vger.kernel.org
Cc: Peter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Masahiro Kozuka <masa.koz@kozuka.jp>
[sean: converted bug report to changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <
20210914210951.
2994260-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Chenyi Qiang [Thu, 21 Oct 2021 07:10:22 +0000 (15:10 +0800)]
KVM: MMU: Reset mmu->pkru_mask to avoid stale data
When updating mmu->pkru_mask, the value can only be added but it isn't
reset in advance. This will make mmu->pkru_mask keep the stale data.
Fix this issue.
Fixes: 2d344105f57c ("KVM, pkeys: introduce pkru_mask to cache conditions")
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <
20211021071022.1140-1-chenyi.qiang@intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Daniel Lezcano [Thu, 18 Mar 2021 20:52:38 +0000 (21:52 +0100)]
powercap/drivers/dtpm: Fix power limit initialization
When a DTPM node is registered its power limit must be initialized to
the power max.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210318205238.21937-1-daniel.lezcano@linaro.org
Daniel Lezcano [Fri, 12 Mar 2021 13:04:11 +0000 (14:04 +0100)]
powercap/drivers/dtpm: Scale the power with the load
Currently the power consumption is based on the current OPP power
assuming the entire performance domain is fully loaded.
That gives very gross power estimation and we can do much better by
using the load to scale the power consumption.
Use the utilization to normalize and scale the power usage over the
max possible power.
Tested on a rock960 with 2 big CPUS, the power consumption estimation
conforms with the expected one.
Before this change:
~$ ~/dhrystone -t 1 -l 10000&
~$ cat /sys/devices/virtual/powercap/dtpm/dtpm:0/dtpm:0:1/constraint_0_max_power_uw
2260000
After this change:
~$ ~/dhrystone -t 1 -l 10000&
~$ cat /sys/devices/virtual/powercap/dtpm/dtpm:0/dtpm:0:1/constraint_0_max_power_uw
1130000
~$ ~/dhrystone -t 2 -l 10000&
~$ cat /sys/devices/virtual/powercap/dtpm/dtpm:0/dtpm:0:1/constraint_0_max_power_uw
2260000
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20210312130411.29833-5-daniel.lezcano@linaro.org
Daniel Lezcano [Fri, 12 Mar 2021 13:04:10 +0000 (14:04 +0100)]
powercap/drivers/dtpm: Use container_of instead of a private data field
The dtpm framework provides an API to allocate a dtpm node. However
when a backend dtpm driver needs to allocate a dtpm node it must
define its own structure and store the pointer of this structure in
the private field of the dtpm structure.
It is more elegant to use the container_of macro and add the dtpm
structure inside the dtpm backend specific structure. The code will be
able to deal properly with the dtpm structure as a generic entity,
making all this even more self-encapsulated.
The dtpm_alloc() function does no longer make sense as the dtpm
structure will be allocated when allocating the device specific dtpm
structure. The dtpm_init() is provided instead.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20210312130411.29833-4-daniel.lezcano@linaro.org
Daniel Lezcano [Fri, 12 Mar 2021 13:04:09 +0000 (14:04 +0100)]
powercap/drivers/dtpm: Simplify the dtpm table
The dtpm table is an array of pointers, that forces the user of the
table to define initdata along with the declaration of the table
entry. It is more efficient to create an array of dtpm structure, so
the declaration of the table entry can be done by initializing the
different fields.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20210312130411.29833-3-daniel.lezcano@linaro.org
Daniel Lezcano [Fri, 12 Mar 2021 13:04:07 +0000 (14:04 +0100)]
powercap/drivers/dtpm: Encapsulate even more the code
In order to increase the self-encapsulation of the dtpm generic code,
the following changes are adding a power update ops to the dtpm
ops. That allows the generic code to call directly the dtpm backend
function to update the power values.
The power update function does compute the power characteristics when
the function is invoked. In the case of the CPUs, the power
consumption depends on the number of online CPUs. The online CPUs mask
is not up to date at CPUHP_AP_ONLINE_DYN state in the tear down
callback. That is the reason why the online / offline are at separate
state. As there is already an existing state for DTPM, this one is
only moved to the DEAD state, so there is no addition of new state
with these changes. The dtpm node is not removed when the cpu is
unplugged.
That simplifies the code for the next changes and results in a more
self-encapsulated code.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20210312130411.29833-1-daniel.lezcano@linaro.org
Oliver Neukum [Thu, 21 Oct 2021 12:29:44 +0000 (14:29 +0200)]
usbnet: sanity check for maxpacket
maxpacket of 0 makes no sense and oopses as we need to divide
by it. Give up.
V2: fixed typo in log and stylistic issues
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reported-by: syzbot+76bb1d34ffa0adc03baa@syzkaller.appspotmail.com
Reviewed-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20211021122944.21816-1-oneukum@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 20 Oct 2021 17:33:40 +0000 (20:33 +0300)]
net: enetc: make sure all traffic classes can send large frames
The enetc driver does not implement .ndo_change_mtu, instead it
configures the MAC register field PTC{Traffic Class}MSDUR[MAXSDU]
statically to a large value during probe time.
The driver used to configure only the max SDU for traffic class 0, and
that was fine while the driver could only use traffic class 0. But with
the introduction of mqprio, sending a large frame into any other TC than
0 is broken.
This patch fixes that by replicating per traffic class the static
configuration done in enetc_configure_port_mac().
Fixes: cbe9e835946f ("enetc: Enable TC offloading with mqprio")
Reported-by: Richie Pearn <richard.pearn@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: <Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20211020173340.1089992-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 20 Oct 2021 16:52:06 +0000 (19:52 +0300)]
net: enetc: fix ethtool counter name for PM0_TERR
There are two counters named "MAC tx frames", one of them is actually
incorrect. The correct name for that counter should be "MAC tx error
frames", which is symmetric to the existing "MAC rx error frames".
Fixes: 16eb4c85c964 ("enetc: Add ethtool statistics")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: <Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20211020165206.1069889-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thomas Gleixner [Wed, 20 Oct 2021 21:08:16 +0000 (23:08 +0200)]
MAINTAINERS: Add Dave Hansen to the x86 maintainer team
Dave is already listed as x86/mm maintainer, has a profund knowledge
of the x86 architecture in general and a good taste in terms of kernel
programming in general.
Add him as a full x86 maintainer with all rights and duties.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/87zgr3flq7.ffs@tglx
Yang Yingliang [Thu, 21 Oct 2021 09:13:53 +0000 (17:13 +0800)]
ptp: free 'vclock_index' in ptp_clock_release()
'vclock_index' is accessed from sysfs, it shouled be freed
in release function, so move it from ptp_clock_unregister()
to ptp_clock_release().
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Ekman [Tue, 19 Oct 2021 22:40:16 +0000 (00:40 +0200)]
sfc: Don't use netif_info before net_device setup
Use pci_info instead to avoid unnamed/uninitialized noise:
[197088.688729] sfc 0000:01:00.0: Solarflare NIC detected
[197088.690333] sfc 0000:01:00.0: Part Number : SFN5122F
[197088.729061] sfc 0000:01:00.0 (unnamed net_device) (uninitialized): no SR-IOV VFs probed
[197088.729071] sfc 0000:01:00.0 (unnamed net_device) (uninitialized): no PTP support
Inspired by
fa44821a4ddd ("sfc: don't use netif_info et al before
net_device is registered") from Heiner Kallweit.
Signed-off-by: Erik Ekman <erik@kryo.se>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Ekman [Tue, 19 Oct 2021 21:13:32 +0000 (23:13 +0200)]
sfc: Export fibre-specific supported link modes
The 1/10GbaseT modes were set up for cards with SFP+ cages in
3497ed8c852a5 ("sfc: report supported link speeds on SFP connections").
10GbaseT was likely used since no 10G fibre mode existed.
The missing fibre modes for 1/10G were added to ethtool.h in
5711a9822144
("net: ethtool: add support for 1000BaseX and missing 10G link modes")
shortly thereafter.
The user guide available at https://support-nic.xilinx.com/wp/drivers
lists support for the following cable and transceiver types in section 2.9:
- QSFP28 100G Direct Attach Cables
- QSFP28 100G SR Optical Transceivers (with SR4 modules listed)
- SFP28 25G Direct Attach Cables
- SFP28 25G SR Optical Transceivers
- QSFP+ 40G Direct Attach Cables
- QSFP+ 40G Active Optical Cables
- QSFP+ 40G SR4 Optical Transceivers
- QSFP+ to SFP+ Breakout Direct Attach Cables
- QSFP+ to SFP+ Breakout Active Optical Cables
- SFP+ 10G Direct Attach Cables
- SFP+ 10G SR Optical Transceivers
- SFP+ 10G LR Optical Transceivers
- SFP 1000BASE‐T Transceivers
- 1G Optical Transceivers
(From user guide issue 28. Issue 16 which also includes older cards like
SFN5xxx/SFN6xxx has matching lists for 1/10/40G transceiver types.)
Regarding SFP+ 10GBASE‐T transceivers the latest guide says:
"Solarflare adapters do not support 10GBASE‐T transceiver modules."
Tested using SFN5122F-R7 (with 2 SFP+ ports). Supported link modes do not change
depending on module used (tested with 1000BASE-T, 1000BASE-BX10, 10GBASE-LR).
Before:
$ ethtool ext
Settings for ext:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full
10000baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Link partner advertised link modes: Not reported
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: No
Link partner advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: off
Port: FIBRE
PHYAD: 255
Transceiver: internal
Current message level: 0x000020f7 (8439)
drv probe link ifdown ifup rx_err tx_err hw
Link detected: yes
After:
$ ethtool ext
Settings for ext:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full
1000baseX/Full
10000baseCR/Full
10000baseSR/Full
10000baseLR/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Link partner advertised link modes: Not reported
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: No
Link partner advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: off
Port: FIBRE
PHYAD: 255
Transceiver: internal
Supports Wake-on: g
Wake-on: d
Current message level: 0x000020f7 (8439)
drv probe link ifdown ifup rx_err tx_err hw
Link detected: yes
Signed-off-by: Erik Ekman <erik@kryo.se>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 21 Oct 2021 11:32:41 +0000 (12:32 +0100)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter fixes for net:
1) Crash due to missing initialization of timer data in
xt_IDLETIMER, from Juhee Kang.
2) NF_CONNTRACK_SECMARK should be bool in Kconfig, from Vegard Nossum.
3) Skip netdev events on netns removal, from Florian Westphal.
4) Add testcase to show port shadowing via UDP, also from Florian.
5) Remove pr_debug() code in ip6t_rt, this fixes a crash due to
unsafe access to non-linear skbuff, from Xin Long.
6) Make net/ipv4/vs/debug_level read-only from non-init netns,
from Antoine Tenart.
7) Remove bogus invocation to bash in selftests/netfilter/nft_flowtable.sh
also from Florian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 21 Oct 2021 11:11:26 +0000 (12:11 +0100)]
Merge tag 'mlx5-fixes-2021-10-20' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-fixes-2021-10-20
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 21 Oct 2021 11:10:29 +0000 (12:10 +0100)]
Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-10-20
This series contains updates to e1000e, igc, and ice drivers.
Sasha fixes an issue with dropped packets on Tiger Lake platforms for
e1000e and corrects a device ID for igc.
Tony adds missing E810 device IDs for ice.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Cai Huoqing [Mon, 18 Oct 2021 13:16:21 +0000 (21:16 +0800)]
PM: hibernate: swap: Use vzalloc() and kzalloc()
Replace vmalloc()/memset() with vzalloc() and kmalloc()/memset() with
kzalloc() to simplify the code.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Anders Roxell [Thu, 7 Oct 2021 19:13:37 +0000 (21:13 +0200)]
PM: hibernate: fix sparse warnings
When building the kernel with sparse enabled 'C=1' the following
warnings shows up:
kernel/power/swap.c:390:29: warning: incorrect type in assignment (different base types)
kernel/power/swap.c:390:29: expected int ret
kernel/power/swap.c:390:29: got restricted blk_status_t
This is due to function hib_wait_io() returns a 'blk_status_t' which is
a bitwise u8. Commit
5416da01ff6e ("PM: hibernate: Remove
blk_status_to_errno in hib_wait_io") seemed to have mixed up the return
type. However, the
4e4cbee93d56 ("block: switch bios to blk_status_t")
actually broke the behaviour by returning the wrong type.
Rework so function hib_wait_io() returns a 'int' instead of
'blk_status_t' and make sure to call function
blk_status_to_errno(hb->error)' when returning from function
hib_wait_io() a int gets returned.
Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t")
Fixes: 5416da01ff6e ("PM: hibernate: Remove blk_status_to_errno in hib_wait_io")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Wed, 20 Oct 2021 19:08:06 +0000 (21:08 +0200)]
cpufreq: Fix typo in cpufreq.h
s/internale/internal/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Rafael J. Wysocki [Fri, 15 Oct 2021 16:45:59 +0000 (18:45 +0200)]
PCI: PM: Fix ordering of operations in pci_back_from_sleep()
The ordering of operations in pci_back_from_sleep() is incorrect,
because the device may be in D3cold when it runs and pci_enable_wake()
needs to access the device's configuration space which cannot be
done in D3cold.
Fix this by calling pci_set_power_state() to put the device into D0
before calling pci_enable_wake() for it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Anitha Chrisanthus [Mon, 7 Jun 2021 21:17:11 +0000 (14:17 -0700)]
drm/kmb: Enable ADV bridge after modeset
On KMB, ADV bridge must be programmed and powered on prior to
MIPI DSI HW initialization.
v2: changed to atomic_bridge_chain_enable (Sam)
Fixes: 98521f4d4b4c ("drm/kmb: Mipi DSI part of the display driver")
Co-developed-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211019230719.789958-1-anitha.chrisanthus@intel.com
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Anitha Chrisanthus [Mon, 19 Jul 2021 23:28:51 +0000 (16:28 -0700)]
drm/kmb: Corrected typo in handle_lcd_irq
Check for Overflow bits for layer3 in the irq handler.
Fixes: 7f7b96a8a0a1 ("drm/kmb: Add support for KeemBay Display")
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-5-anitha.chrisanthus@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Edmund Dea [Wed, 6 Oct 2021 23:03:48 +0000 (16:03 -0700)]
drm/kmb: Disable change of plane parameters
Due to HW limitations, KMB cannot change height, width, or
pixel format after initial plane configuration.
v2: removed memset disp_cfg as it is already zero.
Fixes: 7f7b96a8a0a1 ("drm/kmb: Add support for KeemBay Display")
Signed-off-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-4-anitha.chrisanthus@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Edmund Dea [Tue, 20 Apr 2021 22:31:53 +0000 (15:31 -0700)]
drm/kmb: Remove clearing DPHY regs
Don't clear the shared DPHY registers common to MIPI Rx and MIPI Tx during
DSI initialization since this was causing MIPI Rx reset. Rest of the
writes are bitwise, so will not affect Mipi Rx side.
Fixes: 98521f4d4b4c ("drm/kmb: Mipi DSI part of the display driver")
Signed-off-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-3-anitha.chrisanthus@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Anitha Chrisanthus [Fri, 8 Jan 2021 22:34:13 +0000 (14:34 -0800)]
drm/kmb: Limit supported mode to 1080p
KMB only supports single resolution(1080p), this commit checks for
1920x1080x60 or 1920x1080x59 in crtc_mode_valid.
Also, modes with vfp < 4 are not supported in KMB display. This change
prunes display modes with vfp < 4.
v2: added vfp check
Fixes: 7f7b96a8a0a1 ("drm/kmb: Add support for KeemBay Display")
Co-developed-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link:https://patchwork.freedesktop.org/patch/msgid/
20211013233632.471892-2-anitha.chrisanthus@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Anitha Chrisanthus [Tue, 15 Dec 2020 19:13:09 +0000 (11:13 -0800)]
drm/kmb: Work around for higher system clock
Use a different value for system clock offset in the
ppl/llp ratio calculations for clocks higher than 500 Mhz.
Fixes: 98521f4d4b4c ("drm/kmb: Mipi DSI part of the display driver")
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-1-anitha.chrisanthus@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Dan Johansen [Wed, 18 Aug 2021 21:48:18 +0000 (23:48 +0200)]
drm/panel: ilitek-ili9881c: Fix sync for Feixin K101-IM2BYL02 panel
This adjusts sync values according to the datasheet
Fixes: 1c243751c095 ("drm/panel: ilitek-ili9881c: add support for Feixin K101-IM2BYL02 panel")
Co-developed-by: Marius Gripsgard <marius@ubports.com>
Signed-off-by: Dan Johansen <strit@manjaro.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210818214818.298089-1-strit@manjaro.org
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Marek Vasut [Sat, 16 Oct 2021 21:04:46 +0000 (23:04 +0200)]
drm: mxsfb: Fix NULL pointer dereference crash on unload
The mxsfb->crtc.funcs may already be NULL when unloading the driver,
in which case calling mxsfb_irq_disable() via drm_irq_uninstall() from
mxsfb_unload() leads to NULL pointer dereference.
Since all we care about is masking the IRQ and mxsfb->base is still
valid, just use that to clear and mask the IRQ.
Fixes: ae1ed00932819 ("drm: mxsfb: Stop using DRM simple display pipeline helper")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Daniel Abrecht <public@danielabrecht.ch>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Stefan Agner <stefan@agner.ch>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211016210446.171616-1-marex@denx.de
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Miklos Szeredi [Thu, 21 Oct 2021 08:01:39 +0000 (10:01 +0200)]
fuse: clean up error exits in fuse_fill_super()
Instead of "goto err", return error directly, since there's no error
cleanup to do now.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Thu, 21 Oct 2021 08:01:39 +0000 (10:01 +0200)]
fuse: always initialize sb->s_fs_info
Syzkaller reports a null pointer dereference in fuse_test_super() that is
caused by sb->s_fs_info being NULL.
This is due to the fact that fuse_fill_super() is initializing s_fs_info,
which is too late, it's already on the fs_supers list. The initialization
needs to be done in sget_fc() with the sb_lock held.
Move allocation of fuse_mount and fuse_conn from fuse_fill_super() into
fuse_get_tree().
After this ->kill_sb() will always be called with non-NULL ->s_fs_info,
hence fuse_mount_destroy() can drop the test for non-NULL "fm".
Reported-by: syzbot+74a15f02ccb51f398601@syzkaller.appspotmail.com
Fixes: 5d5b74aa9c76 ("fuse: allow sharing existing sb")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Thu, 21 Oct 2021 08:01:39 +0000 (10:01 +0200)]
fuse: clean up fuse_mount destruction
1. call fuse_mount_destroy() for open coded variants
2. before deactivate_locked_super() don't need fuse_mount destruction since
that will now be done (if ->s_fs_info is not cleared)
3. rearrange fuse_mount setup in fuse_get_tree_submount() so that the
regular pattern can be used
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Thu, 21 Oct 2021 08:01:38 +0000 (10:01 +0200)]
fuse: get rid of fuse_put_super()
The ->put_super callback is called from generic_shutdown_super() in case of
a fully initialized sb. This is called from kill_***_super(), which is
called from ->kill_sb instances.
Fuse uses ->put_super to destroy the fs specific fuse_mount and drop the
reference to the fuse_conn, while it does the same on each error case
during sb setup.
This patch moves the destruction from fuse_put_super() to
fuse_mount_destroy(), called at the end of all ->kill_sb instances. A
follup patch will clean up the error paths.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Thu, 21 Oct 2021 08:01:38 +0000 (10:01 +0200)]
fuse: check s_root when destroying sb
Checking "fm" works because currently sb->s_fs_info is cleared on error
paths; however, sb->s_root is what generic_shutdown_super() checks to
determine whether the sb was fully initialized or not.
This change will allow cleanup of sb setup error paths.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Paolo Bonzini [Wed, 20 Oct 2021 10:22:59 +0000 (06:22 -0400)]
KVM: nVMX: promptly process interrupts delivered while in guest mode
Since commit
c300ab9f08df ("KVM: x86: Replace late check_nested_events() hack with
more precise fix") there is no longer the certainty that check_nested_events()
tries to inject an external interrupt vmexit to L1 on every call to vcpu_enter_guest.
Therefore, even in that case we need to set KVM_REQ_EVENT. This ensures
that inject_pending_event() is called, and from there kvm_check_nested_events().
Fixes: c300ab9f08df ("KVM: x86: Replace late check_nested_events() hack with more precise fix")
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 20 Oct 2021 10:27:36 +0000 (06:27 -0400)]
KVM: x86: check for interrupts before deciding whether to exit the fast path
The kvm_x86_sync_pir_to_irr callback can sometimes set KVM_REQ_EVENT.
If that happens exactly at the time that an exit is handled as
EXIT_FASTPATH_REENTER_GUEST, vcpu_enter_guest will go incorrectly
through the loop that calls kvm_x86_run, instead of processing
the request promptly.
Fixes: 379a3c8ee444 ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath")
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ian Kent [Thu, 23 Sep 2021 07:13:39 +0000 (15:13 +0800)]
autofs: fix wait name hash calculation in autofs_wait()
There's a mistake in commit
2be7828c9fefc ("get rid of autofs_getpath()")
that affects kernels from v5.13.0, basically missed because of me not
fully testing the change for Al.
The problem is that the hash calculation for the wait name qstr hasn't
been updated to account for the change to use dentry_path_raw(). This
prevents the correct matching an existing wait resulting in multiple
notifications being sent to the daemon for the same mount which must
not occur.
The problem wasn't discovered earlier because it only occurs when
multiple processes trigger a request for the same mount concurrently
so it only shows up in more aggressive testing.
Fixes: 2be7828c9fefc ("get rid of autofs_getpath()")
Cc: stable@vger.kernel.org
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Wed, 20 Oct 2021 20:23:05 +0000 (10:23 -1000)]
Merge tag 'ceph-for-5.15-rc7' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two important filesystem fixes, marked for stable.
The blocklisted superblocks issue was particularly annoying because
for unexperienced users it essentially exacted a reboot to establish a
new functional mount in that scenario"
* tag 'ceph-for-5.15-rc7' of git://github.com/ceph/ceph-client:
ceph: fix handling of "meta" errors
ceph: skip existing superblocks that are blocklisted or shut down when mounting
Linus Torvalds [Wed, 20 Oct 2021 20:16:51 +0000 (10:16 -1000)]
Merge tag 'dma-mapping-5.15-2' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- fix more dma-debug fallout (Gerald Schaefer, Hamza Mahfooz)
- fix a kerneldoc warning (Logan Gunthorpe)
* tag 'dma-mapping-5.15-2' of git://git.infradead.org/users/hch/dma-mapping:
dma-debug: teach add_dma_entry() about DMA_ATTR_SKIP_CPU_SYNC
dma-debug: fix sg checks in debug_dma_map_sg()
dma-mapping: fix the kerneldoc for dma_map_sgtable()
Emeel Hakim [Mon, 18 Oct 2021 12:31:19 +0000 (15:31 +0300)]
net/mlx5e: IPsec: Fix work queue entry ethernet segment checksum flags
Current Work Queue Entry (WQE) checksum (csum) flags in the ethernet
segment (eseg) in case of IPsec crypto offload datapath are not aligned
with PRM/HW expectations.
Currently the driver always sets the l3_inner_csum flag in case of IPsec
because of the wrong usage of skb->encapsulation as indicator for inner
IPsec header since skb->encapsulation is always ON for IPsec packets
since IPsec itself is an encapsulation protocol. The above forced a
failing attempts of calculating csum of non-existing segments (like in
the IP|ESP|TCP packet case which does not have an l3_inner) which led
to lots of packet drops hence the low throughput.
Fix by using xo->inner_ipproto as indicator for inner IPsec header
instead of skb->encapsulation in addition to setting the csum flags
as following:
* Tunnel Mode:
* Pkt: MAC IP ESP IP L4
* CSUM: l3_cs | l3_inner_cs | l4_inner_cs
*
* Transport Mode:
* Pkt: MAC IP ESP L4
* CSUM: l3_cs [ | l4_cs (checksum partial case)]
*
* Tunnel(VXLAN TCP/UDP) over Transport Mode
* Pkt: MAC IP ESP UDP VXLAN IP L4
* CSUM: l3_cs | l3_inner_cs | l4_inner_cs
Fixes: f1267798c980 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Emeel Hakim [Mon, 18 Oct 2021 12:30:09 +0000 (15:30 +0300)]
net/mlx5e: IPsec: Fix a misuse of the software parser's fields
IPsec crypto offload current Software Parser (SWP) fields settings in
the ethernet segment (eseg) are not aligned with PRM/HW expectations.
Among others in case of IP|ESP|TCP packet, current driver sets the
offsets for inner_l3 and inner_l4 although there is no inner l3/l4
headers relative to ESP header in such packets.
SWP provides the offsets for HW ,so it can be used to find csum fields
to offload the checksum, however these are not necessarily used by HW
and are used as fallback in case HW fails to parse the packet, e.g
when performing IPSec Transport Aware (IP | ESP | TCP) there is no
need to add SW parse on inner packet. So in some cases packets csum
was calculated correctly , whereas in other cases it failed. The later
faced csum errors (caused by wrong packet length calculations) which
led to lots of packet drops hence the low throughput.
Fix by setting the SWP fields as expected in a IP|ESP|TCP packet.
the following describe the expected SWP offsets:
* Tunnel Mode:
* SWP: OutL3 InL3 InL4
* Pkt: MAC IP ESP IP L4
*
* Transport Mode:
* SWP: OutL3 OutL4
* Pkt: MAC IP ESP L4
*
* Tunnel(VXLAN TCP/UDP) over Transport Mode
* SWP: OutL3 InL3 InL4
* Pkt: MAC IP ESP UDP VXLAN IP L4
Fixes: f1267798c980 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>