Marc Zyngier [Tue, 17 May 2022 09:37:06 +0000 (10:37 +0100)]
Merge branch irq/gic-v3-nmi-fixes-5.19 into irq/irqchip-next
* irq/gic-v3-nmi-fixes-5.19:
: .
: GICv3 pseudo-NMI fixes from Mark Rutland:
:
: "These patches fix a couple of issues with the way GICv3 pseudo-NMIs are
: handled:
:
: * The first patch adds a barrier we missed from NMI handling due to an
: oversight.
:
: * The second patch refactors some logic around reads from ICC_IAR1_EL1
: and adds commentary to explain what's going on.
:
: * The third patch descends into madness, reworking gic_handle_irq() to
: consistently manage ICC_PMR_EL1 + DAIF and avoid cases where these can
: be left in an inconsistent state while softirqs are processed."
: .
irqchip/gic-v3: Fix priority mask handling
irqchip/gic-v3: Refactor ISB + EOIR at ack time
irqchip/gic-v3: Ensure pseudo-NMIs have an ISB between ack and handling
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Tue, 17 May 2022 09:36:56 +0000 (10:36 +0100)]
Merge branch irq/misc-5.19 into irq/irqchip-next
* irq/misc-5.19:
: .
: Misc fixes and minor improvements:
:
: - GIC: Improve warning when the firmware tables are inconsistent
:
: - csky: Use true/false as boolean litterals
:
: - imx-irqsteer: Add runtime PM support
:
: - armada-370-xp: Enable CPU affinity for MSIs, avoid messing with
: PMU interrupts on some variants
:
: - aspeed: Fix handling of irq_of_parse_and_map() errors
:
: - sun6i: Fix sparse warnings
:
: - xtensa-mx: Fix initial IRQ affinity in non-SMP setup
:
: - exiu: Fix acknowledgment of edge-triggered interrupts
:
: - sunxi: Generalise configuration for further reuse
: .
irqchip: Add Kconfig symbols for sunxi drivers
irqchip/armada-370-xp: Do not touch Performance Counter Overflow on A375, A38x, A39x
irqchip/gic: Improved warning about incorrect type
irqchip/csky: Return true/false (not 1/0) from bool functions
irqchip/imx-irqsteer: Add runtime PM support
irqchip/imx-irqsteer: Constify irq_chip struct
irqchip/armada-370-xp: Enable MSI affinity configuration
irqchip/aspeed-scu-ic: Fix irq_of_parse_and_map() return value
irqchip/aspeed-i2c-ic: Fix irq_of_parse_and_map() return value
irqchip/sun6i-r: Use NULL for chip_data
irqchip/xtensa-mx: Fix initial IRQ affinity in non-SMP setup
irqchip/exiu: Fix acknowledgment of edge triggered interrupts
Signed-off-by: Marc Zyngier <maz@kernel.org>
Samuel Holland [Mon, 9 May 2022 03:49:41 +0000 (22:49 -0500)]
irqchip: Add Kconfig symbols for sunxi drivers
Not all of these drivers are needed on every ARCH_SUNXI platform. In
particular, the ARCH_SUNXI symbol will be reused for the Allwinner D1,
a RISC-V SoC which contains none of these irqchips.
Introduce Kconfig symbols so we can select only the drivers actually
used by a particular set of platforms. This also lets us move the
irqchip driver dependencies to a more appropriate location.
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220509034941.30704-1-samuel@sholland.org
Mark Rutland [Fri, 13 May 2022 13:30:38 +0000 (14:30 +0100)]
irqchip/gic-v3: Fix priority mask handling
When a kernel is built with CONFIG_ARM64_PSEUDO_NMI=y and pseudo-NMIs
are enabled at runtime, GICv3's gic_handle_irq() can leave DAIF and
ICC_PMR_EL1 in an unexpected state in some cases, breaking subsequent
usage of local_irq_enable() and resulting in softirqs being run with
IRQs erroneously masked (possibly resulting in deadlocks).
This can happen when an IRQ exception is taken from a context where
regular IRQs were unmasked, and either:
(1) ICC_IAR1_EL1 indicates a special INTID (e.g. as a result of an IRQ
being withdrawn since the IRQ exception was taken).
(2) ICC_IAR1_EL1 and ICC_RPR_EL1 indicate an NMI was acknowledged.
When an NMI is taken from a context where regular IRQs were masked,
there is no problem.
When CONFIG_ARM64_DEBUG_PRIORITY_MASKING=y, this can be detected with
perf, e.g.
| # ./perf record -a -g -e cycles:k ls -alR / > /dev/null 2>&1
| ------------[ cut here ]------------
| WARNING: CPU: 0 PID: 14 at arch/arm64/include/asm/irqflags.h:32 arch_local_irq_enable+0x4c/0x6c
| Modules linked in:
| CPU: 0 PID: 14 Comm: ksoftirqd/0 Not tainted
5.18.0-rc5-00004-g876c38e3d20b #12
| Hardware name: linux,dummy-virt (DT)
| pstate:
204000c5 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : arch_local_irq_enable+0x4c/0x6c
| lr : __do_softirq+0x110/0x5d8
| sp :
ffff8000080bbbc0
| pmr_save:
000000f0
| x29:
ffff8000080bbbc0 x28:
ffff316ac3a6ca40 x27:
0000000000000000
| x26:
0000000000000000 x25:
ffffa04611c06008 x24:
ffffa04611c06008
| x23:
0000000040400005 x22:
0000000000000200 x21:
ffff8000080bbe20
| x20:
ffffa0460fe10320 x19:
0000000000000009 x18:
0000000000000000
| x17:
ffff91252dfa9000 x16:
ffff800008004000 x15:
0000000000004000
| x14:
0000000000000028 x13:
ffffa0460fe17578 x12:
ffffa0460fed4294
| x11:
ffffa0460fedc168 x10:
ffffffffffffff80 x9 :
ffffa0460fe10a70
| x8 :
ffffa0460fedc168 x7 :
000000000000b762 x6 :
00000000057c3bdf
| x5 :
ffff8000080bbb18 x4 :
0000000000000000 x3 :
0000000000000001
| x2 :
ffff91252dfa9000 x1 :
0000000000000060 x0 :
00000000000000f0
| Call trace:
| arch_local_irq_enable+0x4c/0x6c
| __irq_exit_rcu+0x180/0x1ac
| irq_exit_rcu+0x1c/0x44
| el1_interrupt+0x4c/0xe4
| el1h_64_irq_handler+0x18/0x24
| el1h_64_irq+0x74/0x78
| smpboot_thread_fn+0x68/0x2c0
| kthread+0x124/0x130
| ret_from_fork+0x10/0x20
| irq event stamp: 193241
| hardirqs last enabled at (193240): [<
ffffa0460fe10a9c>] __do_softirq+0x10c/0x5d8
| hardirqs last disabled at (193241): [<
ffffa0461102ffe4>] el1_dbg+0x24/0x90
| softirqs last enabled at (193234): [<
ffffa0460fe10e00>] __do_softirq+0x470/0x5d8
| softirqs last disabled at (193239): [<
ffffa0460fea9944>] __irq_exit_rcu+0x180/0x1ac
| ---[ end trace
0000000000000000 ]---
The necessary manipulation of DAIF and ICC_PMR_EL1 depends on the
interrupted context, but the structure of gic_handle_irq() makes this
also depend on whether the GIC reports an IRQ, NMI, or special INTID:
* When the interrupted context had regular IRQs masked (and hence the
interrupt must be an NMI), the entry code performs the NMI
entry/exit and gic_handle_irq() should return with DAIF and
ICC_PMR_EL1 unchanged.
This is handled correctly today.
* When the interrupted context had regular IRQs unmasked, the entry code
performs IRQ entry/exit, but expects gic_handle_irq() to always update
ICC_PMR_EL1 and DAIF.IF to unmask NMIs (but not regular IRQs) prior to
returning (which it must do prior to invoking any regular IRQ
handler).
This unbalanced calling convention is necessary because we don't know
whether an NMI has been taken until acknowledged by a read from
ICC_IAR1_EL1, and so we need to perform the read with NMI masked in
case an NMI has been taken (and needs to be handled with NMIs masked).
Unfortunately, this is not handled consistently:
- When ICC_IAR1_EL1 reports a special INTID, gic_handle_irq() returns
immediately without manipulating ICC_PMR_EL1 and DAIF.
- When RPR_EL1 indicates an NMI, gic_handle_irq() calls
gic_handle_nmi() to invoke the NMI handler, then returns without
manipulating ICC_PMR_EL1 and DAIF.
- For regular IRQs, gic_handle_irq() manipulates ICC_PMR_EL1 and DAIF
prior to invoking the IRQ handler.
There were related problems with special INTID handling in the past,
where if an exception was taken from a context with regular IRQs masked
and ICC_IAR_EL1 reported a special INTID, gic_handle_irq() would
erroneously unmask NMIs in NMI context permitted an unexpected nested
NMI. That case specifically was fixed by commit:
a97709f563a078e2 ("irqchip/gic-v3: Do not enable irqs when handling spurious interrups")
... but unfortunately that commit added an inverse problem, where if an
exception was taken from a context with regular IRQs *unmasked* and
ICC_IAR_EL1 reported a special INTID, gic_handle_irq() would erroneously
fail to unmask NMIs (and consequently regular IRQs could not be
unmasked during softirq processing). Before and after that commit, if an
NMI was taken from a context with regular IRQs unmasked gic_handle_irq()
would not unmask NMIs prior to returning, leading to the same problem
with softirq handling.
This patch fixes this by restructuring gic_handle_irq(), splitting it
into separate irqson/irqsoff helper functions which consistently perform
the DAIF + ICC_PMR1_EL1 manipulation based upon the interrupted context,
regardless of the event indicated by ICC_IAR1_EL1.
The special INTID handling is moved into the low-level IRQ/NMI handler
invocation helper functions, so that early returns don't prevent the
required manipulation of DAIF + ICC_PMR_EL1.
Fixes: f32c926651dcd168 ("irqchip/gic-v3: Handle pseudo-NMIs")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220513133038.226182-4-mark.rutland@arm.com
Mark Rutland [Fri, 13 May 2022 13:30:37 +0000 (14:30 +0100)]
irqchip/gic-v3: Refactor ISB + EOIR at ack time
There are cases where a context synchronization event is necessary
between an IRQ being raised and being handled, and there are races such
that we cannot rely upon the exception entry being subsequent to the
interrupt being raised. To fix this, we place an ISB between a read of
IAR and the subsequent invocation of an IRQ handler.
When EOI mode 1 is in use, we need to EOI an interrupt prior to invoking
its handler, and we have a write to EOIR for this. As this write to EOIR
requires an ISB, and this is provided by the gic_write_eoir() helper, we
omit the usual ISB in this case, with the logic being:
| if (static_branch_likely(&supports_deactivate_key))
| gic_write_eoir(irqnr);
| else
| isb();
This is somewhat opaque, and it would be a little clearer if there were
an unconditional ISB, with only the write to EOIR being conditional,
e.g.
| if (static_branch_likely(&supports_deactivate_key))
| write_gicreg(irqnr, ICC_EOIR1_EL1);
|
| isb();
This patch rewrites the code that way, with this logic factored into a
new helper function with comments explaining what the ISB is for, as
were originally laid out in commit:
39a06b67c2c1256b ("irqchip/gic: Ensure we have an ISB between ack and ->handle_irq")
Note that since then, we removed the IAR polling in commit:
342677d70ab92142 ("irqchip/gic-v3: Remove acknowledge loop")
... which removed one of the two race conditions.
For consistency, other portions of the driver are made to manipulate
EOIR using write_gicreg() and explcit ISBs, and the gic_write_eoir()
helper function is removed.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220513133038.226182-3-mark.rutland@arm.com
Mark Rutland [Fri, 13 May 2022 13:30:36 +0000 (14:30 +0100)]
irqchip/gic-v3: Ensure pseudo-NMIs have an ISB between ack and handling
There are cases where a context synchronization event is necessary
between an IRQ being raised and being handled, and there are races such
that we cannot rely upon the exception entry being subsequent to the
interrupt being raised.
We identified and fixes this for regular IRQs in commit:
39a06b67c2c1256b ("irqchip/gic: Ensure we have an ISB between ack and ->handle_irq")
Unfortunately, we forgot to do the same for psuedo-NMIs when support for
those was added in commit:
f32c926651dcd168 ("irqchip/gic-v3: Handle pseudo-NMIs")
Which means that when pseudo-NMIs are used for PMU support, we'll hit
the same problem.
Apply the same fix as for regular IRQs. Note that when EOI mode 1 is in
use, the call to gic_write_eoir() will provide an ISB.
Fixes: f32c926651dcd168 ("irqchip/gic-v3: Handle pseudo-NMIs")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220513133038.226182-2-mark.rutland@arm.com
Pali Rohár [Mon, 25 Apr 2022 11:37:05 +0000 (13:37 +0200)]
irqchip/armada-370-xp: Do not touch Performance Counter Overflow on A375, A38x, A39x
Register ARMADA_370_XP_INT_FABRIC_MASK_OFFS is Armada 370 and XP specific
and on new Armada platforms it has different meaning. It does not configure
Performance Counter Overflow interrupt masking. So do not touch this
register on non-A370/XP platforms (A375, A38x and A39x).
Signed-off-by: Pali Rohár <pali@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 28da06dfd9e4 ("irqchip: armada-370-xp: Enable the PMU interrupts")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425113706.29310-1-pali@kernel.org
Florian Fainelli [Tue, 8 Mar 2022 20:11:16 +0000 (12:11 -0800)]
irqchip/gic: Improved warning about incorrect type
Issue the warning for interrupt lines that have an incorrect interrupt
type and also print the hardware interrupt number to facilitate the
resolution of such problems.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220308201117.3870678-1-f.fainelli@gmail.com
Haowen Bai [Thu, 17 Mar 2022 03:21:24 +0000 (11:21 +0800)]
irqchip/csky: Return true/false (not 1/0) from bool functions
Return boolean values ("true" or "false") instead of 1 or 0 from bool
functions.
Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Acked-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1647487284-30088-1-git-send-email-baihaowen@meizu.com
Lucas Stach [Wed, 6 Apr 2022 16:37:01 +0000 (18:37 +0200)]
irqchip/imx-irqsteer: Add runtime PM support
There are now SoCs that integrate the irqsteer controller within
a separate power domain. In order to allow this domain to be
powered down when not needed, add runtime PM support to the driver.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220406163701.1277930-2-l.stach@pengutronix.de
Lucas Stach [Wed, 6 Apr 2022 16:37:00 +0000 (18:37 +0200)]
irqchip/imx-irqsteer: Constify irq_chip struct
The imx_irqsteer_irq_chip struct is constant data.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220406163701.1277930-1-l.stach@pengutronix.de
Nathan Rossi [Fri, 22 Apr 2022 04:35:32 +0000 (04:35 +0000)]
irqchip/armada-370-xp: Enable MSI affinity configuration
With multiple devices attached via PCIe to an Armada 385 it is possible
to overwhelm a single CPU with MSI interrupts. Under certain scenarios
configuring the interrupts to be handled by more than one CPU would
prevent the system from being overwhelmed. However the
irqchip-aramada-370-xp driver is configured to only handle MSIs on the
boot CPU, and provides no affinity configuration.
This change adds support to the armada-370-xp driver to allow for
configuring the affinity of specific MSI irqs and to generate the
interrupts on secondary CPUs. This is done by enabling the private
doorbell for all online CPUs and configures all CPUs to unmask MSI
specific private doorbell bits. The CPU affinity selection of the
interrupt is handled by the target list of the software triggered
interrupt value, which is provided as the MSI message. The message has
the associated CPU bit set for the target CPU. For private doorbell
interrupts only one bit can be set otherwise all CPUs will receive the
interrupt, so the lowest CPU in the affinity mask is used. This means
that by default the first CPU will handle all the interrupts as was the
case before.
Signed-off-by: Nathan Rossi <nathan.rossi@digi.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220422043532.146946-1-nathan@nathanrossi.com
Krzysztof Kozlowski [Sat, 23 Apr 2022 09:42:27 +0000 (11:42 +0200)]
irqchip/aspeed-scu-ic: Fix irq_of_parse_and_map() return value
The irq_of_parse_and_map() returns 0 on failure, not a negative ERRNO.
Fixes: 04f605906ff0 ("irqchip: Add Aspeed SCU interrupt controller")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220423094227.33148-2-krzysztof.kozlowski@linaro.org
Krzysztof Kozlowski [Sat, 23 Apr 2022 09:42:26 +0000 (11:42 +0200)]
irqchip/aspeed-i2c-ic: Fix irq_of_parse_and_map() return value
The irq_of_parse_and_map() returns 0 on failure, not a negative ERRNO.
Fixes: f48e699ddf70 ("irqchip/aspeed-i2c-ic: Add I2C IRQ controller for Aspeed")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220423094227.33148-1-krzysztof.kozlowski@linaro.org
Samuel Holland [Sun, 24 Apr 2022 17:39:51 +0000 (12:39 -0500)]
irqchip/sun6i-r: Use NULL for chip_data
sparse complains about using an integer as a NULL pointer.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220424173952.36591-1-samuel@sholland.org
Max Filippov [Tue, 26 Apr 2022 16:19:12 +0000 (09:19 -0700)]
irqchip/xtensa-mx: Fix initial IRQ affinity in non-SMP setup
When irq-xtensa-mx chip is used in non-SMP configuration its
irq_set_affinity callback is not called leaving IRQ affinity set empty.
As a result IRQ delivery does not work in that configuration.
Initialize IRQ affinity of the xtensa MX interrupt distributor to CPU 0
for all external IRQ lines.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220426161912.1113784-1-jcmvbkbc@gmail.com
Daniel Thompson [Tue, 3 May 2022 13:45:41 +0000 (14:45 +0100)]
irqchip/exiu: Fix acknowledgment of edge triggered interrupts
Currently the EXIU uses the fasteoi interrupt flow that is configured by
it's parent (irq-gic-v3.c). With this flow the only chance to clear the
interrupt request happens during .irq_eoi() and (obviously) this happens
after the interrupt handler has run. EXIU requires edge triggered
interrupts to be acked prior to interrupt handling. Without this we
risk incorrect interrupt dismissal when a new interrupt is delivered
after the handler reads and acknowledges the peripheral but before the
irq_eoi() takes place.
Fix this by clearing the interrupt request from .irq_ack() if we are
configured for edge triggered interrupts. This requires adopting the
fasteoi-ack flow instead of the fasteoi to ensure the ack gets called.
These changes have been tested using the power button on a
Developerbox/SC2A11 combined with some hackery in gpio-keys so I can
play with the different trigger mode [and an mdelay(500) so I can
can check what happens on a double click in both modes].
Fixes: 706cffc1b912 ("irqchip/exiu: Add support for Socionext Synquacer EXIU controller")
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220503134541.2566457-1-daniel.thompson@linaro.org
Marc Zyngier [Wed, 4 May 2022 15:09:47 +0000 (16:09 +0100)]
Merge branch irq/gic-v3-5.19 into irq/irqchip-next
* irq/gic-v3-5.19:
: .
: Misc improvements for GICv3:
:
: - Minimise the number of cases where we need to poll RWP
:
: - Allow the use of MMIO-based invalidation for LPIs
:
: - Track GICD/GICR mappings in /proc/iomem
:
: - Tighten the GICv3 DT binding to avoid endless discussions
: on the list...
: .
irqchip/gic-v3: Claim iomem resources
dt-bindings: interrupt-controller: arm,gic-v3: Make the v2 compat requirements explicit
irqchip/gic-v3: Relax polling of GIC{R,D}_CTLR.RWP
irqchip/gic-v3: Detect LPI invalidation MMIO registers
irqchip/gic-v3: Exposes bit values for GICR_CTLR.{IR, CES}
Signed-off-by: Marc Zyngier <maz@kernel.org>
Robin Murphy [Tue, 12 Apr 2022 15:28:15 +0000 (16:28 +0100)]
irqchip/gic-v3: Claim iomem resources
As a simple quality-of-life tweak, claim our MMIO regions when mapping
them, such that the GIC shows up in /proc/iomem. No effort is spent on
trying to release them, since frankly if the GIC fails to probe then
it's never getting a second try anyway.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/c534c2a458a3bf94ccdae8abc6edc3d45a689c30.1649777295.git.robin.murphy@arm.com
Marc Zyngier [Sat, 9 Apr 2022 10:16:17 +0000 (11:16 +0100)]
dt-bindings: interrupt-controller: arm,gic-v3: Make the v2 compat requirements explicit
A common mistake when writing a device tree for a platform that is using
GICv3 with ancient CPUs is to overlook the MMIO frames that implement
the GICv2 compatibility feature, because this feature is implemented by
the CPUs and not by the GIC itself.
The compatibility feature itself is optional (all the modern
implementations have dropped it), but is present in all the ARM Ltd
implementations of the ARMv8.0 architecture (A3x, A53, A57, A72, A73),
and many others from various implementers.
Make it explicit that GICC, GICH and GICV are required for these CPUs.
Also take this opportunity to update my email address, as people keep
sending them to the wrong place...
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Krzysztof Kozlowski <krzk+dt@kernel.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220409101617.268796-1-maz@kernel.org
Marc Zyngier [Tue, 5 Apr 2022 18:38:57 +0000 (19:38 +0100)]
irqchip/gic-v3: Relax polling of GIC{R,D}_CTLR.RWP
Recent work on the KVM GIC emulation has revealed that the GICv3
driver is a bit RWP-happy, as it polls this bit for each and
every write MMIO access involving a single interrupt.
As it turns out, polling RWP is only required when:
- Disabling an SGI, PPI or SPI
- Disabling LPIs at the redistributor level
- Disabling groups
- Enabling ARE
- Dealing with DPG*
Simplify the driver by removing all the other instances of RWP
polling, and add the one that was missing when enabling the distributor
(as that's where we set ARE).
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220405183857.205960-4-maz@kernel.org
Marc Zyngier [Tue, 5 Apr 2022 18:38:56 +0000 (19:38 +0100)]
irqchip/gic-v3: Detect LPI invalidation MMIO registers
Since GICv4.1, an implementation can offer the same MMIO-based
implementation as DirectLPI, only with an ITS. Given that this
can be hugely beneficial for workloads that are very LPI masking
heavy (although these workloads are admitedly a bit odd).
Interestingly, this is independent of RVPEI, which only *implies*
the functionnality.
So let's detect whether the implementation has GICR_CTLR.IR set,
and propagate this as DirectLPI to the ITS driver.
While we're at it, repaint the GICv3 banner so that we advertise
the various capabilities at boot time to be slightly less invasive.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220405183857.205960-3-maz@kernel.org
Marc Zyngier [Tue, 5 Apr 2022 18:23:24 +0000 (19:23 +0100)]
irqchip/gic-v3: Exposes bit values for GICR_CTLR.{IR, CES}
As we're about to expose GICR_CTLR.{IR,CES} to guests, populate
the include file with the architectural values.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Oliver Upton <oupton@google.com>
Link: https://lore.kernel.org/r/20220405182327.205520-2-maz@kernel.org
Marc Zyngier [Tue, 19 Apr 2022 14:23:14 +0000 (15:23 +0100)]
Merge branch irq/gpio-immutable into irq/irqchip-next
* irq/gpio-immutable:
: .
: First try at preventing the GPIO subsystem from abusing irq_chip
: data structures. The general idea is to have an irq_chip flag
: to tell the GPIO subsystem that these structures are immutable,
: and to convert drivers one by one.
: .
Documentation: Update the recommended pattern for GPIO irqchips
gpio: Update TODO to mention immutable irq_chip structures
pinctrl: amd: Make the irqchip immutable
pinctrl: msmgpio: Make the irqchip immutable
pinctrl: apple-gpio: Make the irqchip immutable
gpio: pl061: Make the irqchip immutable
gpio: tegra186: Make the irqchip immutable
gpio: Add helpers to ease the transition towards immutable irq_chip
gpio: Expose the gpiochip_irq_re[ql]res helpers
gpio: Don't fiddle with irqchips marked as immutable
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Tue, 19 Apr 2022 14:18:46 +0000 (15:18 +0100)]
Documentation: Update the recommended pattern for GPIO irqchips
Update the documentation to get rid of the per-gpio_irq_chip
irq_chip structure, and give examples of the new pattern.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419141846.598305-11-maz@kernel.org
Marc Zyngier [Tue, 19 Apr 2022 14:18:45 +0000 (15:18 +0100)]
gpio: Update TODO to mention immutable irq_chip structures
5 drivers are converted, a few hundred to go. Definitely worth of
a TODO entry, in the hope that someone will notice it and do
a bulk update.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419141846.598305-10-maz@kernel.org
Marc Zyngier [Tue, 19 Apr 2022 14:18:44 +0000 (15:18 +0100)]
pinctrl: amd: Make the irqchip immutable
Prevent gpiolib from messing with the irqchip by advertising
the irq_chip structure as immutable, making it const, and adding
the various calls that gpiolib relies upon.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419141846.598305-9-maz@kernel.org
Marc Zyngier [Tue, 19 Apr 2022 14:18:43 +0000 (15:18 +0100)]
pinctrl: msmgpio: Make the irqchip immutable
Prevent gpiolib from messing with the irqchip by advertising
the irq_chip structure as immutable, making it const, and adding
the various calls that gpiolib relies upon.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419141846.598305-8-maz@kernel.org
Marc Zyngier [Tue, 19 Apr 2022 14:18:42 +0000 (15:18 +0100)]
pinctrl: apple-gpio: Make the irqchip immutable
Prevent gpiolib from messing with the irqchip by advertising
the irq_chip structure as immutable, making it const, and adding
the various calls that gpiolib relies upon.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419141846.598305-7-maz@kernel.org
Marc Zyngier [Tue, 19 Apr 2022 14:18:41 +0000 (15:18 +0100)]
gpio: pl061: Make the irqchip immutable
Prevent gpiolib from messing with the irqchip by advertising
the irq_chip structure as immutable, making it const, and adding
the various calls that gpiolib relies upon.
Reviewed-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419141846.598305-6-maz@kernel.org
Marc Zyngier [Tue, 19 Apr 2022 14:18:40 +0000 (15:18 +0100)]
gpio: tegra186: Make the irqchip immutable
Prevent gpiolib from messing with the irqchip by advertising
the irq_chip structure as immutable, making it const, and adding
the various calls that gpiolib relies upon.
Acked-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419141846.598305-5-maz@kernel.org
Marc Zyngier [Tue, 19 Apr 2022 14:18:39 +0000 (15:18 +0100)]
gpio: Add helpers to ease the transition towards immutable irq_chip
Add a couple of new helpers to make it slightly simpler to convert
drivers to immutable irq_chip structures:
- GPIOCHIP_IRQ_RESOURCE_HELPERS populates the irq_chip structure
with the resource management callbacks
- gpio_irq_chip_set_chip() populates the gpio_irq_chip.chip
structure, avoiding the proliferation of ugly casts
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419141846.598305-4-maz@kernel.org
Marc Zyngier [Tue, 19 Apr 2022 14:18:38 +0000 (15:18 +0100)]
gpio: Expose the gpiochip_irq_re[ql]res helpers
The GPIO subsystem has a couple of internal helpers to manage
resources on behalf of the irqchip. Expose them so that GPIO
drivers can use them directly.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419141846.598305-3-maz@kernel.org
Marc Zyngier [Tue, 19 Apr 2022 14:18:37 +0000 (15:18 +0100)]
gpio: Don't fiddle with irqchips marked as immutable
In order to move away from gpiolib messing with the internals of
unsuspecting irqchips, add a flag by which irqchips advertise
that they are not to be messed with, and do solemnly swear that
they correctly call into the gpiolib helpers when required.
Also nudge the users into converting their drivers to the
new model.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419141846.598305-2-maz@kernel.org
Linus Torvalds [Sun, 17 Apr 2022 20:57:31 +0000 (13:57 -0700)]
Linux 5.18-rc3
Linus Torvalds [Sun, 17 Apr 2022 17:29:10 +0000 (10:29 -0700)]
Merge tag 'for-linus-5.18-rc3-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixlet from Juergen Gross:
"A single cleanup patch for the Xen balloon driver"
* tag 'for-linus-5.18-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: don't use PV mode extra memory for zone device allocations
Linus Torvalds [Sun, 17 Apr 2022 16:55:59 +0000 (09:55 -0700)]
Merge tag 'x86-urgent-2022-04-17' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Two x86 fixes related to TSX:
- Use either MSR_TSX_FORCE_ABORT or MSR_IA32_TSX_CTRL to disable TSX
to cover all CPUs which allow to disable it.
- Disable TSX development mode at boot so that a microcode update
which provides TSX development mode does not suddenly make the
system vulnerable to TSX Asynchronous Abort"
* tag 'x86-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsx: Disable TSX development mode at boot
x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits
Linus Torvalds [Sun, 17 Apr 2022 16:53:01 +0000 (09:53 -0700)]
Merge tag 'timers-urgent-2022-04-17' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"A small set of fixes for the timers core:
- Fix the warning condition in __run_timers() which does not take
into account that a CPU base (especially the deferrable base) never
has a timer armed on it and therefore the next_expiry value can
become stale.
- Replace a WARN_ON() in the NOHZ code with a WARN_ON_ONCE() to
prevent endless spam in dmesg.
- Remove the double star from a comment which is not meant to be in
kernel-doc format"
* tag 'timers-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/sched: Fix non-kernel-doc comment
tick/nohz: Use WARN_ON_ONCE() to prevent console saturation
timers: Fix warning condition in __run_timers()
Linus Torvalds [Sun, 17 Apr 2022 16:46:15 +0000 (09:46 -0700)]
Merge tag 'smp-urgent-2022-04-17' of git://git./linux/kernel/git/tip/tip
Pull SMP fixes from Thomas Gleixner:
"Two fixes for the SMP core:
- Make the warning condition in flush_smp_call_function_queue()
correct, which checked a just emptied list head for being empty
instead of validating that there was no pending entry on the
offlined CPU at all.
- The @cpu member of struct cpuhp_cpu_state is initialized when the
CPU hotplug thread for the upcoming CPU is created. That's too late
because the creation of the thread can fail and then the following
rollback operates on CPU0. Get rid of the CPU member and hand the
CPU number to the involved functions directly"
* tag 'smp-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Remove the 'cpu' member of cpuhp_cpu_state
smp: Fix offline cpu check in flush_smp_call_function_queue()
Linus Torvalds [Sun, 17 Apr 2022 16:42:03 +0000 (09:42 -0700)]
Merge tag 'irq-urgent-2022-04-17' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix for the interrupt affinity spreading logic to take into
account that there can be an imbalance between present and possible
CPUs, which causes already assigned bits to be overwritten"
* tag 'irq-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/affinity: Consider that CPUs on nodes can be unbalanced
Linus Torvalds [Sun, 17 Apr 2022 16:36:27 +0000 (09:36 -0700)]
Merge tag 'for-v5.18-rc' of git://git./linux/kernel/git/sre/linux-power-supply
Pull power supply fixes from Sebastian Reichel:
- Fix a regression with battery data failing to load from DT
* tag 'for-v5.18-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: supply: Reset err after not finding static battery
power: supply: samsung-sdi-battery: Add missing charge restart voltages
Linus Torvalds [Sun, 17 Apr 2022 16:31:27 +0000 (09:31 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Regular set of fixes for drivers and the dev-interface"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: ismt: Fix undefined behavior due to shift overflowing the constant
i2c: dev: Force case user pointers in compat_i2cdev_ioctl()
i2c: dev: check return value when calling dev_set_name()
i2c: qcom-geni: Use dev_err_probe() for GPI DMA error
i2c: imx: Implement errata ERR007805 or e7805 bus frequency limit
i2c: pasemi: Wait for write xfers to finish
Linus Torvalds [Sun, 17 Apr 2022 00:07:50 +0000 (17:07 -0700)]
Merge tag 'devicetree-fixes-for-5.18-2' of git://git./linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Fix scalar property schemas with array constraints
- Fix 'enum' lists with duplicate entries
- Fix incomplete if/then/else schemas
- Add Renesas RZ/V2L SoC support to Mali Bifrost binding
- Maintainers update for Marvell irqchip
* tag 'devicetree-fixes-for-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: display: panel-timing: Define a single type for properties
dt-bindings: Fix array constraints on scalar properties
dt-bindings: gpu: mali-bifrost: Document RZ/V2L SoC
dt-bindings: net: snps: remove duplicate name
dt-bindings: Fix 'enum' lists with duplicate entries
dt-bindings: irqchip: mrvl,intc: refresh maintainers
dt-bindings: Fix incomplete if/then/else schemas
dt-bindings: power: renesas,apmu: Fix cpus property limits
dt-bindings: extcon: maxim,max77843: fix ports type
Linus Torvalds [Sun, 17 Apr 2022 00:01:43 +0000 (17:01 -0700)]
Merge tag 'gpio-fixes-for-v5.18-rc3' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
"A single fix for gpio-sim and two patches for GPIO ACPI pulled from
Andy:
- fix the set/get_multiple() callbacks in gpio-sim
- use correct format characters in gpiolib-acpi
- use an unsigned type for pins in gpiolib-acpi"
* tag 'gpio-fixes-for-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: sim: fix setting and getting multiple lines
gpiolib: acpi: Convert type for pin to be unsigned
gpiolib: acpi: use correct format characters
Linus Torvalds [Sat, 16 Apr 2022 23:51:39 +0000 (16:51 -0700)]
Merge tag 'soc-fixes-5.18-2' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"There are a number of SoC bugfixes that came in since the merge
window, and more of them are already pending.
This batch includes:
- A boot time regression fix for davinci that triggered on
multi_v5_defconfig when booting any platform
- Defconfig updates to address removed features, changed symbol names
or dependencies, for gemini, ux500, and pxa
- Email address changes for Krzysztof Kozlowski
- Build warning fixes for ep93xx and iop32x
- Devicetree warning fixes across many platforms
- Minor bugfixes for the reset controller, memory controller and SCMI
firmware subsystems plus the versatile-express board"
* tag 'soc-fixes-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (34 commits)
ARM: config: Update Gemini defconfig
arm64: dts: qcom/sdm845-shift-axolotl: Fix boolean properties with values
ARM: dts: align SPI NOR node name with dtschema
ARM: dts: Fix more boolean properties with values
arm/arm64: dts: qcom: Fix boolean properties with values
arm64: dts: imx: Fix imx8*-var-som touchscreen property sizes
arm: dts: imx: Fix boolean properties with values
arm64: dts: tegra: Fix boolean properties with values
arm: dts: at91: Fix boolean properties with values
arm: configs: imote2: Drop defconfig as board support dropped.
ep93xx: clock: Don't use plain integer as NULL pointer
ep93xx: clock: Fix UAF in ep93xx_clk_register_gate()
ARM: vexpress/spc: Fix all the kernel-doc build warnings
ARM: vexpress/spc: Fix kernel-doc build warning for ve_spc_cpu_in_wfi
ARM: config: u8500: Re-enable AB8500 battery charging
ARM: config: u8500: Add some common hardware
memory: fsl_ifc: populate child nodes of buses and mfd devices
ARM: config: Refresh U8500 defconfig
firmware: arm_scmi: Fix sparse warnings in OPTEE transport driver
firmware: arm_scmi: Replace zero-length array with flexible-array member
...
Linus Torvalds [Sat, 16 Apr 2022 23:42:53 +0000 (16:42 -0700)]
Merge tag 'random-5.18-rc3-for-linus' of git://git./linux/kernel/git/crng/random
Pull random number generator fixes from Jason Donenfeld:
- Per your suggestion, random reads now won't fail if there's a page
fault after some non-zero amount of data has been read, which makes
the behavior consistent with all other reads in the kernel.
- Rather than an inconsistent mix of random_get_entropy() returning an
unsigned long or a cycles_t, now it just returns an unsigned long.
- A memcpy() was replaced with an memmove(), because the addresses are
sometimes overlapping. In practice the destination is always before
the source, so not really an issue, but better to be correct than
not.
* tag 'random-5.18-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: use memmove instead of memcpy for remaining 32 bytes
random: make random_get_entropy() return an unsigned long
random: allow partial reads if later user copies fail
Linus Torvalds [Sat, 16 Apr 2022 20:38:26 +0000 (13:38 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"13 fixes, all in drivers.
The most extensive changes are in the iscsi series (affecting drivers
qedi, cxgbi and bnx2i), the next most is scsi_debug, but that's just a
simple revert and then minor updates to pm80xx"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: iscsi: MAINTAINERS: Add Mike Christie as co-maintainer
scsi: qedi: Fix failed disconnect handling
scsi: iscsi: Fix NOP handling during conn recovery
scsi: iscsi: Merge suspend fields
scsi: iscsi: Fix unbound endpoint error handling
scsi: iscsi: Fix conn cleanup and stop race during iscsid restart
scsi: iscsi: Fix endpoint reuse regression
scsi: iscsi: Release endpoint ID when its freed
scsi: iscsi: Fix offload conn cleanup when iscsid restarts
scsi: iscsi: Move iscsi_ep_disconnect()
scsi: pm80xx: Enable upper inbound, outbound queues
scsi: pm80xx: Mask and unmask upper interrupt vectors 32-63
Revert "scsi: scsi_debug: Address races following module load"
Bartosz Golaszewski [Sat, 16 Apr 2022 19:57:00 +0000 (21:57 +0200)]
Merge tag 'intel-gpio-v5.18-2' of gitolite.pub/scm/linux/kernel/git/andy/linux-gpio-intel into gpio/for-current
intel-gpio for v5.18-2
* Couple of fixes related to handling unsigned value of the pin from ACPI
gpiolib:
- acpi: Convert type for pin to be unsigned
- acpi: use correct format characters
Linus Torvalds [Sat, 16 Apr 2022 18:20:21 +0000 (11:20 -0700)]
Merge tag 'dma-mapping-5.18-2' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
- avoid a double memory copy for swiotlb (Chao Gao)
* tag 'dma-mapping-5.18-2' of git://git.infradead.org/users/hch/dma-mapping:
dma-direct: avoid redundant memory sync for swiotlb
Jason A. Donenfeld [Wed, 13 Apr 2022 23:50:38 +0000 (01:50 +0200)]
random: use memmove instead of memcpy for remaining 32 bytes
In order to immediately overwrite the old key on the stack, before
servicing a userspace request for bytes, we use the remaining 32 bytes
of block 0 as the key. This means moving indices 8,9,a,b,c,d,e,f ->
4,5,6,7,8,9,a,b. Since 4 < 8, for the kernel implementations of
memcpy(), this doesn't actually appear to be a problem in practice. But
relying on that characteristic seems a bit brittle. So let's change that
to a proper memmove(), which is the by-the-books way of handling
overlapping memory copies.
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Linus Torvalds [Fri, 15 Apr 2022 22:57:18 +0000 (15:57 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"14 patches.
Subsystems affected by this patch series: MAINTAINERS, binfmt, and
mm (tmpfs, secretmem, kasan, kfence, pagealloc, zram, compaction,
hugetlb, vmalloc, and kmemleak)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: kmemleak: take a full lowmem check in kmemleak_*_phys()
mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"
revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders"
hugetlb: do not demote poisoned hugetlb pages
mm: compaction: fix compiler warning when CONFIG_COMPACTION=n
mm: fix unexpected zeroed page mapping with zram swap
mm, page_alloc: fix build_zonerefs_node()
mm, kfence: support kmem_dump_obj() for KFENCE objects
kasan: fix hw tags enablement when KUNIT tests are disabled
irq_work: use kasan_record_aux_stack_noalloc() record callstack
mm/secretmem: fix panic when growing a memfd_secret
tmpfs: fix regressions from wider use of ZERO_PAGE
MAINTAINERS: Broadcom internal lists aren't maintainers
Linus Torvalds [Fri, 15 Apr 2022 22:20:59 +0000 (15:20 -0700)]
Merge tag 'for-5.18/dm-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix memory corruption in DM integrity target when tag_size is less
than digest size.
- Fix DM multipath's historical-service-time path selector to not use
sched_clock() and ktime_get_ns(); only use ktime_get_ns().
- Fix dm_io->orig_bio NULL pointer dereference in dm_zone_map_bio() due
to 5.18 changes that overlooked DM zone's use of ->orig_bio
- Fix for regression that broke the use of dm_accept_partial_bio() for
"abnormal" IO (e.g. WRITE ZEROES) that does not need duplicate bios
- Fix DM's issuing of empty flush bio so that it's size is 0.
* tag 'for-5.18/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: fix bio length of empty flush
dm: allow dm_accept_partial_bio() for dm_io without duplicate bios
dm zone: fix NULL pointer dereference in dm_zone_map_bio
dm mpath: only use ktime_get_ns() in historical selector
dm integrity: fix memory corruption when tag_size is less than digest size
Patrick Wang [Fri, 15 Apr 2022 02:14:04 +0000 (19:14 -0700)]
mm: kmemleak: take a full lowmem check in kmemleak_*_phys()
The kmemleak_*_phys() apis do not check the address for lowmem's min
boundary, while the caller may pass an address below lowmem, which will
trigger an oops:
# echo scan > /sys/kernel/debug/kmemleak
Unable to handle kernel paging request at virtual address
ff5fffffffe00000
Oops [#1]
Modules linked in:
CPU: 2 PID: 134 Comm: bash Not tainted 5.18.0-rc1-next-
20220407 #33
Hardware name: riscv-virtio,qemu (DT)
epc : scan_block+0x74/0x15c
ra : scan_block+0x72/0x15c
epc :
ffffffff801e5806 ra :
ffffffff801e5804 sp :
ff200000104abc30
gp :
ffffffff815cd4e8 tp :
ff60000004cfa340 t0 :
0000000000000200
t1 :
00aaaaaac23954cc t2 :
00000000000003ff s0 :
ff200000104abc90
s1 :
ffffffff81b0ff28 a0 :
0000000000000000 a1 :
ff5fffffffe01000
a2 :
ffffffff81b0ff28 a3 :
0000000000000002 a4 :
0000000000000001
a5 :
0000000000000000 a6 :
ff200000104abd7c a7 :
0000000000000005
s2 :
ff5fffffffe00ff9 s3 :
ffffffff815cd998 s4 :
ffffffff815d0e90
s5 :
ffffffff81b0ff28 s6 :
0000000000000020 s7 :
ffffffff815d0eb0
s8 :
ffffffffffffffff s9 :
ff5fffffffe00000 s10:
ff5fffffffe01000
s11:
0000000000000022 t3 :
00ffffffaa17db4c t4 :
000000000000000f
t5 :
0000000000000001 t6 :
0000000000000000
status:
0000000000000100 badaddr:
ff5fffffffe00000 cause:
000000000000000d
scan_gray_list+0x12e/0x1a6
kmemleak_scan+0x2aa/0x57e
kmemleak_write+0x32a/0x40c
full_proxy_write+0x56/0x82
vfs_write+0xa6/0x2a6
ksys_write+0x6c/0xe2
sys_write+0x22/0x2a
ret_from_syscall+0x0/0x2
The callers may not quite know the actual address they pass(e.g. from
devicetree). So the kmemleak_*_phys() apis should guarantee the address
they finally use is in lowmem range, so check the address for lowmem's
min boundary.
Link: https://lkml.kernel.org/r/20220413122925.33856-1-patrick.wang.shcn@gmail.com
Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Omar Sandoval [Fri, 15 Apr 2022 02:14:01 +0000 (19:14 -0700)]
mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
Commit
3ee48b6af49c ("mm, x86: Saving vmcore with non-lazy freeing of
vmas") introduced set_iounmap_nonlazy(), which sets vmap_lazy_nr to
lazy_max_pages() + 1, ensuring that any future vunmaps() immediately
purge the vmap areas instead of doing it lazily.
Commit
690467c81b1a ("mm/vmalloc: Move draining areas out of caller
context") moved the purging from the vunmap() caller to a worker thread.
Unfortunately, set_iounmap_nonlazy() can cause the worker thread to spin
(possibly forever). For example, consider the following scenario:
1. Thread reads from /proc/vmcore. This eventually calls
__copy_oldmem_page() -> set_iounmap_nonlazy(), which sets
vmap_lazy_nr to lazy_max_pages() + 1.
2. Then it calls free_vmap_area_noflush() (via iounmap()), which adds 2
pages (one page plus the guard page) to the purge list and
vmap_lazy_nr. vmap_lazy_nr is now lazy_max_pages() + 3, so the
drain_vmap_work is scheduled.
3. Thread returns from the kernel and is scheduled out.
4. Worker thread is scheduled in and calls drain_vmap_area_work(). It
frees the 2 pages on the purge list. vmap_lazy_nr is now
lazy_max_pages() + 1.
5. This is still over the threshold, so it tries to purge areas again,
but doesn't find anything.
6. Repeat 5.
If the system is running with only one CPU (which is typicial for kdump)
and preemption is disabled, then this will never make forward progress:
there aren't any more pages to purge, so it hangs. If there is more
than one CPU or preemption is enabled, then the worker thread will spin
forever in the background. (Note that if there were already pages to be
purged at the time that set_iounmap_nonlazy() was called, this bug is
avoided.)
This can be reproduced with anything that reads from /proc/vmcore
multiple times. E.g., vmcore-dmesg /proc/vmcore.
It turns out that improvements to vmap() over the years have obsoleted
the need for this "optimization". I benchmarked `dd if=/proc/vmcore
of=/dev/null` with 4k and 1M read sizes on a system with a 32GB vmcore.
The test was run on 5.17, 5.18-rc1 with a fix that avoided the hang, and
5.18-rc1 with set_iounmap_nonlazy() removed entirely:
|5.17 |5.18+fix|5.18+removal
4k|40.86s| 40.09s| 26.73s
1M|24.47s| 23.98s| 21.84s
The removal was the fastest (by a wide margin with 4k reads). This
patch removes set_iounmap_nonlazy().
Link: https://lkml.kernel.org/r/52f819991051f9b865e9ce25605509bfdbacadcd.1649277321.git.osandov@fb.com
Fixes: 690467c81b1a ("mm/vmalloc: Move draining areas out of caller context")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Acked-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Fri, 15 Apr 2022 02:13:58 +0000 (19:13 -0700)]
revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"
Despite Mike's attempted fix (
925346c129da117122), regressions reports
continue:
https://lore.kernel.org/lkml/
cb5b81bd-9882-e5dc-cd22-
54bdbaaefbbc@leemhuis.info/
https://bugzilla.kernel.org/show_bug.cgi?id=215720
https://lkml.kernel.org/r/
b685f3d0-da34-531d-1aa9-
479accd3e21b@leemhuis.info
So revert this patch.
Fixes: 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE")
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Fangrui Song <maskray@google.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Fri, 15 Apr 2022 02:13:55 +0000 (19:13 -0700)]
revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders"
Commit
925346c129da11 ("fs/binfmt_elf: fix PT_LOAD p_align values for
loaders") was an attempt to fix regressions due to
9630f0d60fec5f
("fs/binfmt_elf: use PT_LOAD p_align values for static PIE").
But regressionss continue to be reported:
https://lore.kernel.org/lkml/
cb5b81bd-9882-e5dc-cd22-
54bdbaaefbbc@leemhuis.info/
https://bugzilla.kernel.org/show_bug.cgi?id=215720
https://lkml.kernel.org/r/
b685f3d0-da34-531d-1aa9-
479accd3e21b@leemhuis.info
This patch reverts the fix, so the original can also be reverted.
Fixes: 925346c129da11 ("fs/binfmt_elf: fix PT_LOAD p_align values for loaders")
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Fri, 15 Apr 2022 02:13:52 +0000 (19:13 -0700)]
hugetlb: do not demote poisoned hugetlb pages
It is possible for poisoned hugetlb pages to reside on the free lists.
The huge page allocation routines which dequeue entries from the free
lists make a point of avoiding poisoned pages. There is no such check
and avoidance in the demote code path.
If a hugetlb page on the is on a free list, poison will only be set in
the head page rather then the page with the actual error. If such a
page is demoted, then the poison flag may follow the wrong page. A page
without error could have poison set, and a page with poison could not
have the flag set.
Check for poison before attempting to demote a hugetlb page. Also,
return -EBUSY to the caller if only poisoned pages are on the free list.
Link: https://lkml.kernel.org/r/20220307215707.50916-1-mike.kravetz@oracle.com
Fixes: 8531fc6f52f5 ("hugetlb: add hugetlb demote page support")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Charan Teja Kalla [Fri, 15 Apr 2022 02:13:49 +0000 (19:13 -0700)]
mm: compaction: fix compiler warning when CONFIG_COMPACTION=n
The below warning is reported when CONFIG_COMPACTION=n:
mm/compaction.c:56:27: warning: 'HPAGE_FRAG_CHECK_INTERVAL_MSEC' defined but not used [-Wunused-const-variable=]
56 | static const unsigned int HPAGE_FRAG_CHECK_INTERVAL_MSEC = 500;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix it by moving 'HPAGE_FRAG_CHECK_INTERVAL_MSEC' under
CONFIG_COMPACTION defconfig.
Also since this is just a 'static const int' type, use #define for it.
Link: https://lkml.kernel.org/r/1647608518-20924-1-git-send-email-quic_charante@quicinc.com
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Nitin Gupta <nigupta@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Fri, 15 Apr 2022 02:13:46 +0000 (19:13 -0700)]
mm: fix unexpected zeroed page mapping with zram swap
Two processes under CLONE_VM cloning, user process can be corrupted by
seeing zeroed page unexpectedly.
CPU A CPU B
do_swap_page do_swap_page
SWP_SYNCHRONOUS_IO path SWP_SYNCHRONOUS_IO path
swap_readpage valid data
swap_slot_free_notify
delete zram entry
swap_readpage zeroed(invalid) data
pte_lock
map the *zero data* to userspace
pte_unlock
pte_lock
if (!pte_same)
goto out_nomap;
pte_unlock
return and next refault will
read zeroed data
The swap_slot_free_notify is bogus for CLONE_VM case since it doesn't
increase the refcount of swap slot at copy_mm so it couldn't catch up
whether it's safe or not to discard data from backing device. In the
case, only the lock it could rely on to synchronize swap slot freeing is
page table lock. Thus, this patch gets rid of the swap_slot_free_notify
function. With this patch, CPU A will see correct data.
CPU A CPU B
do_swap_page do_swap_page
SWP_SYNCHRONOUS_IO path SWP_SYNCHRONOUS_IO path
swap_readpage original data
pte_lock
map the original data
swap_free
swap_range_free
bd_disk->fops->swap_slot_free_notify
swap_readpage read zeroed data
pte_unlock
pte_lock
if (!pte_same)
goto out_nomap;
pte_unlock
return
on next refault will see mapped data by CPU B
The concern of the patch would increase memory consumption since it
could keep wasted memory with compressed form in zram as well as
uncompressed form in address space. However, most of cases of zram uses
no readahead and do_swap_page is followed by swap_free so it will free
the compressed form from in zram quickly.
Link: https://lkml.kernel.org/r/YjTVVxIAsnKAXjTd@google.com
Fixes: 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of synchronous device")
Reported-by: Ivan Babrou <ivan@cloudflare.com>
Tested-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Juergen Gross [Fri, 15 Apr 2022 02:13:43 +0000 (19:13 -0700)]
mm, page_alloc: fix build_zonerefs_node()
Since commit
6aa303defb74 ("mm, vmscan: only allocate and reclaim from
zones with pages managed by the buddy allocator") only zones with free
memory are included in a built zonelist. This is problematic when e.g.
all memory of a zone has been ballooned out when zonelists are being
rebuilt.
The decision whether to rebuild the zonelists when onlining new memory
is done based on populated_zone() returning 0 for the zone the memory
will be added to. The new zone is added to the zonelists only, if it
has free memory pages (managed_zone() returns a non-zero value) after
the memory has been onlined. This implies, that onlining memory will
always free the added pages to the allocator immediately, but this is
not true in all cases: when e.g. running as a Xen guest the onlined new
memory will be added only to the ballooned memory list, it will be freed
only when the guest is being ballooned up afterwards.
Another problem with using managed_zone() for the decision whether a
zone is being added to the zonelists is, that a zone with all memory
used will in fact be removed from all zonelists in case the zonelists
happen to be rebuilt.
Use populated_zone() when building a zonelist as it has been done before
that commit.
There was a report that QubesOS (based on Xen) is hitting this problem.
Xen has switched to use the zone device functionality in kernel 5.9 and
QubesOS wants to use memory hotplugging for guests in order to be able
to start a guest with minimal memory and expand it as needed. This was
the report leading to the patch.
Link: https://lkml.kernel.org/r/20220407120637.9035-1-jgross@suse.com
Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marco Elver [Fri, 15 Apr 2022 02:13:40 +0000 (19:13 -0700)]
mm, kfence: support kmem_dump_obj() for KFENCE objects
Calling kmem_obj_info() via kmem_dump_obj() on KFENCE objects has been
producing garbage data due to the object not actually being maintained
by SLAB or SLUB.
Fix this by implementing __kfence_obj_info() that copies relevant
information to struct kmem_obj_info when the object was allocated by
KFENCE; this is called by a common kmem_obj_info(), which also calls the
slab/slub/slob specific variant now called __kmem_obj_info().
For completeness, kmem_dump_obj() now displays if the object was
allocated by KFENCE.
Link: https://lore.kernel.org/all/20220323090520.GG16885@xsang-OptiPlex-9020/
Link: https://lkml.kernel.org/r/20220406131558.3558585-1-elver@google.com
Fixes: b89fb5ef0ce6 ("mm, kfence: insert KFENCE hooks for SLUB")
Fixes: d3fb45f370d9 ("mm, kfence: insert KFENCE hooks for SLAB")
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz> [slab]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vincenzo Frascino [Fri, 15 Apr 2022 02:13:37 +0000 (19:13 -0700)]
kasan: fix hw tags enablement when KUNIT tests are disabled
Kasan enables hw tags via kasan_enable_tagging() which based on the mode
passed via kernel command line selects the correct hw backend.
kasan_enable_tagging() is meant to be invoked indirectly via the cpu
features framework of the architectures that support these backends.
Currently the invocation of this function is guarded by
CONFIG_KASAN_KUNIT_TEST which allows the enablement of the correct backend
only when KUNIT tests are enabled in the kernel.
This inconsistency was introduced in commit:
ed6d74446cbf ("kasan: test: support async (again) and asymm modes for HW_TAGS")
... and prevents to enable MTE on arm64 when KUNIT tests for kasan hw_tags are
disabled.
Fix the issue making sure that the CONFIG_KASAN_KUNIT_TEST guard does not
prevent the correct invocation of kasan_enable_tagging().
Link: https://lkml.kernel.org/r/20220408124323.10028-1-vincenzo.frascino@arm.com
Fixes: ed6d74446cbf ("kasan: test: support async (again) and asymm modes for HW_TAGS")
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zqiang [Fri, 15 Apr 2022 02:13:34 +0000 (19:13 -0700)]
irq_work: use kasan_record_aux_stack_noalloc() record callstack
On PREEMPT_RT kernel and KASAN is enabled. the kasan_record_aux_stack()
may call alloc_pages(), and the rt-spinlock will be acquired, if currently
in atomic context, will trigger warning:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 239, name: bootlogd
Preemption disabled at:
[<
ffffffffbab1a531>] rt_mutex_slowunlock+0xa1/0x4e0
CPU: 3 PID: 239 Comm: bootlogd Tainted: G W 5.17.1-rt17-yocto-preempt-rt+ #105
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
Call Trace:
__might_resched.cold+0x13b/0x173
rt_spin_lock+0x5b/0xf0
get_page_from_freelist+0x20c/0x1610
__alloc_pages+0x25e/0x5e0
__stack_depot_save+0x3c0/0x4a0
kasan_save_stack+0x3a/0x50
__kasan_record_aux_stack+0xb6/0xc0
kasan_record_aux_stack+0xe/0x10
irq_work_queue_on+0x6a/0x1c0
pull_rt_task+0x631/0x6b0
do_balance_callbacks+0x56/0x80
__balance_callbacks+0x63/0x90
rt_mutex_setprio+0x349/0x880
rt_mutex_slowunlock+0x22a/0x4e0
rt_spin_unlock+0x49/0x80
uart_write+0x186/0x2b0
do_output_char+0x2e9/0x3a0
n_tty_write+0x306/0x800
file_tty_write.isra.0+0x2af/0x450
tty_write+0x22/0x30
new_sync_write+0x27c/0x3a0
vfs_write+0x3f7/0x5d0
ksys_write+0xd9/0x180
__x64_sys_write+0x43/0x50
do_syscall_64+0x44/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fix it by using kasan_record_aux_stack_noalloc() to avoid the call to
alloc_pages().
Link: https://lkml.kernel.org/r/20220402142555.2699582-1-qiang1.zhang@intel.com
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Rasmussen [Fri, 15 Apr 2022 02:13:31 +0000 (19:13 -0700)]
mm/secretmem: fix panic when growing a memfd_secret
When one tries to grow an existing memfd_secret with ftruncate, one gets
a panic [1]. For example, doing the following reliably induces the
panic:
fd = memfd_secret();
ftruncate(fd, 10);
ptr = mmap(NULL, 10, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
strcpy(ptr, "
123456789");
munmap(ptr, 10);
ftruncate(fd, 20);
The basic reason for this is, when we grow with ftruncate, we call down
into simple_setattr, and then truncate_inode_pages_range, and eventually
we try to zero part of the memory. The normal truncation code does this
via the direct map (i.e., it calls page_address() and hands that to
memset()).
For memfd_secret though, we specifically don't map our pages via the
direct map (i.e. we call set_direct_map_invalid_noflush() on every
fault). So the address returned by page_address() isn't useful, and
when we try to memset() with it we panic.
This patch avoids the panic by implementing a custom setattr for
memfd_secret, which detects resizes specifically (setting the size for
the first time works just fine, since there are no existing pages to try
to zero), and rejects them with EINVAL.
One could argue growing should be supported, but I think that will
require a significantly more lengthy change. So, I propose a minimal
fix for the benefit of stable kernels, and then perhaps to extend
memfd_secret to support growing in a separate patch.
[1]:
BUG: unable to handle page fault for address:
ffffa0a889277028
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD
afa01067 P4D
afa01067 PUD
83f909067 PMD
83f8bf067 PTE
800ffffef6d88060
Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
CPU: 0 PID: 281 Comm: repro Not tainted 5.17.0-dbg-DEV #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:memset_erms+0x9/0x10
Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
RSP: 0018:
ffffb932c09afbf0 EFLAGS:
00010246
RAX:
0000000000000000 RBX:
ffffda63c4249dc0 RCX:
0000000000000fd8
RDX:
0000000000000fd8 RSI:
0000000000000000 RDI:
ffffa0a889277028
RBP:
ffffb932c09afc00 R08:
0000000000001000 R09:
ffffa0a889277028
R10:
0000000000020023 R11:
0000000000000000 R12:
ffffda63c4249dc0
R13:
ffffa0a890d70d98 R14:
0000000000000028 R15:
0000000000000fd8
FS:
00007f7294899580(0000) GS:
ffffa0af9bc00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffffa0a889277028 CR3:
0000000107ef6006 CR4:
0000000000370ef0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
? zero_user_segments+0x82/0x190
truncate_inode_partial_folio+0xd4/0x2a0
truncate_inode_pages_range+0x380/0x830
truncate_setsize+0x63/0x80
simple_setattr+0x37/0x60
notify_change+0x3d8/0x4d0
do_sys_ftruncate+0x162/0x1d0
__x64_sys_ftruncate+0x1c/0x20
do_syscall_64+0x44/0xa0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Modules linked in: xhci_pci xhci_hcd virtio_net net_failover failover virtio_blk virtio_balloon uhci_hcd ohci_pci ohci_hcd evdev ehci_pci ehci_hcd 9pnet_virtio 9p netfs 9pnet
CR2:
ffffa0a889277028
[lkp@intel.com: secretmem_iops can be static]
Signed-off-by: kernel test robot <lkp@intel.com>
[axelrasmussen@google.com: return EINVAL]
Link: https://lkml.kernel.org/r/20220324210909.1843814-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20220412193023.279320-1-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Fri, 15 Apr 2022 02:13:27 +0000 (19:13 -0700)]
tmpfs: fix regressions from wider use of ZERO_PAGE
Chuck Lever reported fsx-based xfstests generic 075 091 112 127 failing
when 5.18-rc1 NFS server exports tmpfs: bisected to recent tmpfs change.
Whilst nfsd_splice_action() does contain some questionable handling of
repeated pages, and Chuck was able to work around there, history from
Mark Hemment makes clear that there might be similar dangers elsewhere:
it was not a good idea for me to pass ZERO_PAGE down to unknown actors.
Revert shmem_file_read_iter() to using ZERO_PAGE for holes only when
iter_is_iovec(); in other cases, use the more natural iov_iter_zero()
instead of copy_page_to_iter().
We would use iov_iter_zero() throughout, but the x86 clear_user() is not
nearly so well optimized as copy to user (dd of 1T sparse tmpfs file
takes 57 seconds rather than 44 seconds).
And now pagecache_init() does not need to SetPageUptodate(ZERO_PAGE(0)):
which had caused boot failure on arm noMMU STM32F7 and STM32H7 boards
Link: https://lkml.kernel.org/r/9a978571-8648-e830-5735-1f4748ce2e30@google.com
Fixes: 56a8c8eb1eaf ("tmpfs: do not allocate pages on read")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Patrice CHOTARD <patrice.chotard@foss.st.com>
Reported-by: Chuck Lever III <chuck.lever@oracle.com>
Tested-by: Chuck Lever III <chuck.lever@oracle.com>
Cc: Mark Hemment <markhemm@googlemail.com>
Cc: Patrice CHOTARD <patrice.chotard@foss.st.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Fri, 15 Apr 2022 02:13:24 +0000 (19:13 -0700)]
MAINTAINERS: Broadcom internal lists aren't maintainers
Convert the broadcom internal list M: and L: entries to R: as exploder
email addresses are neither maintainers nor mailing lists.
Reorder the entries as necessary.
Link: https://lkml.kernel.org/r/04eb301f5b3adbefdd78e76657eff0acb3e3d87f.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Borislav Petkov [Tue, 5 Apr 2022 15:15:11 +0000 (17:15 +0200)]
i2c: ismt: Fix undefined behavior due to shift overflowing the constant
Fix:
drivers/i2c/busses/i2c-ismt.c: In function ‘ismt_hw_init’:
drivers/i2c/busses/i2c-ismt.c:770:2: error: case label does not reduce to an integer constant
case ISMT_SPGT_SPD_400K:
^~~~
drivers/i2c/busses/i2c-ismt.c:773:2: error: case label does not reduce to an integer constant
case ISMT_SPGT_SPD_1M:
^~~~
See https://lore.kernel.org/r/YkwQ6%2BtIH8GQpuct@zn.tnic for the gory
details as to why it triggers with older gccs only.
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Andy Shevchenko [Mon, 11 Apr 2022 18:07:52 +0000 (21:07 +0300)]
i2c: dev: Force case user pointers in compat_i2cdev_ioctl()
Sparse has warned us about wrong address space for user pointers:
i2c-dev.c:561:50: warning: incorrect type in initializer (different address spaces)
i2c-dev.c:561:50: expected unsigned char [usertype] *buf
i2c-dev.c:561:50: got void [noderef] __user *
Force cast the pointer to (__u8 *) that is used by I²C core code.
Note, this is an additional fix to the previously addressed similar issue
in the I2C_RDWR case in the same function.
Fixes: 3265a7e6b41b ("i2c: dev: Add __user annotation")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Andy Shevchenko [Mon, 11 Apr 2022 18:07:51 +0000 (21:07 +0300)]
i2c: dev: check return value when calling dev_set_name()
If dev_set_name() fails, the dev_name() is null, check the return
value of dev_set_name() to avoid the null-ptr-deref.
Fixes: 1413ef638aba ("i2c: dev: Fix the race between the release of i2c_dev and cdev")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Bjorn Andersson [Tue, 12 Apr 2022 21:26:01 +0000 (14:26 -0700)]
i2c: qcom-geni: Use dev_err_probe() for GPI DMA error
The GPI DMA engine driver can be compiled as a module, in which case the
likely probe deferral "error" shows up in the kernel log. Switch to
using dev_err_probe() to silence this warning and to ensure that
"devices_deferred" in debugfs carries this information.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Marek Vasut [Fri, 8 Apr 2022 17:15:24 +0000 (19:15 +0200)]
i2c: imx: Implement errata ERR007805 or e7805 bus frequency limit
The i.MX8MP Mask Set Errata for Mask 1P33A, Rev. 2.0 has description of
errata ERR007805 as below. This errata is found on all MX8M{M,N,P,Q},
MX7{S,D}, MX6{UL{,L,Z},S{,LL,X},S,D,DL,Q,DP,QP} . MX7ULP, MX8Q, MX8X
are not affected. MX53 and older status is unknown, as the errata
first appears in MX6 errata sheets from 2016 and the latest errata
sheet for MX53 is from 2015. Older SoC errata sheets predate the
MX53 errata sheet. MX8ULP and MX9 status is unknown as the errata
sheet is not available yet.
"
ERR007805 I2C: When the I2C clock speed is configured for 400 kHz,
the SCL low period violates the I2C spec of 1.3 uS min
Description: When the I2C module is programmed to operate at the
maximum clock speed of 400 kHz (as defined by the I2C spec), the SCL
clock low period violates the I2C spec of 1.3 uS min. The user must
reduce the clock speed to obtain the SCL low time to meet the 1.3us
I2C minimum required. This behavior means the SoC is not compliant
to the I2C spec at 400kHz.
Workaround: To meet the clock low period requirement in fast speed
mode, SCL must be configured to 384KHz or less.
"
Implement the workaround by matching on the affected SoC specific
compatible strings and by limiting the maximum bus frequency in case
the SoC is affected.
Signed-off-by: Marek Vasut <marex@denx.de>
To: linux-i2c@vger.kernel.org
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Martin Povišer [Tue, 29 Mar 2022 18:38:17 +0000 (20:38 +0200)]
i2c: pasemi: Wait for write xfers to finish
Wait for completion of write transfers before returning from the driver.
At first sight it may seem advantageous to leave write transfers queued
for the controller to carry out on its own time, but there's a couple of
issues with it:
* Driver doesn't check for FIFO space.
* The queued writes can complete while the driver is in its I2C read
transfer path which means it will get confused by the raising of
XEN (the 'transaction ended' signal). This can cause a spurious
ENODATA error due to premature reading of the MRXFIFO register.
Adding the wait fixes some unreliability issues with the driver. There's
some efficiency cost to it (especially with pasemi_smb_waitready doing
its polling), but that will be alleviated once the driver receives
interrupt support.
Fixes: beb58aa39e6e ("i2c: PA Semi SMBus driver")
Signed-off-by: Martin Povišer <povik+lin@cutebit.org>
Reviewed-by: Sven Peter <sven@svenpeter.dev>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Shin'ichiro Kawasaki [Fri, 15 Apr 2022 08:45:13 +0000 (17:45 +0900)]
dm: fix bio length of empty flush
The commit
92986f6b4c8a ("dm: use bio_clone_fast in alloc_io/alloc_tio")
removed bio_clone_fast() call from alloc_tio() when ci->io->tio is
available. In this case, ci->bio is not copied to ci->io->tio.clone.
This is fine since init_clone_info() sets same values to ci->bio and
ci->io->tio.clone.
However, when incoming bios have REQ_PREFLUSH flag, __send_empty_flush()
prepares a zero length bio on stack and set it to ci->bio. At this time,
ci->io->tio.clone still keeps non-zero length. When alloc_tio() chooses
this ci->io->tio.clone as the bio to map, it is passed to targets as
non-empty flush bio. It causes bio length check failure in dm-zoned and
unexpected operation such as dm_accept_partial_bio() call.
To avoid the non-empty flush bio, set zero length to ci->io->tio.clone
in __send_empty_flush().
Fixes: 92986f6b4c8a ("dm: use bio_clone_fast in alloc_io/alloc_tio")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Linus Torvalds [Fri, 15 Apr 2022 18:38:55 +0000 (11:38 -0700)]
Merge tag 'block-5.18-2022-04-15' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Moving of lower_48_bits() to the block layer and a fix for the
unaligned_be48 added with that originally (Alexander, Keith)
- Fix a bad WARN_ON() for trim size checking (Ming)
- A polled IO timeout fix for null_blk (Ming)
- Silence IO error printing for dead disks (Christoph)
- Compat mode range fix (Khazhismel)
- NVMe pull request via Christoph:
- Tone down the error logging added this merge window a bit
(Chaitanya Kulkarni)
- Quirk devices with non-unique unique identifiers (Christoph)
* tag 'block-5.18-2022-04-15' of git://git.kernel.dk/linux-block:
block: don't print I/O error warning for dead disks
block/compat_ioctl: fix range check in BLKGETSIZE
nvme-pci: disable namespace identifiers for Qemu controllers
nvme-pci: disable namespace identifiers for the MAXIO MAP1002/1202
nvme: add a quirk to disable namespace identifiers
nvme: don't print verbose errors for internal passthrough requests
block: null_blk: end timed out poll request
block: fix offset/size check in bio_trim()
asm-generic: fix __get_unaligned_be48() on 32 bit platforms
block: move lower_48_bits() to block
Linus Torvalds [Fri, 15 Apr 2022 18:33:20 +0000 (11:33 -0700)]
Merge tag 'io_uring-5.18-2022-04-14' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- Ensure we check and -EINVAL any use of reserved or struct padding.
Although we generally always do that, it's missed in two spots for
resource updates, one for the ring fd registration from this merge
window, and one for the extended arg. Make sure we have all of them
handled. (Dylan)
- A few fixes for the deferred file assignment (me, Pavel)
- Add a feature flag for the deferred file assignment so apps can tell
we handle it correctly (me)
- Fix a small perf regression with the current file position fix in
this merge window (me)
* tag 'io_uring-5.18-2022-04-14' of git://git.kernel.dk/linux-block:
io_uring: abort file assignment prior to assigning creds
io_uring: fix poll error reporting
io_uring: fix poll file assign deadlock
io_uring: use right issue_flags for splice/tee
io_uring: verify pad field is 0 in io_get_ext_arg
io_uring: verify resv is 0 in ringfd register/unregister
io_uring: verify that resv2 is 0 in io_uring_rsrc_update2
io_uring: move io_uring_rsrc_update2 validation
io_uring: fix assign file locking issue
io_uring: stop using io_wq_work as an fd placeholder
io_uring: move apoll->events cache
io_uring: io_kiocb_update_pos() should not touch file for non -1 offset
io_uring: flag the fact that linked file assignment is sane
Linus Torvalds [Fri, 15 Apr 2022 18:24:32 +0000 (11:24 -0700)]
Merge tag 'linux-kselftest-fixes-5.18-rc3' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
"A mqueue perf test memory leak bug fix.
mq_perf_tests failed to call CPU_FREE to free memory allocated by
CPU_SET"
* tag 'linux-kselftest-fixes-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
testing/selftests/mqueue: Fix mq_perf_tests to free the allocated cpu set
Linus Torvalds [Fri, 15 Apr 2022 18:15:02 +0000 (11:15 -0700)]
Merge tag 'perf-tools-fixes-for-v5.18-2022-04-14' of git://git./linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- 'perf record --per-thread' mode doesn't have the CPU mask setup, so
it can use it to figure out the number of mmaps, fix it.
- Fix segfault accessing sample_id xyarray out of bounds, noticed while
using Intel PT where we have a dummy event to capture text poke perf
metadata events and we mixup the set of CPUs specified by the user
with the all CPUs map needed for text poke.
- Fix 'perf bench numa' to check if CPU used to bind task is online.
- Fix 'perf bench numa' usage of affinity for machines with more than
1000 CPUs.
- Fix misleading add event PMU debug message, noticed while using the
'intel_pt' PMU.
- Fix error check return value of hashmap__new() in 'perf stat', it
must use IS_ERR().
* tag 'perf-tools-fixes-for-v5.18-2022-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf bench: Fix numa bench to fix usage of affinity for machines with #CPUs > 1K
perf bench: Fix numa testcase to check if CPU used to bind task is online
perf record: Fix per-thread option
perf tools: Fix segfault accessing sample_id xyarray
perf stat: Fix error check return value of hashmap__new(), must use IS_ERR()
perf tools: Fix misleading add event PMU debug message
Jens Axboe [Fri, 15 Apr 2022 12:33:49 +0000 (06:33 -0600)]
Merge tag 'nvme-5.18-2022-04-15' of git://git.infradead.org/nvme into block-5.18
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 5.18
- tone down the error logging added this merge window a bit
(Chaitanya Kulkarni)
- quirk devices with non-unique unique identifiers (me)"
* tag 'nvme-5.18-2022-04-15' of git://git.infradead.org/nvme:
nvme-pci: disable namespace identifiers for Qemu controllers
nvme-pci: disable namespace identifiers for the MAXIO MAP1002/1202
nvme: add a quirk to disable namespace identifiers
nvme: don't print verbose errors for internal passthrough requests
Christoph Hellwig [Wed, 23 Mar 2022 16:38:15 +0000 (17:38 +0100)]
block: don't print I/O error warning for dead disks
When a disk has been marked dead, don't print warnings for I/O errors
as they are very much expected.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220323163815.1526998-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Khazhismel Kumykov [Thu, 14 Apr 2022 22:40:56 +0000 (15:40 -0700)]
block/compat_ioctl: fix range check in BLKGETSIZE
kernel ulong and compat_ulong_t may not be same width. Use type directly
to eliminate mismatches.
This would result in truncation rather than EFBIG for 32bit mode for
large disks.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220414224056.2875681-1-khazhy@google.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 12 Apr 2022 05:07:56 +0000 (07:07 +0200)]
nvme-pci: disable namespace identifiers for Qemu controllers
Qemu unconditionally reports a UUID, which depending on the qemu version
is either all-null (which is incorrect but harmless) or contains a single
bit set for all controllers. In addition it can also optionally report
a eui64 which needs to be manually set. Disable namespace identifiers
for Qemu controlles entirely even if in some cases they could be set
correctly through manual intervention.
Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Christoph Hellwig [Mon, 11 Apr 2022 06:05:27 +0000 (08:05 +0200)]
nvme-pci: disable namespace identifiers for the MAXIO MAP1002/1202
The MAXIO MAP1002/1202 controllers reports completely bogus Namespace
identifiers that even change after suspend cycles. Disable using
the Identifiers entirely.
Reported-by: 金韬 <me@kingtous.cn>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Tested-by: 金韬 <me@kingtous.cn>
Christoph Hellwig [Mon, 11 Apr 2022 06:05:27 +0000 (08:05 +0200)]
nvme: add a quirk to disable namespace identifiers
Add a quirk to disable using and exporting namespace identifiers for
controllers where they are broken beyond repair.
The most directly visible problem with non-unique namespace identifiers
is that they break the /dev/disk/by-id/ links, with the link for a
supposedly unique identifier now pointing to one of multiple possible
namespaces that share the same ID, and a somewhat random selection of
which one actually shows up.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Chaitanya Kulkarni [Mon, 11 Apr 2022 03:12:49 +0000 (20:12 -0700)]
nvme: don't print verbose errors for internal passthrough requests
Use the RQF_QUIET flag to skip the newly added verbose error reporting,
and set the flag in __nvme_submit_sync_cmd, which is used for most
internal passthrough requests where we do expect errors (e.g. due to
probing for optional functionality). This is similar to what the SCSI
verbose error logging does.
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Alan Adamson <alan.adamson@oracle.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Alan Adamson <alan.adamson@oracle.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Jens Axboe [Fri, 15 Apr 2022 02:23:40 +0000 (20:23 -0600)]
io_uring: abort file assignment prior to assigning creds
We need to either restore creds properly if we fail on the file
assignment, or just do the file assignment first instead. Let's do
the latter as it's simpler, should make no difference here for
file assignment.
Link: https://lore.kernel.org/lkml/000000000000a7edb305dca75a50@google.com/
Reported-by: syzbot+60c52ca98513a8760a91@syzkaller.appspotmail.com
Fixes: 6bf9c47a3989 ("io_uring: defer file assignment")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Mike Snitzer [Thu, 14 Apr 2022 15:52:54 +0000 (11:52 -0400)]
dm: allow dm_accept_partial_bio() for dm_io without duplicate bios
The intent behind commit
e6fc9f62ce6e ("dm: flag clones created by
__send_duplicate_bios") was to formally disallow the use of
dm_accept_partial_bio() where it simply isn't possible -- due to
constraint that multiple bios cannot meaningfully update a shared
tio->len_ptr.
But that commit went too far and disallowed the case where "abormal"
IO (e.g. WRITE_ZEROES) is only using a single bio. Fix this by
not marking a dm_io with a single dm_target_io (and bio), that happens
to be created by __send_duplicate_bios, as DM_TIO_IS_DUPLICATE_BIO.
Also remove 'unsigned *len' parameter from alloc_multiple_bios().
This commit fixes a dm_accept_partial_bio() BUG_ON() with dm-zoned
when a WRITE_ZEROES bio is issued.
Fixes: 655f3aad7aa4 ("dm: switch dm_target_io booleans over to proper flags")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Linus Torvalds [Thu, 14 Apr 2022 23:22:16 +0000 (16:22 -0700)]
Merge tag 'drm-fixes-2022-04-15' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Eggs season holidays are among us, and I think I'd expect some smaller
pulls for two weeks then.
This seems eerily quiet. One i915 fix, amdgpu has a bunch and msm. I
didn't see a misc pull this week, so I expect that will catch up next
week.
i915:
- Correct legacy mmap disabling to use GRAPHICS_VER_FULL
msm:
- system suspend fix
- kzalloc return checks
- misc display fix
- iommu_present removal
amdgpu:
- Fix for alpha properly in pre-multiplied mode
- Fix VCN 3.1.2 firmware name
- Suspend/resume fix
- Add a gfxoff quirk for Mac vega20 board
- DCN 3.1.6 spread spectrum fix"
* tag 'drm-fixes-2022-04-15' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: remove dtbclk_ss compensation for dcn316
drm/amdgpu: Enable gfxoff quirk on MacBook Pro
drm/amdgpu: Ensure HDA function is suspended before ASIC reset
drm/amdgpu: fix VCN 3.1.2 firmware name
drm/amd/display: don't ignore alpha property on pre-multiplied mode
drm/msm/gpu: Avoid -Wunused-function with !CONFIG_PM_SLEEP
drm/msm/dp: add fail safe mode outside of event_mutex context
drm/msm/dsi: Use connector directly in msm_dsi_manager_connector_init()
drm/msm: Stop using iommu_present()
drm/msm/mdp5: check the return of kzalloc()
drm/msm: Fix range size vs end confusion
drm/i915: Sunset igpu legacy mmap support based on GRAPHICS_VER_FULL
drm/msm/dpu: Use indexed array initializer to prevent mismatches
drm/msm/disp: check the return value of kzalloc()
dt-bindings: display/msm: another fix for the dpu-qcm2290 example
drm/msm: Add missing put_task_struct() in debugfs path
drm/msm/gpu: Remove mutex from wait_event condition
drm/msm/gpu: Park scheduler threads for system suspend
drm/msm/gpu: Rename runtime suspend/resume functions
Linus Torvalds [Thu, 14 Apr 2022 23:06:56 +0000 (16:06 -0700)]
Merge tag 'vfio-v5.18-rc3' of https://github.com/awilliam/linux-vfio
Pull vfio fix from Alex Williamson:
- Fix VF token checking for vfio-pci variant drivers (Jason Gunthorpe)
* tag 'vfio-v5.18-rc3' of https://github.com/awilliam/linux-vfio:
vfio/pci: Fix vf_token mechanism when device-specific VF drivers are used
Linus Torvalds [Thu, 14 Apr 2022 23:00:36 +0000 (16:00 -0700)]
Merge tag '5.18-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
- two fixes related to unmount
- symlink overflow fix
- minor netfs fix
- improved tracing for crediting (flow control)
* tag '5.18-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: verify that tcon is valid before dereference in cifs_kill_sb
cifs: potential buffer overflow in handling symlinks
cifs: Split the smb3_add_credits tracepoint
cifs: release cached dentries only if mount is complete
cifs: Check the IOCB_DIRECT flag, not O_DIRECT
NeilBrown [Thu, 14 Apr 2022 03:57:35 +0000 (13:57 +1000)]
VFS: filename_create(): fix incorrect intent.
When asked to create a path ending '/', but which is not to be a
directory (LOOKUP_DIRECTORY not set), filename_create() will never try
to create the file. If it doesn't exist, -ENOENT is reported.
However, it still passes LOOKUP_CREATE|LOOKUP_EXCL to the filesystems
->lookup() function, even though there is no intent to create. This is
misleading and can cause incorrect behaviour.
If you try
ln -s foo /path/dir/
where 'dir' is a directory on an NFS filesystem which is not currently
known in the dcache, this will fail with ENOENT.
But as the name is not in the dcache, nfs_lookup gets called with
LOOKUP_CREATE|LOOKUP_EXCL and so it returns NULL without performing any
lookup, with the expectation that a subsequent call to create the target
will be made, and the lookup can be combined with the creation. In the
case with a trailing '/' and no LOOKUP_DIRECTORY, that call is never
made. Instead filename_create() sees that the dentry is not (yet)
positive and returns -ENOENT - even though the directory actually
exists.
So only set LOOKUP_CREATE|LOOKUP_EXCL if there really is an intent to
create, and use the absence of these flags to decide if -ENOENT should
be returned.
Note that filename_parentat() is only interested in LOOKUP_REVAL, so we
split that out and store it in 'reval_flag'. __lookup_hash() then gets
reval_flag combined with whatever create flags were determined to be
needed.
Reviewed-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Airlie [Thu, 14 Apr 2022 21:14:19 +0000 (07:14 +1000)]
Merge tag 'amd-drm-fixes-5.18-2022-04-13' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.18-2022-04-13:
amdgpu:
- Fix for alpha properly in pre-multiplied mode
- Fix VCN 3.1.2 firmware name
- Suspend/resume fix
- Add a gfxoff quirk for Mac vega20 board
- DCN 3.1.6 spread spectrum fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220414025821.5811-1-alexander.deucher@amd.com
Rob Herring [Wed, 13 Apr 2022 14:00:15 +0000 (09:00 -0500)]
dt-bindings: display: panel-timing: Define a single type for properties
It's not good practice to define multiple types for the same property, so
factor out the type reference making the properties always an uint32-array
with a length of 1 or 3 items.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://lore.kernel.org/r/20220413140016.3131013-1-robh@kernel.org
Linus Walleij [Sun, 10 Apr 2022 20:28:37 +0000 (22:28 +0200)]
ARM: config: Update Gemini defconfig
The Gemini defconfig needs to be updated due to DSA driver
Kconfig changes in the v5.18 merge window:
CONFIG_NET_DSA_REALTEK_SMI is now behind
CONFIG_NET_DSA_REALTEK and the desired DSA switch need
to be selected explicitly with
CONFIG_NET_DSA_REALTEK_RTL8366RB.
Take this opportunity to update some other minor config
options:
- CONFIG_MARVELL_PHY moved around because of Kconfig
changes.
- CONFIG_SENSORS_DRIVETEMP should be selected since is
regulates the system critical alert temperature on
some devices, which is nice if it is handled even
if initramfs or root fails to mount.
Fixes: 319a70a5fea9 ("net: dsa: realtek-smi: move to subdirectory")
Fixes: 765c39a4fafe ("net: dsa: realtek: convert subdrivers into modules")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Hans Ulli Kroll <ulli.kroll@googlemail.com>
Cc: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Rob Herring [Thu, 7 Apr 2022 22:52:54 +0000 (17:52 -0500)]
arm64: dts: qcom/sdm845-shift-axolotl: Fix boolean properties with values
Boolean properties in DT are present or not present and don't take a value.
A property such as 'foo = <0>;' evaluated to true. IOW, the value doesn't
matter.
It may have been intended that 0 values are false, but there is no change
in behavior with this patch.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Andy Gross <agross@kernel.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Krzysztof Kozlowski <krzk+dt@kernel.org>
Cc: linux-arm-msm@vger.kernel.org
Link: https://lore.kernel.org/r/20220407225254.2178644-1-robh@kernel.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Krzysztof Kozlowski [Thu, 7 Apr 2022 14:30:27 +0000 (16:30 +0200)]
ARM: dts: align SPI NOR node name with dtschema
The node names should be generic and SPI NOR dtschema expects "flash".
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/20220407143027.294678-1-krzysztof.kozlowski@linaro.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Rob Herring [Thu, 7 Apr 2022 22:51:07 +0000 (17:51 -0500)]
ARM: dts: Fix more boolean properties with values
Boolean properties in DT are present or not present and don't take a value.
A property such as 'foo = <0>;' evaluated to true. IOW, the value doesn't
matter.
It may have been intended that 0 values are false, but there is no change
in behavior with this patch.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Krzysztof Kozlowski <krzk+dt@kernel.org>
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Claudiu Beznea <claudiu.beznea@microchip.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: "Benoît Cousson" <bcousson@baylibre.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Andy Gross <agross@kernel.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-omap@vger.kernel.org
Cc: linux-arm-msm@vger.kernel.org
Link: https://lore.kernel.org/r/20220407225107.2175958-1-robh@kernel.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arnd Bergmann [Thu, 14 Apr 2022 20:48:30 +0000 (22:48 +0200)]
Merge tag 'ux500-defconfig-soc-v5.18' of git://git./linux/kernel/git/linusw/linux-nomadik into arm/fixes
Defconfig updates for kernel v5.18:
- Refresh defconfig with new and moved options
- Add some new hardware drivers
- Activate battery charging
* tag 'ux500-defconfig-soc-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-nomadik:
ARM: config: u8500: Re-enable AB8500 battery charging
ARM: config: u8500: Add some common hardware
ARM: config: Refresh U8500 defconfig
Link: https://lore.kernel.org/r/CACRpkdY+_Go4XNzOh+Rvc24QBnUud2k-S7VQuaH5d-j71_dJog@mail.gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Linus Torvalds [Thu, 14 Apr 2022 19:17:25 +0000 (12:17 -0700)]
Merge tag 's390-5.18-3' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Convert current_stack_pointer to a register alias like it is assumed
if ARCH_HAS_CURRENT_STACK_POINTER is selected. The existing
implementation as a function breaks CONFIG_HARDENED_USERCOPY
sanity-checks
- Get rid of -Warray-bounds warning within kexec code
- Add minimal IBM z16 support by reporting a proper elf platform, and
adding compile options
- Update defconfigs
* tag 's390-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: enable CONFIG_HARDENED_USERCOPY in debug_defconfig
s390: current_stack_pointer shouldn't be a function
s390: update defconfigs
s390/kexec: silence -Warray-bounds warning
s390: allow to compile with z16 optimizations
s390: add z16 elf platform
Linus Torvalds [Thu, 14 Apr 2022 18:58:19 +0000 (11:58 -0700)]
Merge tag 'net-5.18-rc3' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from wireless and netfilter.
Current release - regressions:
- smc: fix af_ops of child socket pointing to released memory
- wifi: ath9k: fix usage of driver-private space in tx_info
Previous releases - regressions:
- ipv6: fix panic when forwarding a pkt with no in6 dev
- sctp: use the correct skb for security_sctp_assoc_request
- smc: fix NULL pointer dereference in smc_pnet_find_ib()
- sched: fix initialization order when updating chain 0 head
- phy: don't defer probe forever if PHY IRQ provider is missing
- dsa: revert "net: dsa: setup master before ports"
- dsa: felix: fix tagging protocol changes with multiple CPU ports
- eth: ice:
- fix use-after-free when freeing @rx_cpu_rmap
- revert "iavf: fix deadlock occurrence during resetting VF
interface"
- eth: lan966x: stop processing the MAC entry is port is wrong
Previous releases - always broken:
- sched:
- flower: fix parsing of ethertype following VLAN header
- taprio: check if socket flags are valid
- nfc: add flush_workqueue to prevent uaf
- veth: ensure eth header is in skb's linear part
- eth: stmmac: fix altr_tse_pcs function when using a fixed-link
- eth: macb: restart tx only if queue pointer is lagging
- eth: macvlan: fix leaking skb in source mode with nodst option"
* tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
net: bcmgenet: Revert "Use stronger register read/writes to assure ordering"
rtnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies
net: dsa: felix: fix tagging protocol changes with multiple CPU ports
tun: annotate access to queue->trans_start
nfc: nci: add flush_workqueue to prevent uaf
net: dsa: realtek: don't parse compatible string for RTL8366S
net: dsa: realtek: fix Kconfig to assure consistent driver linkage
net: ftgmac100: access hardware register after clock ready
Revert "net: dsa: setup master before ports"
macvlan: Fix leaking skb in source mode with nodst option
netfilter: nf_tables: nft_parse_register can return a negative value
net: lan966x: Stop processing the MAC entry is port is wrong.
net: lan966x: Fix when a port's upper is changed.
net: lan966x: Fix IGMP snooping when frames have vlan tag
net: lan966x: Update lan966x_ptp_get_nominal_value
sctp: Initialize daddr on peeled off socket
net/smc: Fix af_ops of child socket pointing to released memory
net/smc: Fix NULL pointer dereference in smc_pnet_find_ib()
net/smc: use memcpy instead of snprintf to avoid out of bounds read
net: macb: Restart tx only if queue pointer is lagging
...
Linus Torvalds [Thu, 14 Apr 2022 18:08:12 +0000 (11:08 -0700)]
Merge tag 'sound-5.18-rc3' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This became an unexpectedly large pull request due to various
regression fixes in the previous kernels.
The majority of fixes are a series of patches to address the
regression at probe errors in devres'ed drivers, while there are yet
more fixes for the x86 SG allocations and for USB-audio buffer
management. In addition, a few HD-audio quirks and other small fixes
are found"
* tag 'sound-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (52 commits)
ALSA: usb-audio: Limit max buffer and period sizes per time
ALSA: memalloc: Add fallback SG-buffer allocations for x86
ALSA: nm256: Don't call card private_free at probe error path
ALSA: mtpav: Don't call card private_free at probe error path
ALSA: rme9652: Fix the missing snd_card_free() call at probe error
ALSA: hdspm: Fix the missing snd_card_free() call at probe error
ALSA: hdsp: Fix the missing snd_card_free() call at probe error
ALSA: oxygen: Fix the missing snd_card_free() call at probe error
ALSA: lx6464es: Fix the missing snd_card_free() call at probe error
ALSA: cmipci: Fix the missing snd_card_free() call at probe error
ALSA: aw2: Fix the missing snd_card_free() call at probe error
ALSA: als300: Fix the missing snd_card_free() call at probe error
ALSA: lola: Fix the missing snd_card_free() call at probe error
ALSA: bt87x: Fix the missing snd_card_free() call at probe error
ALSA: sis7019: Fix the missing error handling
ALSA: intel_hdmi: Fix the missing snd_card_free() call at probe error
ALSA: via82xx: Fix the missing snd_card_free() call at probe error
ALSA: sonicvibes: Fix the missing snd_card_free() call at probe error
ALSA: rme96: Fix the missing snd_card_free() call at probe error
ALSA: rme32: Fix the missing snd_card_free() call at probe error
...