Joerg Roedel [Fri, 14 Apr 2023 11:45:50 +0000 (13:45 +0200)]
Merge branches 'iommu/fixes', 'arm/allwinner', 'arm/exynos', 'arm/mediatek', 'arm/omap', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'ppc/pamu', 'unisoc', 'x86/vt-d', 'x86/amd', 'core' and 'platform-remove_new' into next
Jason Gunthorpe [Wed, 12 Apr 2023 14:11:58 +0000 (11:11 -0300)]
iommu: Remove iommu_group_get_by_id()
This is never called.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/0-v1-60bbc66d7e92+24-rm_iommu_get_by_id_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Jason Gunthorpe [Wed, 12 Apr 2023 14:10:45 +0000 (11:10 -0300)]
iommu: Make iommu_release_device() static
This is not called outside the core code, and indeed cannot be called
correctly outside the bus notifier. Make it static.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/0-v1-c3da18124d2d+56-rm_iommu_release_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tina Zhang [Thu, 13 Apr 2023 04:06:45 +0000 (12:06 +0800)]
iommu/vt-d: Remove BUG_ON in dmar_insert_dev_scope()
The dmar_insert_dev_scope() could fail if any unexpected condition is
encountered. However, in this situation, the kernel should attempt
recovery and proceed with execution. Remove BUG_ON with WARN_ON, so that
kernel can avoid being crashed when an unexpected condition occurs.
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-8-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tina Zhang [Thu, 13 Apr 2023 04:06:44 +0000 (12:06 +0800)]
iommu/vt-d: Remove a useless BUG_ON(dev->is_virtfn)
When dmar_alloc_pci_notify_info() is being invoked, the invoker has
ensured the dev->is_virtfn is false. So, remove the useless BUG_ON in
dmar_alloc_pci_notify_info().
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-7-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tina Zhang [Thu, 13 Apr 2023 04:06:43 +0000 (12:06 +0800)]
iommu/vt-d: Remove BUG_ON in map/unmap()
Domain map/unmap with invalid parameters shouldn't crash the kernel.
Therefore, using if() replaces the BUG_ON.
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-6-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tina Zhang [Thu, 13 Apr 2023 04:06:42 +0000 (12:06 +0800)]
iommu/vt-d: Remove BUG_ON when domain->pgd is NULL
When performing domain_context_mapping or getting dma_pte of a pfn, the
availability of the domain page table directory is ensured. Therefore,
the domain->pgd checkings are unnecessary.
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-5-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tina Zhang [Thu, 13 Apr 2023 04:06:41 +0000 (12:06 +0800)]
iommu/vt-d: Remove BUG_ON in handling iotlb cache invalidation
VT-d iotlb cache invalidation request with unexpected type is considered
as a bug to developers, which can be fixed. So, when such kind of issue
comes out, it needs to be reported through the kernel log, instead of
halting the system. Replacing BUG_ON with warning reporting.
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-4-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tina Zhang [Thu, 13 Apr 2023 04:06:40 +0000 (12:06 +0800)]
iommu/vt-d: Remove BUG_ON on checking valid pfn range
When encountering an unexpected invalid pfn range, the kernel should
attempt recovery and proceed with execution. Therefore, using WARN_ON to
replace BUG_ON to avoid halting the machine.
Besides, one redundant checking is reduced.
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-3-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tina Zhang [Thu, 13 Apr 2023 04:06:39 +0000 (12:06 +0800)]
iommu/vt-d: Make size of operands same in bitwise operations
This addresses the following issue reported by klocwork tool:
- operands of different size in bitwise operations
Suggested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-2-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Jacob Pan [Thu, 13 Apr 2023 04:06:38 +0000 (12:06 +0800)]
iommu/vt-d: Remove PASID supervisor request support
There's no more usage, remove PASID supervisor support.
Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230331231137.1947675-3-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Jacob Pan [Thu, 13 Apr 2023 04:06:37 +0000 (12:06 +0800)]
iommu/vt-d: Use non-privileged mode for all PASIDs
Supervisor Request Enable (SRE) bit in a PASID entry is for permission
checking on DMA requests. When SRE = 0, DMA with supervisor privilege
will be blocked. However, for in-kernel DMA this is not necessary in that
we are targeting kernel memory anyway. There's no need to differentiate
user and kernel for in-kernel DMA.
Let's use non-privileged (user) permission for all PASIDs used in kernel,
it will be consistent with DMA without PASID (RID_PASID) as well.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230331231137.1947675-2-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Lu Baolu [Thu, 13 Apr 2023 04:06:36 +0000 (12:06 +0800)]
iommu/vt-d: Remove extern from function prototypes
The kernel coding style does not require 'extern' in function prototypes
in .h files, so remove them from drivers/iommu/intel/iommu.h as they are
not needed.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230331045452.500265-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Christophe JAILLET [Thu, 13 Apr 2023 04:06:35 +0000 (12:06 +0800)]
iommu/vt-d: Do not use GFP_ATOMIC when not needed
There is no need to use GFP_ATOMIC here. GFP_KERNEL is already used for
some other memory allocations just a few lines above.
Commit
e3a981d61d15 ("iommu/vt-d: Convert allocations to GFP_KERNEL") has
changed the other memory allocation flags.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/e2a8a1019ffc8a86b4b4ed93def3623f60581274.1675542576.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Lu Baolu [Thu, 13 Apr 2023 04:06:34 +0000 (12:06 +0800)]
iommu/vt-d: Remove unnecessary checks in iopf disabling path
iommu_unregister_device_fault_handler() and iopf_queue_remove_device()
are called after device has stopped issuing new page falut requests and
all outstanding page requests have been drained. They should never fail.
Trigger a warning if it happens unfortunately.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Lu Baolu [Thu, 13 Apr 2023 04:06:33 +0000 (12:06 +0800)]
iommu/vt-d: Move PRI handling to IOPF feature path
PRI is only used for IOPF. With this move, the PCI/PRI feature could be
controlled by the device driver through iommu_dev_enable/disable_feature()
interfaces.
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Lu Baolu [Thu, 13 Apr 2023 04:06:32 +0000 (12:06 +0800)]
iommu/vt-d: Move pfsid and ats_qdep calculation to device probe path
They should be part of the per-device iommu private data initialization.
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Lu Baolu [Thu, 13 Apr 2023 04:06:31 +0000 (12:06 +0800)]
iommu/vt-d: Move iopf code from SVA to IOPF enabling path
Generally enabling IOMMU_DEV_FEAT_SVA requires IOMMU_DEV_FEAT_IOPF, but
some devices manage I/O Page Faults themselves instead of relying on the
IOMMU. Move IOPF related code from SVA to IOPF enabling path.
For the device drivers that relies on the IOMMU for IOPF through PCI/PRI,
IOMMU_DEV_FEAT_IOPF must be enabled before and disabled after
IOMMU_DEV_FEAT_SVA.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Lu Baolu [Thu, 13 Apr 2023 04:06:30 +0000 (12:06 +0800)]
iommu/vt-d: Allow SVA with device-specific IOPF
Currently enabling SVA requires IOPF support from the IOMMU and device
PCI PRI. However, some devices can handle IOPF by itself without ever
sending PCI page requests nor advertising PRI capability.
Allow SVA support with IOPF handled either by IOMMU (PCI PRI) or device
driver (device-specific IOPF). As long as IOPF could be handled, SVA
should continue to work.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Lu Baolu [Thu, 13 Apr 2023 04:06:29 +0000 (12:06 +0800)]
dmaengine: idxd: Add enable/disable device IOPF feature
The iommu subsystem requires IOMMU_DEV_FEAT_IOPF must be enabled before
and disabled after IOMMU_DEV_FEAT_SVA, if device's I/O page faults rely
on the IOMMU. Add explicit IOMMU_DEV_FEAT_IOPF enabling/disabling in this
driver.
At present, missing IOPF enabling/disabling doesn't cause any real issue,
because the IOMMU driver places the IOPF enabling/disabling in the path
of SVA feature handling. But this may change.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Joerg Roedel [Thu, 13 Apr 2023 10:01:05 +0000 (12:01 +0200)]
Merge tag 'arm-smmu-updates' of git://git./linux/kernel/git/will/linux into arm/smmu
Arm SMMU updates for 6.4
- Device-tree binding updates:
* Allow Qualcomm GPU SMMUs to accept relevant clock properties
* Document Qualcomm 8550 SoC as implementing an MMU-500
* Favour new "qcom,smmu-500" binding for Adreno SMMUs
- Fix S2CR quirk detection on non-architectural Qualcomm SMMU
implementations
- Acknowledge SMMUv3 PRI queue overflow when consuming events
- Document (in a comment) why ATS is disabled for bypass streams
Yong Wu [Tue, 11 Apr 2023 09:31:44 +0000 (17:31 +0800)]
arm64: dts: mt8186: Add dma-ranges for the parent "soc" node
Prepare for the MM nodes whose dma-ranges(iova range) is 16GB.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-15-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Yong Wu [Tue, 11 Apr 2023 09:31:43 +0000 (17:31 +0800)]
arm64: dts: mt8195: Add dma-ranges for the parent "soc" node
After commit
f1ad5338a4d5 ("of: Fix "dma-ranges" handling for bus
controllers"), the dma-ranges property is not allowed for
the leaf node. But our iommu/dma-ranges is 16GB, we still expect
separate the 16GB dma-range like:
a) display is in 0 - 4GB;
b) vcodec is in 4GB - 8GB;
c) camera is in 8GB - 12GB.
We can not expect all the masters add a parent node for them,
especial for the existed drivers/nodes.
Thus, we add whole the 16GB dma-ranges in the parent "soc" node.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-14-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Yong Wu [Tue, 11 Apr 2023 09:31:42 +0000 (17:31 +0800)]
arm64: dts: mt8195: Remove the unnecessary dma-ranges
After we add the dma-ranges in the parent "soc" node,
this property is unnecessary for the leaf node.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-13-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Yong Wu [Tue, 11 Apr 2023 09:31:41 +0000 (17:31 +0800)]
media: mediatek: vcodec: Remove the setting for dma_mask
In order to simplify the masters to set their respective dma masks, MTK
IOMMU helps to centralize the processing. Because all the dma ranges is
set in IOMMU, IOMMU knows well the dma mask requirements of masters. After
this patch, the masters(codec here) code does not need care
dma-ranges/dma_mask related information.
Cc: Tiffany Lin <tiffany.lin@mediatek.com>
Cc: Andrew-CT Chen <andrew-ct.chen@mediatek.com>
Cc: Yunfei Dong <yunfei.dong@mediatek.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: irui wang <irui.wang@mediatek.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-12-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Yong Wu [Tue, 11 Apr 2023 09:31:40 +0000 (17:31 +0800)]
media: mtk-jpegdec: Remove the setting for dma_mask
In order to simplify the masters to set their respective dma masks, MTK
IOMMU helps to centralize the processing. Because all the dma ranges is
set in IOMMU, IOMMU knows well the dma mask requirements of masters. After
this patch, the masters code does not need care
dma-ranges/dma_mask related information.
Cc: Bin Liu <bin.liu@mediatek.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: kyrie wu <kyrie.wu@mediatek.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-11-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Yong Wu [Tue, 11 Apr 2023 09:31:39 +0000 (17:31 +0800)]
iommu/mediatek: Set dma_mask for the master devices
MediaTek iommu arranges dma ranges for all the masters, this patch is to
help them set dma mask. This is to avoid each master setting their own
mask, but also to avoid a real issue, such as JPEG uses
"mediatek,mtk-jpgenc" for 2701/8183/8186/8188, then JPEG could ignore its
different dma_mask in different SoC to achieve common code.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-10-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Yong Wu [Tue, 11 Apr 2023 09:31:38 +0000 (17:31 +0800)]
iommu/mediatek: Add a gap for the iova regions
As the removed property in the vcodec dt-binding, the property is:
dma-ranges = <0x1 0x0 0x0 0x40000000 0x0 0xfff00000>;
The length is 0xfff0_0000 rather than 0x1_0000_0000, this means it
requires 1M as a gap. This is because the end address for some vcodec
HW is (address + size). If the size is 4G, the end address may be
0x2_0000_0000, and the width for vcodec register only is 32, then the
HW may get the ZERO address.
Currently the consumer's dma-ranges property doesn't work, IOMMU
has to consider this case. Add a bigger gap(8M) for all the regions
to avoid it.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-9-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Yong Wu [Tue, 11 Apr 2023 09:31:37 +0000 (17:31 +0800)]
iommu/mediatek: mt8186: Add iova_region_larb_msk
Add iova_region_larb_msk for mt8186. We separate the 16GB iova regions
by each device's larbid/portid.
Note: larb5/6/10/12/14/15/18 connect nothing in this SoC.
Refer to include/dt-bindings/memory/mt8186-memory-port.h
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-8-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Yong Wu [Tue, 11 Apr 2023 09:31:36 +0000 (17:31 +0800)]
iommu/mediatek: mt8195: Add iova_region_larb_msk
Add iova_region_larb_msk for mt8195. We separate the 16GB iova regions
by each device's larbid/portid.
Refer to include/dt-bindings/memory/mt8195-memory-port.h
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-7-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Yong Wu [Tue, 11 Apr 2023 09:31:35 +0000 (17:31 +0800)]
iommu/mediatek: mt8192: Add iova_region_larb_msk
Add iova_region_larb_msk for mt8192. We separate the 16GB iova regions
by each device's larbid/portid.
Note: larb3/6/8/10/12/15 connect nothing in this SoC.
Refer to the comment in include/dt-bindings/memory/mt8192-larb-port.h
Define a new macro MT8192_MULTI_REGION_NR_MAX to indicate
the index of mt8xxx_larb_region_msk and
"struct mtk_iommu_iova_region mt8192_multi_dom"
are the same.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-6-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Yong Wu [Tue, 11 Apr 2023 09:31:34 +0000 (17:31 +0800)]
iommu/mediatek: Get regionid from larb/port id
After commit
f1ad5338a4d5 ("of: Fix "dma-ranges" handling for bus
controllers"), the dma-ranges is not allowed for dts leaf node.
but we still would like to separate to different masters
into different iova regions.
Thus we have to separate it by the HW larbid and portid. For example,
larb1/2 are in region2 and larb3 is in region3. The problem is that
some ports inside a larb are in region4 while some ports inside this
larb are in region5. Therefore I define a "iova_region_larb_msk" to help
record the information for each a port. Take a example for a larb:
[1] = ~0: means all ports in this larb are in region1;
[2] = BIT(3) | BIT(4): means port3/4 in this larb are region2;
[3] = ~(BIT(3) | BIT(4)): means all the other ports except port3/4
in this larb are region3.
This method also avoids the users forget/abuse the iova regions.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-5-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Yong Wu [Tue, 11 Apr 2023 09:31:33 +0000 (17:31 +0800)]
iommu/mediatek: Improve comment for the current region/bank
No functional change. Just add more comment about the current region/bank
in the code.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-4-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Yong Wu [Tue, 11 Apr 2023 09:31:32 +0000 (17:31 +0800)]
dt-bindings: media: mediatek,jpeg: Remove dma-ranges property
After commit
f1ad5338a4d5 ("of: Fix "dma-ranges" handling for bus
controllers"), the dma-ranges of the leaf node doesn't work. Remove
it for jpeg here.
Currently there is only mt8195 jpeg node has this property in upstream,
and it already uses parent-child node, this property did work. But instead,
MediaTek iommu will control the masters' iova ranges by the master's
larb/port id internally, then this property is unnecessary.
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Bin Liu <bin.liu@mediatek.com>
Cc: kyrie wu <kyrie.wu@mediatek.corp-partner.google.com>
Cc: Xia Jiang <xia.jiang@mediatek.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Link: https://lore.kernel.org/r/20230411093144.2690-3-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Yong Wu [Tue, 11 Apr 2023 09:31:31 +0000 (17:31 +0800)]
dt-bindings: media: mediatek,vcodec: Remove dma-ranges property
After commit
f1ad5338a4d5 ("of: Fix "dma-ranges" handling for bus
controllers"), the dma-ranges of the leaf node doesn't work. Remove
it for vcodec here.
For mediatek,vcodec-decoder.yaml and mediatek,vcodec-encoder.yaml,
this property is in the leaf node, it is invalid as the above comment.
Currently there is only mt8195 VENC node has this property in upstream.
Indeed, VENC is affected, but it is not a fatal issue. Originally it
expects its iova range locate at 4GB-8GB. However after that commit, its
expectation doesn't come true, it will fall back to 0-4GB iova and also
could work well.
Cc: Tiffany Lin <tiffany.lin@mediatek.com>
Cc: Andrew-CT Chen <andrew-ct.chen@mediatek.com>
Cc: Yunfei Dong <yunfei.dong@mediatek.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Link: https://lore.kernel.org/r/20230411093144.2690-2-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Kishon Vijay Abraham I [Wed, 5 Apr 2023 13:03:17 +0000 (13:03 +0000)]
iommu/amd: Fix "Guest Virtual APIC Table Root Pointer" configuration in IRTE
commit
b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC
(de-)activation code") while refactoring guest virtual APIC
activation/de-activation code, stored information for activate/de-activate
in "struct amd_ir_data". It used 32-bit integer data type for storing the
"Guest Virtual APIC Table Root Pointer" (ga_root_ptr), though the
"ga_root_ptr" is actually a 40-bit field in IRTE (Interrupt Remapping
Table Entry).
This causes interrupts from PCIe devices to not reach the guest in the case
of PCIe passthrough with SME (Secure Memory Encryption) enabled as _SME_
bit in the "ga_root_ptr" is lost before writing it to the IRTE.
Fix it by using 64-bit data type for storing the "ga_root_ptr". While at
that also change the data type of "ga_tag" to u32 in order to match
the IOMMU spec.
Fixes: b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code")
Cc: stable@vger.kernel.org # v5.4+
Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Link: https://lore.kernel.org/r/20230405130317.9351-1-kvijayab@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Jerry Snitselaar [Tue, 4 Apr 2023 07:27:42 +0000 (00:27 -0700)]
iommu/amd: Set page size bitmap during V2 domain allocation
With the addition of the V2 page table support, the domain page size
bitmap needs to be set prior to iommu core setting up direct mappings
for reserved regions. When reserved regions are mapped, if this is not
done, it will be looking at the V1 page size bitmap when determining
the page size to use in iommu_pgsize(). When it gets into the actual
amd mapping code, a check of see if the page size is supported can
fail, because at that point it is checking it against the V2 page size
bitmap which only supports 4K, 2M, and 1G.
Add a check to __iommu_domain_alloc() to not override the
bitmap if it was already set by the iommu ops domain_alloc() code path.
Cc: Vasant Hegde <vasant.hegde@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>
Fixes: 4db6c41f0946 ("iommu/amd: Add support for using AMD IOMMU v2 page table for DMA-API")
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20230404072742.1895252-1-jsnitsel@redhat.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Steven Price [Fri, 31 Mar 2023 09:51:54 +0000 (10:51 +0100)]
iommu/rockchip: Add missing set_platform_dma_ops callback
Similar to exynos, we need a set_platform_dma_ops() callback for proper
operation on ARM 32 bit after recent changes in the IOMMU framework
(detach ops removal). But also the use of a NULL domain is confusing.
Rework the code to add support for IOMMU_DOMAIN_IDENTITY and a singleton
rk_identity_domain which is assigned to domain when using an identity
mapping rather than "detaching". This makes the code easier to reason about.
Signed-off-by: Steven Price <steven.price@arm.com>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20230331095154.2671129-1-steven.price@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Christophe JAILLET [Sun, 26 Mar 2023 12:37:50 +0000 (14:37 +0200)]
iommu/exynos: Use the devm_clk_get_optional() helper
Use devm_clk_get_optional() instead of hand writing it.
This saves some loC and improves the semantic.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/99c0d5ce643737ee0952df41fd60433a0bbeb447.1679834256.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Will Deacon [Tue, 11 Apr 2023 13:05:06 +0000 (14:05 +0100)]
Merge branch 'for-joerg/arm-smmu/bindings' into for-joerg/arm-smmu/updates
Updates to the Arm SMMU device-tree bindings.
* for-joerg/arm-smmu/bindings:
dt-bindings: arm-smmu: Document SM61[12]5 GPU SMMU
dt-bindings: arm-smmu: Add SM8350 Adreno SMMU
dt-bindings: arm-smmu: Use qcom,smmu compatible for MMU500 adreno SMMUs
dt-bindings: arm-smmu: Add compatible for SM8550 SoC
Linus Torvalds [Sun, 9 Apr 2023 18:15:57 +0000 (11:15 -0700)]
Linux 6.3-rc6
Linus Torvalds [Sun, 9 Apr 2023 17:10:46 +0000 (10:10 -0700)]
Merge tag 'perf_urgent_for_v6.3_rc6' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Fix "same task" check when redirecting event output
- Do not wait unconditionally for RCU on the event migration path if
there are no events to migrate
* tag 'perf_urgent_for_v6.3_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix the same task check in perf_event_set_output
perf: Optimize perf_pmu_migrate_context()
Linus Torvalds [Sun, 9 Apr 2023 17:00:16 +0000 (10:00 -0700)]
Merge tag 'x86_urgent_for_v6.3_rc6' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Add a new Intel Arrow Lake CPU model number
- Fix a confusion about how to check the version of the ACPI spec which
supports a "online capable" bit in the MADT table which lead to a
bunch of boot breakages with Zen1 systems and VMs
* tag 'x86_urgent_for_v6.3_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add model number for Intel Arrow Lake processor
x86/acpi/boot: Correct acpi_is_processor_usable() check
x86/ACPI/boot: Use FADT version to check support for online capable
Linus Torvalds [Sun, 9 Apr 2023 16:45:46 +0000 (09:45 -0700)]
Merge tag 'cxl-fixes-6.3-rc6' of git://git./linux/kernel/git/cxl/cxl
Pull compute express link (cxl) fixes from Dan Williams:
"Several fixes for driver startup regressions that landed during the
merge window as well as some older bugs.
The regressions were due to a lack of testing with what the CXL
specification calls Restricted CXL Host (RCH) topologies compared to
the testing with Virtual Host (VH) CXL topologies. A VH topology is
typical PCIe while RCH topologies map CXL endpoints as Root Complex
Integrated endpoints. The impact is some driver crashes on startup.
This merge window also added compatibility for range registers (the
mechanism that CXL 1.1 defined for mapping memory) to treat them like
HDM decoders (the mechanism that CXL 2.0 defined for mapping
Host-managed Device Memory). That work collided with the new region
enumeration code that was tested with CXL 2.0 setups, and fails with
crashes at startup.
Lastly, the DOE (Data Object Exchange) implementation for retrieving
an ACPI-like data table from CXL devices is being reworked for v6.4.
Several fixes fell out of that work that are suitable for v6.3.
All of this has been in linux-next for a while, and all reported
issues [1] have been addressed.
Summary:
- Fix several issues with region enumeration in RCH topologies that
can trigger crashes on driver startup or shutdown.
- Fix CXL DVSEC range register compatibility versus region
enumeration that leads to startup crashes
- Fix CDAT endiannes handling
- Fix multiple buffer handling boundary conditions
- Fix Data Object Exchange (DOE) workqueue usage vs
CONFIG_DEBUG_OBJECTS warn splats"
Link: http://lore.kernel.org/r/20230405075704.33de8121@canb.auug.org.au
* tag 'cxl-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/hdm: Extend DVSEC range register emulation for region enumeration
cxl/hdm: Limit emulation to the number of range registers
cxl/region: Move coherence tracking into cxl_region_attach()
cxl/region: Fix region setup/teardown for RCDs
cxl/port: Fix find_cxl_root() for RCDs and simplify it
cxl/hdm: Skip emulation when driver manages mem_enable
cxl/hdm: Fix double allocation of @cxlhdm
PCI/DOE: Fix memory leak with CONFIG_DEBUG_OBJECTS=y
PCI/DOE: Silence WARN splat with CONFIG_DEBUG_OBJECTS=y
cxl/pci: Handle excessive CDAT length
cxl/pci: Handle truncated CDAT entries
cxl/pci: Handle truncated CDAT header
cxl/pci: Fix CDAT retrieval on big endian
Linus Torvalds [Sun, 9 Apr 2023 01:37:45 +0000 (18:37 -0700)]
Merge tag '6.3-rc5-smb3-cifs-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs client fixes from Steve French:
"Two cifs/smb3 client fixes, one for stable:
- double lock fix for a cifs/smb1 reconnect path
- DFS prefixpath fix for reconnect when server moved"
* tag '6.3-rc5-smb3-cifs-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: double lock in cifs_reconnect_tcon()
cifs: sanitize paths in cifs_update_super_prepath.
Linus Torvalds [Sat, 8 Apr 2023 19:21:37 +0000 (12:21 -0700)]
Merge tag 'char-misc-6.3-rc6' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are a small set of various small driver changes for 6.3-rc6.
Included in here are:
- iio driver fixes for reported problems
- coresight hwtracing bugfix for reported problem
- small counter driver bugfixes
All have been in linux-next for a while with no reported problems"
* tag 'char-misc-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
coresight: etm4x: Do not access TRCIDR1 for identification
coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug
iio: adc: ti-ads7950: Set `can_sleep` flag for GPIO chip
iio: adc: palmas_gpadc: fix NULL dereference on rmmod
counter: 104-quad-8: Fix Synapse action reported for Index signals
counter: 104-quad-8: Fix race condition between FLAG and CNTR reads
iio: adc: max11410: fix read_poll_timeout() usage
iio: dac: cio-dac: Fix max DAC write value check for 12-bit
iio: light: cm32181: Unregister second I2C client if present
iio: accel: kionix-kx022a: Get the timestamp from the driver's private data in the trigger_handler
iio: adc: ad7791: fix IRQ flags
iio: buffer: make sure O_NONBLOCK is respected
iio: buffer: correctly return bytes written in output buffers
iio: light: vcnl4000: Fix WARN_ON on uninitialized lock
iio: adis16480: select CONFIG_CRC32
drivers: iio: adc: ltc2497: fix LSB shift
iio: adc: qcom-spmi-adc5: Fix the channel name
Linus Torvalds [Sat, 8 Apr 2023 19:17:46 +0000 (12:17 -0700)]
Merge tag 'tty-6.3-rc6' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for some reported
problems:
- fsl_uart driver bugfixes
- sh-sci serial driver bugfixes
- renesas serial driver DT binding bugfixes
- 8250 DMA bugfix
All of these have been in linux-next for a while with no reported
problems"
* tag 'tty-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: serial: sh-sci: Fix Rx on RZ/G2L SCI
tty: serial: fsl_lpuart: fix crash in lpuart_uport_is_active
tty: serial: fsl_lpuart: avoid checking for transfer complete when UARTCTRL_SBK is asserted in lpuart32_tx_empty
serial: 8250: Prevent starting up DMA Rx on THRI interrupt
dt-bindings: serial: renesas,scif: Fix 4th IRQ for 4-IRQ SCIFs
tty: serial: sh-sci: Fix transmit end interrupt handler
Linus Torvalds [Sat, 8 Apr 2023 19:13:39 +0000 (12:13 -0700)]
Merge tag 'usb-6.3-rc6' of git://git./linux/kernel/git/gregkh/usb
Pull USB bugfixes from Greg KH:
"Here are some small USB bugfixes for 6.3-rc6 that have been in my
tree, and in linux-next, for a while. Included in here are:
- new usb-serial driver device ids
- xhci bugfixes for reported problems
- gadget driver bugfixes for reported problems
- dwc3 new device id
All have been in linux-next with no reported problems"
* tag 'usb-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: cdnsp: Fixes error: uninitialized symbol 'len'
usb: gadgetfs: Fix ep_read_iter to handle ITER_UBUF
usb: gadget: f_fs: Fix ffs_epfile_read_iter to handle ITER_UBUF
usb: typec: altmodes/displayport: Fix configure initial pin assignment
usb: dwc3: pci: add support for the Intel Meteor Lake-S
xhci: Free the command allocated for setting LPM if we return early
Revert "usb: xhci-pci: Set PROBE_PREFER_ASYNCHRONOUS"
xhci: also avoid the XHCI_ZERO_64B_REGS quirk with a passthrough iommu
USB: serial: option: add Quectel RM500U-CN modem
usb: xhci: tegra: fix sleep in atomic call
USB: serial: option: add Telit FE990 compositions
USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs
Linus Torvalds [Sat, 8 Apr 2023 18:57:05 +0000 (11:57 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four small fixes, all in drivers. They're all one or two lines except
for the ufs one, but that's a simple revert of a previous feature"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: iscsi_tcp: Check that sock is valid before iscsi_set_param()
scsi: qla2xxx: Fix memory leak in qla2x00_probe_one()
scsi: mpi3mr: Handle soft reset in progress fault code (0xF002)
scsi: Revert "scsi: ufs: core: Initialize devfreq synchronously"
Linus Torvalds [Sat, 8 Apr 2023 18:40:41 +0000 (11:40 -0700)]
Merge tag 'block-6.3-2023-04-06' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Ensure that ublk always reads the whole sqe upfront (me)
- Fix for a block size probing issue with ublk (Ming)
- Fix for the bio based polling (Keith)
- NVMe pull request via Christoph:
- fix discard support without oncs (Keith Busch)
- Partition scan error handling regression fix (Yu)
* tag 'block-6.3-2023-04-06' of git://git.kernel.dk/linux:
block: don't set GD_NEED_PART_SCAN if scan partition failed
block: ublk: make sure that block size is set correctly
ublk: read any SQE values upfront
nvme: fix discard support without oncs
blk-mq: directly poll requests
Linus Torvalds [Sat, 8 Apr 2023 18:34:17 +0000 (11:34 -0700)]
Merge tag 'io_uring-6.3-2023-04-06' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
"Just two minor fixes for provided buffers - one where we could
potentially leak a buffer, and one where the returned values was
off-by-one in some cases"
* tag 'io_uring-6.3-2023-04-06' of git://git.kernel.dk/linux:
io_uring: fix memory leak when removing provided buffers
io_uring: fix return value when removing provided buffers
Linus Torvalds [Sat, 8 Apr 2023 18:10:49 +0000 (11:10 -0700)]
Merge tag 'dma-mapping-6.3-2023-04-08' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
- fix a braino in the swiotlb alignment check fix (Petr Tesarik)
* tag 'dma-mapping-6.3-2023-04-08' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: fix a braino in the alignment check fix
Linus Torvalds [Sat, 8 Apr 2023 18:02:03 +0000 (11:02 -0700)]
Merge tag 'trace-v6.3-rc5-2' of git://git./linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"A couple more minor fixes:
- Reset direct->addr back to its original value on error in updating
the direct trampoline code
- Make lastcmd_mutex static"
* tag 'trace-v6.3-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/synthetic: Make lastcmd_mutex static
ftrace: Fix issue that 'direct->addr' not restored in modify_ftrace_direct()
Linus Torvalds [Sat, 8 Apr 2023 17:51:12 +0000 (10:51 -0700)]
Merge tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git./linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
"28 hotfixes.
23 are cc:stable and the other five address issues which were
introduced during this merge cycle.
20 are for MM and the remainder are for other subsystems"
* tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
maple_tree: fix a potential concurrency bug in RCU mode
maple_tree: fix get wrong data_end in mtree_lookup_walk()
mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
nilfs2: fix sysfs interface lifetime
mm: take a page reference when removing device exclusive entries
mm: vmalloc: avoid warn_alloc noise caused by fatal signal
nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field
nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
zsmalloc: document freeable stats
zsmalloc: document new fullness grouping
fsdax: force clear dirty mark if CoW
mm/hugetlb: fix uffd wr-protection for CoW optimization path
mm: enable maple tree RCU mode by default
maple_tree: add RCU lock checking to rcu callback functions
maple_tree: add smp_rmb() to dead node detection
maple_tree: fix write memory barrier of nodes once dead for RCU mode
maple_tree: remove extra smp_wmb() from mas_dead_leaves()
maple_tree: fix freeing of nodes in rcu mode
maple_tree: detect dead nodes in mas_start()
maple_tree: be more cautious about dead nodes
...
Linus Torvalds [Fri, 7 Apr 2023 20:53:16 +0000 (13:53 -0700)]
Merge tag 'gpio-fixes-for-v6.3-rc6' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix irq handling in gpio-davinci
- fix Kconfig dependencies for gpio-regmap
* tag 'gpio-fixes-for-v6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: davinci: Add irq chip flag to skip set wake
gpio: davinci: Do not clear the bank intr enable bit in save_context
gpio: GPIO_REGMAP: select REGMAP instead of depending on it
Linus Torvalds [Fri, 7 Apr 2023 20:32:54 +0000 (13:32 -0700)]
Merge tag 'acpi-6.3-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"Fix the ACPI backlight override mechanism for the cases when
acpi_backlight=video is set through the kernel command line or a DMI
quirk and add backlight quirks for Apple iMac14,1 and iMac14,2 and
Lenovo ThinkPad W530 (Hans de Goede)"
* tag 'acpi-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: video: Add acpi_backlight=video quirk for Lenovo ThinkPad W530
ACPI: video: Add acpi_backlight=video quirk for Apple iMac14,1 and iMac14,2
ACPI: video: Make acpi_backlight=video work independent from GPU driver
ACPI: video: Add auto_detect arg to __acpi_video_get_backlight_type()
Linus Torvalds [Fri, 7 Apr 2023 20:27:02 +0000 (13:27 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Fix uninitialised variable warning (from smatch) in the arm64 compat
alignment fixup code"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: compat: Work around uninitialized variable warning
Linus Torvalds [Fri, 7 Apr 2023 20:10:23 +0000 (13:10 -0700)]
Merge tag '6.3-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull ksmbd server fixes from Steve French:
"Four fixes, three for stable:
- slab out of bounds fix
- lock cancellation fix
- minor cleanup to address clang warning
- fix for xfstest 551 (wrong parms passed to kvmalloc)"
* tag '6.3-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix slab-out-of-bounds in init_smb2_rsp_hdr
ksmbd: delete asynchronous work from list
ksmbd: remove unused is_char_allowed function
ksmbd: do not call kvmalloc() with __GFP_NORETRY | __GFP_NO_WARN
Dan Carpenter [Thu, 6 Apr 2023 08:55:47 +0000 (11:55 +0300)]
cifs: double lock in cifs_reconnect_tcon()
This lock was supposed to be an unlock.
Fixes: 6cc041e90c17 ("cifs: avoid races in parallel reconnects in smb1")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Yu Kuai [Wed, 22 Mar 2023 03:59:26 +0000 (11:59 +0800)]
block: don't set GD_NEED_PART_SCAN if scan partition failed
Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still
set, and partition scan will be proceed again when blkdev_get_by_dev()
is called. However, this will cause a problem that re-assemble partitioned
raid device will creat partition for underlying disk.
Test procedure:
mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0
sgdisk -n 0:0:+100MiB /dev/md0
blockdev --rereadpt /dev/sda
blockdev --rereadpt /dev/sdb
mdadm -S /dev/md0
mdadm -A /dev/md0 /dev/sda /dev/sdb
Test result: underlying disk partition and raid partition can be
observed at the same time
Note that this can still happen in come corner cases that
GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid
device.
Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again")
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Steven Rostedt (Google) [Thu, 6 Apr 2023 15:10:33 +0000 (11:10 -0400)]
tracing/synthetic: Make lastcmd_mutex static
The lastcmd_mutex is only used in trace_events_synth.c and should be
static.
Link: https://lore.kernel.org/linux-trace-kernel/202304062033.cRStgOuP-lkp@intel.com/
Link: https://lore.kernel.org/linux-trace-kernel/20230406111033.6e26de93@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Fixes: 4ccf11c4e8a8e ("tracing/synthetic: Fix races on freeing last_cmd")
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Linus Torvalds [Thu, 6 Apr 2023 18:39:07 +0000 (11:39 -0700)]
Merge tag 'net-6.3-rc6-2' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless and can.
Current release - regressions:
- wifi: mac80211:
- fix potential null pointer dereference
- fix receiving mesh packets in forwarding=0 networks
- fix mesh forwarding
Current release - new code bugs:
- virtio/vsock: fix leaks due to missing skb owner
Previous releases - regressions:
- raw: fix NULL deref in raw_get_next().
- sctp: check send stream number after wait_for_sndbuf
- qrtr:
- fix a refcount bug in qrtr_recvmsg()
- do not do DEL_SERVER broadcast after DEL_CLIENT
- wifi: brcmfmac: fix SDIO suspend/resume regression
- wifi: mt76: fix use-after-free in fw features query.
- can: fix race between isotp_sendsmg() and isotp_release()
- eth: mtk_eth_soc: fix remaining throughput regression
- eth: ice: reset FDIR counter in FDIR init stage
Previous releases - always broken:
- core: don't let netpoll invoke NAPI if in xmit context
- icmp: guard against too small mtu
- ipv6: fix an uninit variable access bug in __ip6_make_skb()
- wifi: mac80211: fix the size calculation of
ieee80211_ie_len_eht_cap()
- can: fix poll() to not report false EPOLLOUT events
- eth: gve: secure enough bytes in the first TX desc for all TCP
pkts"
* tag 'net-6.3-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net: stmmac: check fwnode for phy device before scanning for phy
net: stmmac: Add queue reset into stmmac_xdp_open() function
selftests: net: rps_default_mask.sh: delete veth link specifically
net: fec: make use of MDIO C45 quirk
can: isotp: fix race between isotp_sendsmg() and isotp_release()
can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
gve: Secure enough bytes in the first TX desc for all TCP pkts
netlink: annotate lockless accesses to nlk->max_recvmsg_len
ethtool: reset #lanes when lanes is omitted
ping: Fix potentail NULL deref for /proc/net/icmp.
raw: Fix NULL deref in raw_get_next().
ice: Reset FDIR counter in FDIR init stage
ice: fix wrong fallback logic for FDIR
net: stmmac: fix up RX flow hash indirection table when setting channels
net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe
wifi: mt76: ignore key disable commands
wifi: ath11k: reduce the MHI timeout to 20s
ipv6: Fix an uninit variable access bug in __ip6_make_skb()
...
Linus Torvalds [Thu, 6 Apr 2023 18:34:18 +0000 (11:34 -0700)]
Merge tag 'linux-kselftest-fixes-6.3-rc6' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
"One single fix to mount_setattr_test build failure"
* tag 'linux-kselftest-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests mount: Fix mount_setattr_test builds failed
Linus Torvalds [Thu, 6 Apr 2023 18:27:21 +0000 (11:27 -0700)]
Merge tag 'for-linus-iommufd' of git://git./linux/kernel/git/jgg/iommufd
Pull iommufd fixes from Jason Gunthorpe:
- An invalid VA range can be be put in a pages and eventually trigger
WARN_ON, reject it early
- Use of the wrong start index value when doing the complex batch carry
scheme
- Wrong store ordering resulting in corrupting data used in a later
calculation that corrupted the batch structure during carry
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd: Do not corrupt the pfn list when doing batch carry
iommufd: Fix unpinning of pages when an access is present
iommufd: Check for uptr overflow
Linus Torvalds [Thu, 6 Apr 2023 18:08:03 +0000 (11:08 -0700)]
Merge tag 'pwm/for-6.3-rc6' of git://git./linux/kernel/git/thierry.reding/linux-pwm
Pull pwm fixes from Thierry Reding:
"These are some fixes to make sure the PWM state structure is always
initialized to a known state.
Prior to this it could happen in some situations that random data from
the stack would leak into the data structure and cause subtle bugs"
* tag 'pwm/for-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: Zero-initialize the pwm_state passed to driver's .get_state()
pwm: meson: Explicitly set .polarity in .get_state()
pwm: sprd: Explicitly set .polarity in .get_state()
pwm: iqs620a: Explicitly set .polarity in .get_state()
pwm: cros-ec: Explicitly set .polarity in .get_state()
pwm: hibvt: Explicitly set .polarity in .get_state()
Linus Torvalds [Thu, 6 Apr 2023 17:25:27 +0000 (10:25 -0700)]
Merge tag 'drm-fixes-2023-04-06' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Daniel Vetter:
"Mostly i915 fixes: dp mst for compression/dsc, perf ioctl uaf, ctx rpm
accounting, gt reset vs huc loading.
And a few individual driver fixes: ivpu dma fence&suspend, panfrost
mmap, nouveau color depth"
* tag 'drm-fixes-2023-04-06' of git://anongit.freedesktop.org/drm/drm:
accel/ivpu: Fix S3 system suspend when not idle
accel/ivpu: Add dma fence to command buffers only
drm/i915: Fix context runtime accounting
drm/i915: fix race condition UAF in i915_perf_add_config_ioctl
drm/i915: Use compressed bpp when calculating m/n value for DP MST DSC
drm/i915/huc: Cancel HuC delayed load timer on reset.
drm/i915/ttm: fix sparse warning
drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path
drm/nouveau/disp: Support more modes by checking with lower bpc
Linus Torvalds [Thu, 6 Apr 2023 17:19:30 +0000 (10:19 -0700)]
Merge tag 'sound-6.3-rc6' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The majority of changes here are various fixes for Intel drivers,
and there is a change in ASoC PCM core for the format constraints.
In addition, a workaround for HD-audio HDMI regressions and usual
HD-audio quirks are found"
* tag 'sound-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/hdmi: Preserve the previous PCM device upon re-enablement
ALSA: hda/realtek: Add quirk for Clevo X370SNW
ALSA: hda/realtek: fix mute/micmute LEDs for a HP ProBook
ASoC: SOF: avoid a NULL dereference with unsupported widgets
ASoC: da7213.c: add missing pm_runtime_disable()
ASoC: hdac_hdmi: use set_stream() instead of set_tdm_slots()
ASoC: codecs: lpass: fix the order or clks turn off during suspend
ASoC: Intel: bytcr_rt5640: Add quirk for the Acer Iconia One 7 B1-750
ASoC: SOF: ipc4: Ensure DSP is in D0I0 during sof_ipc4_set_get_data()
ASoC: amd: yc: Add DMI entries to support Victus by HP Laptop 16-e1xxx (8A22)
ASoC: soc-pcm: fix hw->formats cleared by soc_pcm_hw_init() for dpcm
ASoC: Intel: soc-acpi: add table for Intel 'Rooks County' NUC M15
ASOC: Intel: sof_sdw: add quirk for Intel 'Rooks County' NUC M15
Linus Torvalds [Thu, 6 Apr 2023 17:13:23 +0000 (10:13 -0700)]
Merge tag 'platform-drivers-x86-v6.3-5' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
- more think-lmi fixes
- one DMI quirk addition
* tag 'platform-drivers-x86-v6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: thinkpad_acpi: Add missing T14s Gen1 type to s2idle quirk list
platform/x86: think-lmi: Clean up display of current_value on Thinkstation
platform/x86: think-lmi: Fix memory leaks when parsing ThinkStation WMI strings
platform/x86: think-lmi: Fix memory leak when showing current settings
Linus Torvalds [Thu, 6 Apr 2023 16:51:04 +0000 (09:51 -0700)]
Merge tag 'asm-generic-fixes-6.3' of git://git./linux/kernel/git/arnd/asm-generic
Pull asm-generic fixes from Arnd Bergmann:
"These are minor fixes to address false-positive build warnings:
Some of the less common I/O accessors are missing __force casts and
cause sparse warnings for their implied byteswap, and a recent change
to __generic_cmpxchg_local() causes a warning about constant integer
truncation"
* tag 'asm-generic-fixes-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: avoid __generic_cmpxchg_local warnings
asm-generic/io.h: suppress endianness warnings for relaxed accessors
asm-generic/io.h: suppress endianness warnings for readq() and writeq()
Michael Sit Wei Hong [Thu, 6 Apr 2023 02:45:41 +0000 (10:45 +0800)]
net: stmmac: check fwnode for phy device before scanning for phy
Some DT devices already have phy device configured in the DT/ACPI.
Current implementation scans for a phy unconditionally even though
there is a phy listed in the DT/ACPI and already attached.
We should check the fwnode if there is any phy device listed in
fwnode and decide whether to scan for a phy to attach to.
Fixes: fe2cfbc96803 ("net: stmmac: check if MAC needs to attach to a PHY")
Reported-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://lore.kernel.org/lkml/20230403212434.296975-1-martin.blumenstingl@googlemail.com/
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Shahab Vahedi <shahab@synopsys.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Link: https://lore.kernel.org/r/20230406024541.3556305-1-michael.wei.hong.sit@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Zheng Yejian [Thu, 30 Mar 2023 02:52:23 +0000 (10:52 +0800)]
ftrace: Fix issue that 'direct->addr' not restored in modify_ftrace_direct()
Syzkaller report a WARNING: "WARN_ON(!direct)" in modify_ftrace_direct().
Root cause is 'direct->addr' was changed from 'old_addr' to 'new_addr' but
not restored if error happened on calling ftrace_modify_direct_caller().
Then it can no longer find 'direct' by that 'old_addr'.
To fix it, restore 'direct->addr' to 'old_addr' explicitly in error path.
Link: https://lore.kernel.org/linux-trace-kernel/20230330025223.1046087-1-zhengyejian1@huawei.com
Cc: stable@vger.kernel.org
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: <ast@kernel.org>
Cc: <daniel@iogearbox.net>
Fixes: 8a141dd7f706 ("ftrace: Fix modify_ftrace_direct.")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Petr Tesarik [Thu, 6 Apr 2023 14:35:39 +0000 (16:35 +0200)]
swiotlb: fix a braino in the alignment check fix
The alignment mask in swiotlb_do_find_slots() masks off the high
bits which are not relevant for the alignment, so multiple
requirements are combined with a bitwise OR rather than AND.
In plain English, the stricter the alignment, the more bits must
be set in iotlb_align_mask.
Confusion may arise from the fact that the same variable is also
used to mask off the offset within a swiotlb slot, which is
achieved with a bitwise AND.
Fixes: 0eee5ae10256 ("swiotlb: fix slot alignment checks")
Reported-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/all/CAA42JLa1y9jJ7BgQvXeUYQh-K2mDNHd2BYZ4iZUz33r5zY7oAQ@mail.gmail.com/
Reported-by: Kelsey Steele <kelseysteele@linux.microsoft.com>
Link: https://lore.kernel.org/all/20230405003549.GA21326@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Jens Axboe [Thu, 6 Apr 2023 14:12:19 +0000 (08:12 -0600)]
Merge tag 'nvme-6.3-2023-04-06' of git://git.infradead.org/nvme into block-6.3
Pull NVMe fix from Christoph:
"nvme fixes for Linux 6.3
- fix discard support without oncs (Keith Busch)"
* tag 'nvme-6.3-2023-04-06' of git://git.infradead.org/nvme:
nvme: fix discard support without oncs
Ming Lei [Thu, 6 Apr 2023 12:40:59 +0000 (20:40 +0800)]
block: ublk: make sure that block size is set correctly
block size is one very key setting for block layer, and bad block size
could panic kernel easily.
Make sure that block size is set correctly.
Meantime if ublk_validate_params() fails, clear ub->params so that disk
is prevented from being added.
Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver")
Reported-and-tested-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 6 Apr 2023 02:00:46 +0000 (20:00 -0600)]
ublk: read any SQE values upfront
Since SQE memory is shared with userspace, we should only be reading it
once. We cannot read it multiple times, particularly when it's read once
for validation and then read again for the actual use.
ublk_ch_uring_cmd() is safe when called as a retry operation, as the
memory backing is stable at that point. But for normal issue, we want
to ensure that we only read ublksrv_io_cmd once. Wrap the function in
a helper that reads the value into an on-stack copy of the struct.
Cc: stable@vger.kernel.org # 6.0+
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Song Yoong Siang [Tue, 4 Apr 2023 04:48:23 +0000 (12:48 +0800)]
net: stmmac: Add queue reset into stmmac_xdp_open() function
Queue reset was moved out from __init_dma_rx_desc_rings() and
__init_dma_tx_desc_rings() functions. Thus, the driver fails to transmit
and receive packet after XDP prog setup.
This commit adds the missing queue reset into stmmac_xdp_open() function.
Fixes: f9ec5723c3db ("net: ethernet: stmicro: stmmac: move queue reset to dedicated functions")
Cc: <stable@vger.kernel.org> # 6.0+
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230404044823.3226144-1-yoong.siang.song@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu [Tue, 4 Apr 2023 07:24:11 +0000 (15:24 +0800)]
selftests: net: rps_default_mask.sh: delete veth link specifically
When deleting the netns and recreating a new one while re-adding the
veth interface, there is a small window of time during which the old
veth interface has not yet been removed. This can cause the new addition
to fail. To resolve this issue, we can either wait for a short while to
ensure that the old veth interface is deleted, or we can specifically
remove the veth interface.
Before this patch:
# ./rps_default_mask.sh
empty rps_default_mask [ ok ]
changing rps_default_mask dont affect existing devices [ ok ]
changing rps_default_mask dont affect existing netns [ ok ]
changing rps_default_mask affect newly created devices [ ok ]
changing rps_default_mask don't affect newly child netns[II][ ok ]
rps_default_mask is 0 by default in child netns [ ok ]
RTNETLINK answers: File exists
changing rps_default_mask in child ns don't affect the main one[ ok ]
cat: /sys/class/net/vethC11an1/queues/rx-0/rps_cpus: No such file or directory
changing rps_default_mask in child ns affects new childns devices./rps_default_mask.sh: line 36: [: -eq: unary operator expected
[fail] expected 1 found
changing rps_default_mask in child ns don't affect existing devices[ ok ]
After this patch:
# ./rps_default_mask.sh
empty rps_default_mask [ ok ]
changing rps_default_mask dont affect existing devices [ ok ]
changing rps_default_mask dont affect existing netns [ ok ]
changing rps_default_mask affect newly created devices [ ok ]
changing rps_default_mask don't affect newly child netns[II][ ok ]
rps_default_mask is 0 by default in child netns [ ok ]
changing rps_default_mask in child ns don't affect the main one[ ok ]
changing rps_default_mask in child ns affects new childns devices[ ok ]
changing rps_default_mask in child ns don't affect existing devices[ ok ]
Fixes: 3a7d84eae03b ("self-tests: more rps self tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20230404072411.879476-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Greg Ungerer [Tue, 4 Apr 2023 05:22:07 +0000 (15:22 +1000)]
net: fec: make use of MDIO C45 quirk
Not all fec MDIO bus drivers support C45 mode transactions. The older fec
hardware block in many ColdFire SoCs does not appear to support them, at
least according to most of the different ColdFire SoC reference manuals.
The bits used to generate C45 access on the iMX parts, in the OP field
of the MMFR register, are documented as generating non-compliant MII
frames (it is not documented as to exactly how they are non-compliant).
Commit
8d03ad1ab0b0 ("net: fec: Separate C22 and C45 transactions")
means the fec driver will always register c45 MDIO read and write
methods. During probe these will always be accessed now generating
non-compliant MII accesses on ColdFire based devices.
Add a quirk define, FEC_QUIRK_HAS_MDIO_C45, that can be used to
distinguish silicon that supports MDIO C45 framing or not. Add this to
all the existing iMX quirks, so they will be behave as they do now (*).
(*) it seems that some iMX parts may not support C45 transactions either.
The iMX25 and iMX50 Reference Manuals contain similar wording to
the ColdFire Reference Manuals on this.
Fixes: 8d03ad1ab0b0 ("net: fec: Separate C22 and C45 transactions")
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230404052207.3064861-1-gerg@linux-m68k.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Peng Zhang [Tue, 14 Mar 2023 12:42:03 +0000 (20:42 +0800)]
maple_tree: fix a potential concurrency bug in RCU mode
There is a concurrency bug that may cause the wrong value to be loaded
when a CPU is modifying the maple tree.
CPU1:
mtree_insert_range()
mas_insert()
mas_store_root()
...
mas_root_expand()
...
rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
ma_set_meta(node, maple_leaf_64, 0, slot); <---IP
CPU2:
mtree_load()
mtree_lookup_walk()
ma_data_end();
When CPU1 is about to execute the instruction pointed to by IP, the
ma_data_end() executed by CPU2 may return the wrong end position, which
will cause the value loaded by mtree_load() to be wrong.
An example of triggering the bug:
Add mdelay(100) between rcu_assign_pointer() and ma_set_meta() in
mas_root_expand().
static DEFINE_MTREE(tree);
int work(void *p) {
unsigned long val;
for (int i = 0 ; i< 30; ++i) {
val = (unsigned long)mtree_load(&tree, 8);
mdelay(5);
pr_info("%lu",val);
}
return 0;
}
mt_init_flags(&tree, MT_FLAGS_USE_RCU);
mtree_insert(&tree, 0, (void*)12345, GFP_KERNEL);
run_thread(work)
mtree_insert(&tree, 1, (void*)56789, GFP_KERNEL);
In RCU mode, mtree_load() should always return the value before or after
the data structure is modified, and in this example mtree_load(&tree, 8)
may return 56789 which is not expected, it should always return NULL. Fix
it by put ma_set_meta() before rcu_assign_pointer().
Link: https://lkml.kernel.org/r/20230314124203.91572-4-zhangpeng.00@bytedance.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Peng Zhang [Tue, 14 Mar 2023 12:42:01 +0000 (20:42 +0800)]
maple_tree: fix get wrong data_end in mtree_lookup_walk()
if (likely(offset > end))
max = pivots[offset];
The above code should be changed to if (likely(offset < end)), which is
correct. This affects the correctness of ma_data_end(). Now it seems
that the final result will not be wrong, but it is best to change it.
This patch does not change the code as above, because it simplifies the
code by the way.
Link: https://lkml.kernel.org/r/20230314124203.91572-1-zhangpeng.00@bytedance.com
Link: https://lkml.kernel.org/r/20230314124203.91572-2-zhangpeng.00@bytedance.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rongwei Wang [Tue, 4 Apr 2023 15:47:16 +0000 (23:47 +0800)]
mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
The si->lock must be held when deleting the si from the available list.
Otherwise, another thread can re-add the si to the available list, which
can lead to memory corruption. The only place we have found where this
happens is in the swapoff path. This case can be described as below:
core 0 core 1
swapoff
del_from_avail_list(si) waiting
try lock si->lock acquire swap_avail_lock
and re-add si into
swap_avail_head
acquire si->lock but missing si already being added again, and continuing
to clear SWP_WRITEOK, etc.
It can be easily found that a massive warning messages can be triggered
inside get_swap_pages() by some special cases, for example, we call
madvise(MADV_PAGEOUT) on blocks of touched memory concurrently, meanwhile,
run much swapon-swapoff operations (e.g. stress-ng-swap).
However, in the worst case, panic can be caused by the above scene. In
swapoff(), the memory used by si could be kept in swap_info[] after
turning off a swap. This means memory corruption will not be caused
immediately until allocated and reset for a new swap in the swapon path.
A panic message caused: (with CONFIG_PLIST_DEBUG enabled)
------------[ cut here ]------------
top:
00000000e58a3003, n:
0000000013e75cda, p:
000000008cd4451a
prev:
0000000035b1e58a, n:
000000008cd4451a, p:
000000002150ee8d
next:
000000008cd4451a, n:
000000008cd4451a, p:
000000008cd4451a
WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70
Modules linked in: rfkill(E) crct10dif_ce(E)...
CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+
Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
pstate:
60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
pc : plist_check_prev_next_node+0x50/0x70
lr : plist_check_prev_next_node+0x50/0x70
sp :
ffff0018009d3c30
x29:
ffff0018009d3c40 x28:
ffff800011b32a98
x27:
0000000000000000 x26:
ffff001803908000
x25:
ffff8000128ea088 x24:
ffff800011b32a48
x23:
0000000000000028 x22:
ffff001800875c00
x21:
ffff800010f9e520 x20:
ffff001800875c00
x19:
ffff001800fdc6e0 x18:
0000000000000030
x17:
0000000000000000 x16:
0000000000000000
x15:
0736076307640766 x14:
0730073007380731
x13:
0736076307640766 x12:
0730073007380731
x11:
000000000004058d x10:
0000000085a85b76
x9 :
ffff8000101436e4 x8 :
ffff800011c8ce08
x7 :
0000000000000000 x6 :
0000000000000001
x5 :
ffff0017df9ed338 x4 :
0000000000000001
x3 :
ffff8017ce62a000 x2 :
ffff0017df9ed340
x1 :
0000000000000000 x0 :
0000000000000000
Call trace:
plist_check_prev_next_node+0x50/0x70
plist_check_head+0x80/0xf0
plist_add+0x28/0x140
add_to_avail_list+0x9c/0xf0
_enable_swap_info+0x78/0xb4
__do_sys_swapon+0x918/0xa10
__arm64_sys_swapon+0x20/0x30
el0_svc_common+0x8c/0x220
do_el0_svc+0x2c/0x90
el0_svc+0x1c/0x30
el0_sync_handler+0xa8/0xb0
el0_sync+0x148/0x180
irq event stamp:
2082270
Now, si->lock locked before calling 'del_from_avail_list()' to make sure
other thread see the si had been deleted and SWP_WRITEOK cleared together,
will not reinsert again.
This problem exists in versions after stable 5.10.y.
Link: https://lkml.kernel.org/r/20230404154716.23058-1-rongwei.wang@linux.alibaba.com
Fixes: a2468cc9bfdff ("swap: choose swap device according to numa node")
Tested-by: Yongchen Yin <wb-yyc939293@alibaba-inc.com>
Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ryusuke Konishi [Thu, 30 Mar 2023 20:55:15 +0000 (05:55 +0900)]
nilfs2: fix sysfs interface lifetime
The current nilfs2 sysfs support has issues with the timing of creation
and deletion of sysfs entries, potentially leading to null pointer
dereferences, use-after-free, and lockdep warnings.
Some of the sysfs attributes for nilfs2 per-filesystem instance refer to
metadata file "cpfile", "sufile", or "dat", but
nilfs_sysfs_create_device_group that creates those attributes is executed
before the inodes for these metadata files are loaded, and
nilfs_sysfs_delete_device_group which deletes these sysfs entries is
called after releasing their metadata file inodes.
Therefore, access to some of these sysfs attributes may occur outside of
the lifetime of these metadata files, resulting in inode NULL pointer
dereferences or use-after-free.
In addition, the call to nilfs_sysfs_create_device_group() is made during
the locking period of the semaphore "ns_sem" of nilfs object, so the
shrinker call caused by the memory allocation for the sysfs entries, may
derive lock dependencies "ns_sem" -> (shrinker) -> "locks acquired in
nilfs_evict_inode()".
Since nilfs2 may acquire "ns_sem" deep in the call stack holding other
locks via its error handler __nilfs_error(), this causes lockdep to report
circular locking. This is a false positive and no circular locking
actually occurs as no inodes exist yet when
nilfs_sysfs_create_device_group() is called. Fortunately, the lockdep
warnings can be resolved by simply moving the call to
nilfs_sysfs_create_device_group() out of "ns_sem".
This fixes these sysfs issues by revising where the device's sysfs
interface is created/deleted and keeping its lifetime within the lifetime
of the metadata files above.
Link: https://lkml.kernel.org/r/20230330205515.6167-1-konishi.ryusuke@gmail.com
Fixes: dd70edbde262 ("nilfs2: integrate sysfs support into driver")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+979fa7f9c0d086fdc282@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/0000000000003414b505f7885f7e@google.com
Reported-by: syzbot+5b7d542076d9bddc3c6a@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/0000000000006ac86605f5f44eb9@google.com
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alistair Popple [Thu, 30 Mar 2023 01:25:19 +0000 (12:25 +1100)]
mm: take a page reference when removing device exclusive entries
Device exclusive page table entries are used to prevent CPU access to a
page whilst it is being accessed from a device. Typically this is used to
implement atomic operations when the underlying bus does not support
atomic access. When a CPU thread encounters a device exclusive entry it
locks the page and restores the original entry after calling mmu notifiers
to signal drivers that exclusive access is no longer available.
The device exclusive entry holds a reference to the page making it safe to
access the struct page whilst the entry is present. However the fault
handling code does not hold the PTL when taking the page lock. This means
if there are multiple threads faulting concurrently on the device
exclusive entry one will remove the entry whilst others will wait on the
page lock without holding a reference.
This can lead to threads locking or waiting on a folio with a zero
refcount. Whilst mmap_lock prevents the pages getting freed via munmap()
they may still be freed by a migration. This leads to warnings such as
PAGE_FLAGS_CHECK_AT_FREE due to the page being locked when the refcount
drops to zero.
Fix this by trying to take a reference on the folio before locking it.
The code already checks the PTE under the PTL and aborts if the entry is
no longer there. It is also possible the folio has been unmapped, freed
and re-allocated allowing a reference to be taken on an unrelated folio.
This case is also detected by the PTE check and the folio is unlocked
without further changes.
Link: https://lkml.kernel.org/r/20230330012519.804116-1-apopple@nvidia.com
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yafang Shao [Thu, 30 Mar 2023 16:26:25 +0000 (16:26 +0000)]
mm: vmalloc: avoid warn_alloc noise caused by fatal signal
There're some suspicious warn_alloc on my test serer, for example,
[13366.518837] warn_alloc: 81 callbacks suppressed
[13366.518841] test_verifier: vmalloc error: size 4096, page order 0, failed to allocate pages, mode:0x500dc2(GFP_HIGHUSER|__GFP_ZERO|__GFP_ACCOUNT), nodemask=(null),cpuset=/,mems_allowed=0-1
[13366.522240] CPU: 30 PID: 722463 Comm: test_verifier Kdump: loaded Tainted: G W O 6.2.0+ #638
[13366.524216] Call Trace:
[13366.524702] <TASK>
[13366.525148] dump_stack_lvl+0x6c/0x80
[13366.525712] dump_stack+0x10/0x20
[13366.526239] warn_alloc+0x119/0x190
[13366.526783] ? alloc_pages_bulk_array_mempolicy+0x9e/0x2a0
[13366.527470] __vmalloc_area_node+0x546/0x5b0
[13366.528066] __vmalloc_node_range+0xc2/0x210
[13366.528660] __vmalloc_node+0x42/0x50
[13366.529186] ? bpf_prog_realloc+0x53/0xc0
[13366.529743] __vmalloc+0x1e/0x30
[13366.530235] bpf_prog_realloc+0x53/0xc0
[13366.530771] bpf_patch_insn_single+0x80/0x1b0
[13366.531351] bpf_jit_blind_constants+0xe9/0x1c0
[13366.531932] ? __free_pages+0xee/0x100
[13366.532457] ? free_large_kmalloc+0x58/0xb0
[13366.533002] bpf_int_jit_compile+0x8c/0x5e0
[13366.533546] bpf_prog_select_runtime+0xb4/0x100
[13366.534108] bpf_prog_load+0x6b1/0xa50
[13366.534610] ? perf_event_task_tick+0x96/0xb0
[13366.535151] ? security_capable+0x3a/0x60
[13366.535663] __sys_bpf+0xb38/0x2190
[13366.536120] ? kvm_clock_get_cycles+0x9/0x10
[13366.536643] __x64_sys_bpf+0x1c/0x30
[13366.537094] do_syscall_64+0x38/0x90
[13366.537554] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[13366.538107] RIP: 0033:0x7f78310f8e29
[13366.538561] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 17 e0 2c 00 f7 d8 64 89 01 48
[13366.540286] RSP: 002b:
00007ffe2a61fff8 EFLAGS:
00000206 ORIG_RAX:
0000000000000141
[13366.541031] RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007f78310f8e29
[13366.541749] RDX:
0000000000000080 RSI:
00007ffe2a6200b0 RDI:
0000000000000005
[13366.542470] RBP:
00007ffe2a620010 R08:
00007ffe2a6202a0 R09:
00007ffe2a6200b0
[13366.543183] R10:
00000000000f423e R11:
0000000000000206 R12:
0000000000407800
[13366.543900] R13:
00007ffe2a620540 R14:
0000000000000000 R15:
0000000000000000
[13366.544623] </TASK>
[13366.545260] Mem-Info:
[13366.546121] active_anon:81319 inactive_anon:20733 isolated_anon:0
active_file:69450 inactive_file:5624 isolated_file:0
unevictable:0 dirty:10 writeback:0
slab_reclaimable:69649 slab_unreclaimable:48930
mapped:27400 shmem:12868 pagetables:4929
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:
15870308 free_pcp:142935 free_cma:0
[13366.551886] Node 0 active_anon:224836kB inactive_anon:33528kB active_file:175692kB inactive_file:13752kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:59248kB dirty:32kB writeback:0kB shmem:18252kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:4616kB pagetables:10664kB sec_pagetables:0kB all_unreclaimable? no
[13366.555184] Node 1 active_anon:100440kB inactive_anon:49404kB active_file:102108kB inactive_file:8744kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:50352kB dirty:8kB writeback:0kB shmem:33220kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:3896kB pagetables:9052kB sec_pagetables:0kB all_unreclaimable? no
[13366.558262] Node 0 DMA free:15360kB boost:0kB min:304kB low:380kB high:456kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[13366.560821] lowmem_reserve[]: 0 2735 31873 31873 31873
[13366.561981] Node 0 DMA32 free:2790904kB boost:0kB min:56028kB low:70032kB high:84036kB reserved_highatomic:0KB active_anon:1936kB inactive_anon:20kB active_file:396kB inactive_file:344kB unevictable:0kB writepending:0kB present:3129200kB managed:2801520kB mlocked:0kB bounce:0kB free_pcp:5188kB local_pcp:0kB free_cma:0kB
[13366.565148] lowmem_reserve[]: 0 0 29137 29137 29137
[13366.566168] Node 0 Normal free:28533824kB boost:0kB min:596740kB low:745924kB high:895108kB reserved_highatomic:28672KB active_anon:222900kB inactive_anon:33508kB active_file:175296kB inactive_file:13408kB unevictable:0kB writepending:32kB present:30408704kB managed:29837172kB mlocked:0kB bounce:0kB free_pcp:295724kB local_pcp:0kB free_cma:0kB
[13366.569485] lowmem_reserve[]: 0 0 0 0 0
[13366.570416] Node 1 Normal free:32141144kB boost:0kB min:660504kB low:825628kB high:990752kB reserved_highatomic:69632KB active_anon:100440kB inactive_anon:49404kB active_file:102108kB inactive_file:8744kB unevictable:0kB writepending:8kB present:33554432kB managed:33025372kB mlocked:0kB bounce:0kB free_pcp:270880kB local_pcp:46860kB free_cma:0kB
[13366.573403] lowmem_reserve[]: 0 0 0 0 0
[13366.574015] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15360kB
[13366.575474] Node 0 DMA32: 782*4kB (UME) 756*8kB (UME) 736*16kB (UME) 745*32kB (UME) 694*64kB (UME) 653*128kB (UME) 595*256kB (UME) 552*512kB (UME) 454*1024kB (UME) 347*2048kB (UME) 246*4096kB (UME) = 2790904kB
[13366.577442] Node 0 Normal: 33856*4kB (UMEH) 51815*8kB (UMEH) 42418*16kB (UMEH) 36272*32kB (UMEH) 22195*64kB (UMEH) 10296*128kB (UMEH) 7238*256kB (UMEH) 5638*512kB (UEH) 5337*1024kB (UMEH) 3506*2048kB (UMEH) 1470*4096kB (UME) = 28533784kB
[13366.580460] Node 1 Normal: 15776*4kB (UMEH) 37485*8kB (UMEH) 29509*16kB (UMEH) 21420*32kB (UMEH) 14818*64kB (UMEH) 13051*128kB (UMEH) 9918*256kB (UMEH) 7374*512kB (UMEH) 5397*1024kB (UMEH) 3887*2048kB (UMEH) 2002*4096kB (UME) = 32141240kB
[13366.583027] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[13366.584380] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[13366.585702] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[13366.587042] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[13366.588372] 87386 total pagecache pages
[13366.589266] 0 pages in swap cache
[13366.590327] Free swap = 0kB
[13366.591227] Total swap = 0kB
[13366.592142]
16777082 pages RAM
[13366.593057] 0 pages HighMem/MovableOnly
[13366.594037] 357226 pages reserved
[13366.594979] 0 pages hwpoisoned
This failure really confuse me as there're still lots of available pages.
Finally I figured out it was caused by a fatal signal. When a process is
allocating memory via vm_area_alloc_pages(), it will break directly even
if it hasn't allocated the requested pages when it receives a fatal
signal. In that case, we shouldn't show this warn_alloc, as it is
useless. We only need to show this warning when there're really no enough
pages.
Link: https://lkml.kernel.org/r/20230330162625.13604-1-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tetsuo Handa [Sun, 26 Mar 2023 15:21:46 +0000 (00:21 +0900)]
nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field
nilfs_btree_assign_p() and nilfs_direct_assign_p() are not initializing
"struct nilfs_binfo_dat"->bi_pad field, causing uninit-value reports when
being passed to CRC function.
Link: https://lkml.kernel.org/r/20230326152146.15872-1-konishi.ryusuke@gmail.com
Reported-by: syzbot <syzbot+048585f3f4227bb2b49b@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=048585f3f4227bb2b49b
Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com>
Link: https://lkml.kernel.org/r/CANX2M5bVbzRi6zH3PTcNE_31TzerstOXUa9Bay4E6y6dX23_pg@mail.gmail.com
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ryusuke Konishi [Mon, 27 Mar 2023 17:53:18 +0000 (02:53 +0900)]
nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
The finalization of nilfs_segctor_thread() can race with
nilfs_segctor_kill_thread() which terminates that thread, potentially
causing a use-after-free BUG as KASAN detected.
At the end of nilfs_segctor_thread(), it assigns NULL to "sc_task" member
of "struct nilfs_sc_info" to indicate the thread has finished, and then
notifies nilfs_segctor_kill_thread() of this using waitqueue
"sc_wait_task" on the struct nilfs_sc_info.
However, here, immediately after the NULL assignment to "sc_task", it is
possible that nilfs_segctor_kill_thread() will detect it and return to
continue the deallocation, freeing the nilfs_sc_info structure before the
thread does the notification.
This fixes the issue by protecting the NULL assignment to "sc_task" and
its notification, with spinlock "sc_state_lock" of the struct
nilfs_sc_info. Since nilfs_segctor_kill_thread() does a final check to
see if "sc_task" is NULL with "sc_state_lock" locked, this can eliminate
the race.
Link: https://lkml.kernel.org/r/20230327175318.8060-1-konishi.ryusuke@gmail.com
Reported-by: syzbot+b08ebcc22f8f3e6be43a@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/00000000000000660d05f7dfa877@google.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sergey Senozhatsky [Sat, 25 Mar 2023 02:46:31 +0000 (11:46 +0900)]
zsmalloc: document freeable stats
When freeable class stat was added to classes file (back in 2016) we
forgot to update zsmalloc documentation. Fix that.
Link: https://lkml.kernel.org/r/20230325024631.2817153-3-senozhatsky@chromium.org
Fixes: 1120ed548394 ("mm/zsmalloc: add `freeable' column to pool stat")
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sergey Senozhatsky [Sat, 25 Mar 2023 02:46:30 +0000 (11:46 +0900)]
zsmalloc: document new fullness grouping
Patch series "zsmalloc: minor documentation updates".
Two minor patches that bring zsmalloc documentation up to date.
This patch (of 2):
Update documentation and reflect new zspages fullness grouping (we don't
use almost_empty and almost_full anymore).
Link: https://lkml.kernel.org/r/20230325024631.2817153-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20230325024631.2817153-2-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Fixes: 67e157eb3639 ("zsmalloc: show per fullness group class stats")
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Shiyang Ruan [Fri, 24 Mar 2023 10:28:00 +0000 (10:28 +0000)]
fsdax: force clear dirty mark if CoW
XFS allows CoW on non-shared extents to combat fragmentation[1]. The old
non-shared extent could be mwrited before, its dax entry is marked dirty.
This results in a WARNing:
[ 28.512349] ------------[ cut here ]------------
[ 28.512622] WARNING: CPU: 2 PID: 5255 at fs/dax.c:390 dax_insert_entry+0x342/0x390
[ 28.513050] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables
[ 28.515462] CPU: 2 PID: 5255 Comm: fsstress Kdump: loaded Not tainted
6.3.0-rc1-00001-g85e1481e19c1-dirty #117
[ 28.515902] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.1-1-1 04/01/2014
[ 28.516307] RIP: 0010:dax_insert_entry+0x342/0x390
[ 28.516536] Code: 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 8b 45 20 48 83 c0 01 e9 e2 fe ff ff 48 8b 45 20 48 83 c0 01 e9 cd fe ff ff <0f> 0b e9 53 ff ff ff 48 8b 7c 24 08 31 f6 e8 1b 61 a1 00 eb 8c 48
[ 28.517417] RSP: 0000:
ffffc9000845fb18 EFLAGS:
00010086
[ 28.517721] RAX:
0000000000000053 RBX:
0000000000000155 RCX:
000000000018824b
[ 28.518113] RDX:
0000000000000000 RSI:
ffffffff827525a6 RDI:
00000000ffffffff
[ 28.518515] RBP:
ffffea00062092c0 R08:
0000000000000000 R09:
ffffc9000845f9c8
[ 28.518905] R10:
0000000000000003 R11:
ffffffff82ddb7e8 R12:
0000000000000155
[ 28.519301] R13:
0000000000000000 R14:
000000000018824b R15:
ffff88810cfa76b8
[ 28.519703] FS:
00007f14a0c94740(0000) GS:
ffff88817bd00000(0000) knlGS:
0000000000000000
[ 28.520148] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 28.520472] CR2:
00007f14a0c8d000 CR3:
000000010321c004 CR4:
0000000000770ee0
[ 28.520863] PKRU:
55555554
[ 28.521043] Call Trace:
[ 28.521219] <TASK>
[ 28.521368] dax_fault_iter+0x196/0x390
[ 28.521595] dax_iomap_pte_fault+0x19b/0x3d0
[ 28.521852] __xfs_filemap_fault+0x234/0x2b0
[ 28.522116] __do_fault+0x30/0x130
[ 28.522334] do_fault+0x193/0x340
[ 28.522586] __handle_mm_fault+0x2d3/0x690
[ 28.522975] handle_mm_fault+0xe6/0x2c0
[ 28.523259] do_user_addr_fault+0x1bc/0x6f0
[ 28.523521] exc_page_fault+0x60/0x140
[ 28.523763] asm_exc_page_fault+0x22/0x30
[ 28.524001] RIP: 0033:0x7f14a0b589ca
[ 28.524225] Code: c5 fe 7f 07 c5 fe 7f 47 20 c5 fe 7f 47 40 c5 fe 7f 47 60 c5 f8 77 c3 66 0f 1f 84 00 00 00 00 00 40 0f b6 c6 48 89 d1 48 89 fa <f3> aa 48 89 d0 c5 f8 77 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90
[ 28.525198] RSP: 002b:
00007fff1dea1c98 EFLAGS:
00010202
[ 28.525505] RAX:
000000000000001e RBX:
000000000014a000 RCX:
0000000000006046
[ 28.525895] RDX:
00007f14a0c82000 RSI:
000000000000001e RDI:
00007f14a0c8d000
[ 28.526290] RBP:
000000000000006f R08:
0000000000000004 R09:
000000000014a000
[ 28.526681] R10:
0000000000000008 R11:
0000000000000246 R12:
028f5c28f5c28f5c
[ 28.527067] R13:
8f5c28f5c28f5c29 R14:
0000000000011046 R15:
00007f14a0c946c0
[ 28.527449] </TASK>
[ 28.527600] ---[ end trace
0000000000000000 ]---
To be able to delete this entry, clear its dirty mark before
invalidate_inode_pages2_range().
[1] https://lore.kernel.org/linux-xfs/
20230321151339.GA11376@frogsfrogsfrogs/
Link: https://lkml.kernel.org/r/1679653680-2-1-git-send-email-ruansy.fnst@fujitsu.com
Fixes: f80e1668888f3 ("fsdax: invalidate pages when CoW")
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Peter Xu [Tue, 21 Mar 2023 19:18:40 +0000 (15:18 -0400)]
mm/hugetlb: fix uffd wr-protection for CoW optimization path
This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be
writable even with uffd-wp bit set. It only happens with hugetlb private
mappings, when someone firstly wr-protects a missing pte (which will
install a pte marker), then a write to the same page without any prior
access to the page.
Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before
reaching hugetlb_wp() to avoid taking more locks that userfault won't
need. However there's one CoW optimization path that can trigger
hugetlb_wp() inside hugetlb_no_page(), which will bypass the trap.
This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit
is detected. The new path will only trigger in the CoW optimization path
because generic hugetlb_fault() (e.g. when a present pte was
wr-protected) will resolve the uffd-wp bit already. Also make sure
anonymous UNSHARE won't be affected and can still be resolved, IOW only
skip CoW not CoR.
This patch will be needed for v5.19+ hence copy stable.
[peterx@redhat.com: v2]
Link: https://lkml.kernel.org/r/ZBzOqwF2wrHgBVZb@x1n
[peterx@redhat.com: v3]
Link: https://lkml.kernel.org/r/20230324142620.2344140-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20230321191840.1897940-1-peterx@redhat.com
Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam R. Howlett [Mon, 27 Feb 2023 17:36:07 +0000 (09:36 -0800)]
mm: enable maple tree RCU mode by default
Use the maple tree in RCU mode for VMA tracking.
The maple tree tracks the stack and is able to update the pivot
(lower/upper boundary) in-place to allow the page fault handler to write
to the tree while holding just the mmap read lock. This is safe as the
writes to the stack have a guard VMA which ensures there will always be a
NULL in the direction of the growth and thus will only update a pivot.
It is possible, but not recommended, to have VMAs that grow up/down
without guard VMAs. syzbot has constructed a testcase which sets up a VMA
to grow and consume the empty space. Overwriting the entire NULL entry
causes the tree to be altered in a way that is not safe for concurrent
readers; the readers may see a node being rewritten or one that does not
match the maple state they are using.
Enabling RCU mode allows the concurrent readers to see a stable node and
will return the expected result.
[Liam.Howlett@Oracle.com: we don't need to free the nodes with RCU[
Link: https://lore.kernel.org/linux-mm/000000000000b0a65805f663ace6@google.com/
Link: https://lkml.kernel.org/r/20230227173632.3292573-9-surenb@google.com
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: syzbot+8d95422d3537159ca390@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam R. Howlett [Mon, 27 Feb 2023 17:36:06 +0000 (09:36 -0800)]
maple_tree: add RCU lock checking to rcu callback functions
Dereferencing RCU objects within the RCU callback without the RCU check
has caused lockdep to complain. Fix the RCU dereferencing by using the
RCU callback lock to ensure the operation is safe.
Also stop creating a new lock to use for dereferencing during destruction
of the tree or subtree. Instead, pass through a pointer to the tree that
has the lock that is held for RCU dereferencing checking. It also does
not make sense to use the maple state in the freeing scenario as the tree
walk is a special case where the tree no longer has the normal encodings
and parent pointers.
Link: https://lkml.kernel.org/r/20230227173632.3292573-8-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam R. Howlett [Mon, 27 Feb 2023 17:36:05 +0000 (09:36 -0800)]
maple_tree: add smp_rmb() to dead node detection
Add an smp_rmb() before reading the parent pointer to ensure that anything
read from the node prior to the parent pointer hasn't been reordered ahead
of this check.
The is necessary for RCU mode.
Link: https://lkml.kernel.org/r/20230227173632.3292573-7-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam R. Howlett [Mon, 27 Feb 2023 17:36:04 +0000 (09:36 -0800)]
maple_tree: fix write memory barrier of nodes once dead for RCU mode
During the development of the maple tree, the strategy of freeing multiple
nodes changed and, in the process, the pivots were reused to store
pointers to dead nodes. To ensure the readers see accurate pivots, the
writers need to mark the nodes as dead and call smp_wmb() to ensure any
readers can identify the node as dead before using the pivot values.
There were two places where the old method of marking the node as dead
without smp_wmb() were being used, which resulted in RCU readers seeing
the wrong pivot value before seeing the node was dead. Fix this race
condition by using mte_set_node_dead() which has the smp_wmb() call to
ensure the race is closed.
Add a WARN_ON() to the ma_free_rcu() call to ensure all nodes being freed
are marked as dead to ensure there are no other call paths besides the two
updated paths.
This is necessary for the RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-6-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam Howlett [Mon, 27 Feb 2023 17:36:03 +0000 (09:36 -0800)]
maple_tree: remove extra smp_wmb() from mas_dead_leaves()
The call to mte_set_dead_node() before the smp_wmb() already calls
smp_wmb() so this is not needed. This is an optimization for the RCU mode
of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-5-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam Howlett [Mon, 27 Feb 2023 17:36:02 +0000 (09:36 -0800)]
maple_tree: fix freeing of nodes in rcu mode
The walk to destroy the nodes was not always setting the node type and
would result in a destroy method potentially using the values as nodes.
Avoid this by setting the correct node types. This is necessary for the
RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-4-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam Howlett [Mon, 27 Feb 2023 17:36:01 +0000 (09:36 -0800)]
maple_tree: detect dead nodes in mas_start()
When initially starting a search, the root node may already be in the
process of being replaced in RCU mode. Detect and restart the walk if
this is the case. This is necessary for RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-3-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam Howlett [Mon, 27 Feb 2023 17:36:00 +0000 (09:36 -0800)]
maple_tree: be more cautious about dead nodes
Patch series "Fix VMA tree modification under mmap read lock".
Syzbot reported a BUG_ON in mm/mmap.c which was found to be caused by an
inconsistency between threads walking the VMA maple tree. The
inconsistency is caused by the page fault handler modifying the maple tree
while holding the mmap_lock for read.
This only happens for stack VMAs. We had thought this was safe as it only
modifies a single pivot in the tree. Unfortunately, syzbot constructed a
test case where the stack had no guard page and grew the stack to abut the
next VMA. This causes us to delete the NULL entry between the two VMAs
and rewrite the node.
We considered several options for fixing this, including dropping the
mmap_lock, then reacquiring it for write; and relaxing the definition of
the tree to permit a zero-length NULL entry in the node. We decided the
best option was to backport some of the RCU patches from -next, which
solve the problem by allocating a new node and RCU-freeing the old node.
Since the problem exists in 6.1, we preferred a solution which is similar
to the one we intended to merge next merge window.
These patches have been in -next since next-
20230301, and have received
intensive testing in Android as part of the RCU page fault patchset. They
were also sent as part of the "Per-VMA locks" v4 patch series. Patches 1
to 7 are bug fixes for RCU mode of the tree and patch 8 enables RCU mode
for the tree.
Performance v6.3-rc3 vs patched v6.3-rc3: Running these changes through
mmtests showed there was a 15-20% performance decrease in
will-it-scale/brk1-processes. This tests creating and inserting a single
VMA repeatedly through the brk interface and isn't representative of any
real world applications.
This patch (of 8):
ma_pivots() and ma_data_end() may be called with a dead node. Ensure to
that the node isn't dead before using the returned values.
This is necessary for RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230327185532.2354250-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-2-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chriscli@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: freak07 <michalechner92@googlemail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jakub Kicinski [Thu, 6 Apr 2023 00:24:26 +0000 (17:24 -0700)]
Merge tag 'wireless-2023-04-05' of git://git./linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.3
mt76 has a fix for leaking cleartext frames on a certain scenario and
two firmware file handling related fixes. For brcmfmac we have a fix
for an older SDIO suspend regression and for ath11k avoiding a kernel
crash during hibernation with SUSE kernels.
* tag 'wireless-2023-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mt76: ignore key disable commands
wifi: ath11k: reduce the MHI timeout to 20s
wifi: mt76: mt7921: fix fw used for offload check for mt7922
wifi: mt76: mt7921: Fix use-after-free in fw features query.
wifi: brcmfmac: Fix SDIO suspend/resume regression
====================
Link: https://lore.kernel.org/r/20230405105536.4E946C433D2@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 6 Apr 2023 00:22:06 +0000 (17:22 -0700)]
Merge tag 'linux-can-fixes-for-6.3-
20230405' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2023-04-05
The first patch is by Oleksij Rempel and fixes a out-of-bounds memory
access in the j1939 protocol.
The remaining 3 patches target the ISOTP protocol. Oliver Hartkopp
fixes the ISOTP protocol to pass information about dropped PDUs to the
user space via control messages. Michal Sojka's patch fixes poll() to
not forward false EPOLLOUT events. And Oliver Hartkopp fixes a race
condition between isotp_sendsmg() and isotp_release().
* tag 'linux-can-fixes-for-6.3-
20230405' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: isotp: fix race between isotp_sendsmg() and isotp_release()
can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
====================
Link: https://lore.kernel.org/r/20230405092444.1802340-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>