From: Linus Torvalds
Date: Sun, 19 May 2024 16:21:03 +0000 (-0700)
Subject: Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull mm updates from Andrew Morton:
 "The usual shower of singleton fixes and minor series all over MM,
  documented (hopefully adequately) in the respective changelogs.
  Notable series include:

  - Lucas Stach has provided some page-mapping
    cleanup/consolidation/maintainability work in the series
    "mm/treewide: Remove pXd_huge() API".

  - In the series "Allow migrate on protnone reference with
    MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
    MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
    one test.

  - In their series "Memory allocation profiling" Kent Overstreet and
    Suren Baghdasaryan have contributed a means of determining (via
    /proc/allocinfo) whereabouts in the kernel memory is being
    allocated: number of calls and amount of memory (the callsite-hook
    shape this introduces is sketched after the include/linux/slab.h
    hunk below).

  - Matthew Wilcox has provided the series "Various significant MM
    patches" which does a number of rather unrelated things, but in
    largely similar code sites.

  - In his series "mm: page_alloc: freelist migratetype hygiene"
    Johannes Weiner has fixed the page allocator's handling of
    migratetype requests, with resulting improvements in compaction
    efficiency.

  - In the series "make the hugetlb migration strategy consistent"
    Baolin Wang has fixed a hugetlb migration issue, which should
    improve hugetlb allocation reliability.

  - Liu Shixin has hit an I/O meltdown caused by readahead in a
    memory-tight memcg. Addressed in the series "Fix I/O high when
    memory almost met memcg limit".

  - In the series "mm/filemap: optimize folio adding and splitting"
    Kairui Song has optimized pagecache insertion, yielding ~10%
    performance improvement in one test.

  - Baoquan He has cleaned up and consolidated the early zone
    initialization code in the series "mm/mm_init.c: refactor
    free_area_init_core()".

  - Baoquan has also redone some MM initialization code in the series
    "mm/init: minor clean up and improvement".

  - MM helper cleanups from Christoph Hellwig in his series "remove
    follow_pfn".

  - More cleanups from Matthew Wilcox in the series "Various
    page->flags cleanups".

  - Vlastimil Babka has contributed maintainability improvements in
    the series "memcg_kmem hooks refactoring".

  - More folio conversions and cleanups in Matthew Wilcox's series:

	"Convert huge_zero_page to huge_zero_folio"
	"khugepaged folio conversions"
	"Remove page_idle and page_young wrappers"
	"Use folio APIs in procfs"
	"Clean up __folio_put()"
	"Some cleanups for memory-failure"
	"Remove page_mapping()"
	"More folio compat code removal"

  - David Hildenbrand chipped in with "fs/proc/task_mmu: convert
    hugetlb functions to work on folios".

  - Code consolidation and cleanup work related to GUP's handling of
    hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".

  - Rick Edgecombe has developed some fixes to stack guard gaps in
    the series "Cover a guard gap corner case".

  - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
    series "mm/ksm: fix ksm exec support for prctl".

  - Baolin Wang has implemented NUMA balancing for multi-size THPs.
    This is a simple first-cut implementation for now. The series is
    "support multi-size THP numa balancing".
  - Cleanups to vma handling helper functions from Matthew Wilcox in
    the series "Unify vma_address and vma_pgoff_address".

  - Some selftests maintenance work from Dev Jain in the series
    "selftests/mm: mremap_test: Optimizations and style fixes".

  - Improvements to the swapping of multi-size THPs from Ryan Roberts
    in the series "Swap-out mTHP without splitting".

  - Kefeng Wang has significantly optimized the handling of arm64's
    permission page faults in the series

	"arch/mm/fault: accelerate pagefault when badaccess"
	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"

  - GUP cleanups from David Hildenbrand in "mm/gup: consistently call
    it GUP-fast".

  - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
    path to use struct vm_fault".

  - selftests build fixes from John Hubbard in the series "Fix
    selftests/mm build without requiring "make headers"".

  - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in
    the series "Improved Memory Tier Creation for CPUless NUMA Nodes".
    Fixes the initialization code so that migration between different
    memory types works as intended.

  - David Hildenbrand has improved follow_pte() and fixed an errant
    driver in the series "mm: follow_pte() improvements and acrn
    follow_pte() fixes".

  - David also did some cleanup work on large folio mapcounts in his
    series "mm: mapcount for large folios + page_mapcount() cleanups".

  - Folio conversions in KSM in Alex Shi's series "transfer page to
    folio in KSM".

  - Barry Song has added some sysfs stats for monitoring multi-size
    THPs in the series "mm: add per-order mTHP alloc and swpout
    counters".

  - Some zswap cleanups from Yosry Ahmed in the series "zswap
    same-filled and limit checking cleanups".

  - Matthew Wilcox has been looking at buffer_head code and found the
    documentation to be lacking. The series is "Improve buffer head
    documentation".

  - Multi-size THPs get more work, this time from Lance Yang. His
    series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
    optimizes the freeing of these things.

  - Kemeng Shi has added more userspace-visible writeback
    instrumentation in the series "Improve visibility of writeback".

  - Kemeng Shi then sent some maintenance work on top in the series
    "Fix and cleanups to page-writeback".

  - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
    the series "Improve anon_vma scalability for anon VMAs". Intel's
    test bot reported an improbable 3x improvement in one test.

  - SeongJae Park adds some DAMON feature work in the series

	"mm/damon: add a DAMOS filter type for page granularity access recheck"
	"selftests/damon: add DAMOS quota goal test"

  - Also some maintenance work in the series

	"mm/damon/paddr: simplify page level access re-check for pageout"
	"mm/damon: misc fixes and improvements"

  - David Hildenbrand has disabled some known-to-fail selftests in
    the series "selftests: mm: cow: flag vmsplice() hugetlb tests as
    XFAIL".

  - memcg metadata storage optimizations from Shakeel Butt in "memcg:
    reduce memory consumption by memcg stats".
  - DAX fixes and maintenance work from Vishal Verma in the series
    "dax/bus.c: Fixups for dax-bus locking""

* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
  memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
  selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
  selftests: cgroup: add tests to verify the zswap writeback path
  mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
  mm/damon/core: fix return value from damos_wmark_metric_value
  mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
  selftests: cgroup: remove redundant enabling of memory controller
  Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
  Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
  Docs/mm/damon/design: use a list for supported filters
  Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
  Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
  selftests/damon: classify tests for functionalities and regressions
  selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
  selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
  selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
  mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
  selftests/damon: add a test for DAMOS quota goal
  ...

---

61307b7be41a1f1039d1d1368810a1d92cb97b44
diff --cc arch/alpha/lib/checksum.c
index 17fa230baeef6,c29b98ef9c826..27b2a9edf3ccd
--- a/arch/alpha/lib/checksum.c
+++ b/arch/alpha/lib/checksum.c
@@@ -12,9 -12,9 +12,10 @@@
  #include
  #include
 +#include
  #include
+ #include

  static inline unsigned short from64to16(unsigned long x)
  {
diff --cc arch/alpha/lib/fpreg.c
index eee11fb4c7f15,3d32165043f8b..9a238e7536aec
--- a/arch/alpha/lib/fpreg.c
+++ b/arch/alpha/lib/fpreg.c
@@@ -8,8 -8,8 +8,9 @@@
  #include
  #include
  #include
+ #include
  #include
 +#include

  #if defined(CONFIG_ALPHA_EV6) || defined(CONFIG_ALPHA_EV67)
  #define STT(reg,val)  asm volatile ("ftoit $f"#reg",%0" : "=r"(val));
diff --cc arch/arm64/include/asm/pgtable.h
index bde9fd1793887,1303d30287dce..f8efbc128446d
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@@ -560,19 -502,15 +554,17 @@@ static inline int pmd_trans_huge(pmd_t
  #define pmd_mkclean(pmd)	pte_pmd(pte_mkclean(pmd_pte(pmd)))
  #define pmd_mkdirty(pmd)	pte_pmd(pte_mkdirty(pmd_pte(pmd)))
  #define pmd_mkyoung(pmd)	pte_pmd(pte_mkyoung(pmd_pte(pmd)))
- 
-static inline pmd_t pmd_mkinvalid(pmd_t pmd)
-{
-	pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
-	pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
-
-	return pmd;
-}
+#define pmd_mkinvalid(pmd)	pte_pmd(pte_mkinvalid(pmd_pte(pmd)))
+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+#define pmd_uffd_wp(pmd)	pte_uffd_wp(pmd_pte(pmd))
+#define pmd_mkuffd_wp(pmd)	pte_pmd(pte_mkuffd_wp(pmd_pte(pmd)))
+#define pmd_clear_uffd_wp(pmd)	pte_pmd(pte_clear_uffd_wp(pmd_pte(pmd)))
+#define pmd_swp_uffd_wp(pmd)	pte_swp_uffd_wp(pmd_pte(pmd))
+#define pmd_swp_mkuffd_wp(pmd)	pte_pmd(pte_swp_mkuffd_wp(pmd_pte(pmd)))
+#define pmd_swp_clear_uffd_wp(pmd) \
+	pte_pmd(pte_swp_clear_uffd_wp(pmd_pte(pmd)))
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
- #define pmd_thp_or_huge(pmd)	(pmd_huge(pmd) || pmd_trans_huge(pmd))
- 
  #define pmd_write(pmd)		pte_write(pmd_pte(pmd))

  #define pmd_mkhuge(pmd)		(__pmd(pmd_val(pmd) & ~PMD_TABLE_BIT))
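
The arm64 hunk above shows the pte-delegation idiom this tree keeps converging on: pmd_mkinvalid() stops being an open-coded function and becomes a one-line wrapper that converts the pmd to a pte, applies the pte helper, and converts back, exactly like its pmd_mkclean()/pmd_mkdirty() neighbours and the new uffd-wp helpers. A minimal userspace model of the idiom follows; the types and the software bit are invented for illustration, the real definitions live in arch/arm64/include/asm/pgtable.h.

#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t val; } pte_t;
typedef struct { uint64_t val; } pmd_t;

#define PTE_UFFD_WP	(1ULL << 58)	/* invented bit, for illustration */

static pte_t pmd_pte(pmd_t pmd) { return (pte_t){ pmd.val }; }
static pmd_t pte_pmd(pte_t pte) { return (pmd_t){ pte.val }; }

static pte_t pte_mkuffd_wp(pte_t pte) { pte.val |= PTE_UFFD_WP; return pte; }
static int pte_uffd_wp(pte_t pte) { return !!(pte.val & PTE_UFFD_WP); }

/* The pmd variants are mechanical one-liners, as in the hunk above. */
#define pmd_mkuffd_wp(pmd)	pte_pmd(pte_mkuffd_wp(pmd_pte(pmd)))
#define pmd_uffd_wp(pmd)	pte_uffd_wp(pmd_pte(pmd))

int main(void)
{
	pmd_t pmd = { 0 };

	pmd = pmd_mkuffd_wp(pmd);
	printf("uffd-wp set: %d\n", pmd_uffd_wp(pmd));
	return 0;
}

The payoff is that a software bit such as uffd-wp is defined once at the pte level, and every higher-level helper stays a one-liner that cannot drift out of sync.
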
diff --cc arch/powerpc/mm/mem.c
index 22cc978cdb474,a197d4c2244b3..d325217ab2012
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@@ -16,7 -16,7 +16,8 @@@
  #include
  #include
  #include
 +#include
+ #include
  #include
  #include
diff --cc drivers/gpu/drm/mediatek/mtk_gem.c
index 5a82d7cf3ed0d,0000000000000..a172456d1d7be
mode 100644,000000..100644
--- a/drivers/gpu/drm/mediatek/mtk_gem.c
+++ b/drivers/gpu/drm/mediatek/mtk_gem.c
@@@ -1,287 -1,0 +1,288 @@@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2015 MediaTek Inc.
+ */
+
+#include
++#include
+
+#include
+#include
+#include
+#include
+#include
+
+#include "mtk_drm_drv.h"
+#include "mtk_gem.h"
+
+static int mtk_gem_object_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma);
+
+static const struct vm_operations_struct vm_ops = {
+	.open = drm_gem_vm_open,
+	.close = drm_gem_vm_close,
+};
+
+static const struct drm_gem_object_funcs mtk_gem_object_funcs = {
+	.free = mtk_gem_free_object,
+	.get_sg_table = mtk_gem_prime_get_sg_table,
+	.vmap = mtk_gem_prime_vmap,
+	.vunmap = mtk_gem_prime_vunmap,
+	.mmap = mtk_gem_object_mmap,
+	.vm_ops = &vm_ops,
+};
+
+static struct mtk_gem_obj *mtk_gem_init(struct drm_device *dev,
+					unsigned long size)
+{
+	struct mtk_gem_obj *mtk_gem_obj;
+	int ret;
+
+	size = round_up(size, PAGE_SIZE);
+
+	if (size == 0)
+		return ERR_PTR(-EINVAL);
+
+	mtk_gem_obj = kzalloc(sizeof(*mtk_gem_obj), GFP_KERNEL);
+	if (!mtk_gem_obj)
+		return ERR_PTR(-ENOMEM);
+
+	mtk_gem_obj->base.funcs = &mtk_gem_object_funcs;
+
+	ret = drm_gem_object_init(dev, &mtk_gem_obj->base, size);
+	if (ret < 0) {
+		DRM_ERROR("failed to initialize gem object\n");
+		kfree(mtk_gem_obj);
+		return ERR_PTR(ret);
+	}
+
+	return mtk_gem_obj;
+}
+
+struct mtk_gem_obj *mtk_gem_create(struct drm_device *dev,
+				   size_t size, bool alloc_kmap)
+{
+	struct mtk_drm_private *priv = dev->dev_private;
+	struct mtk_gem_obj *mtk_gem;
+	struct drm_gem_object *obj;
+	int ret;
+
+	mtk_gem = mtk_gem_init(dev, size);
+	if (IS_ERR(mtk_gem))
+		return ERR_CAST(mtk_gem);
+
+	obj = &mtk_gem->base;
+
+	mtk_gem->dma_attrs = DMA_ATTR_WRITE_COMBINE;
+
+	if (!alloc_kmap)
+		mtk_gem->dma_attrs |= DMA_ATTR_NO_KERNEL_MAPPING;
+
+	mtk_gem->cookie = dma_alloc_attrs(priv->dma_dev, obj->size,
+					  &mtk_gem->dma_addr, GFP_KERNEL,
+					  mtk_gem->dma_attrs);
+	if (!mtk_gem->cookie) {
+		DRM_ERROR("failed to allocate %zx byte dma buffer", obj->size);
+		ret = -ENOMEM;
+		goto err_gem_free;
+	}
+
+	if (alloc_kmap)
+		mtk_gem->kvaddr = mtk_gem->cookie;
+
+	DRM_DEBUG_DRIVER("cookie = %p dma_addr = %pad size = %zu\n",
+			 mtk_gem->cookie, &mtk_gem->dma_addr,
+			 size);
+
+	return mtk_gem;
+
+err_gem_free:
+	drm_gem_object_release(obj);
+	kfree(mtk_gem);
+	return ERR_PTR(ret);
+}
+
+void mtk_gem_free_object(struct drm_gem_object *obj)
+{
+	struct mtk_gem_obj *mtk_gem = to_mtk_gem_obj(obj);
+	struct mtk_drm_private *priv = obj->dev->dev_private;
+
+	if (mtk_gem->sg)
+		drm_prime_gem_destroy(obj, mtk_gem->sg);
+	else
+		dma_free_attrs(priv->dma_dev, obj->size, mtk_gem->cookie,
+			       mtk_gem->dma_addr, mtk_gem->dma_attrs);
+
+	/* release file pointer to gem object. */
+	drm_gem_object_release(obj);
+
+	kfree(mtk_gem);
+}
+
+int mtk_gem_dumb_create(struct drm_file *file_priv, struct drm_device *dev,
+			struct drm_mode_create_dumb *args)
+{
+	struct mtk_gem_obj *mtk_gem;
+	int ret;
+
+	args->pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
+
+	/*
+	 * Multiply 2 variables of different types,
+	 * for example: args->size = args->spacing * args->height;
+	 * may cause coverity issue with unintentional overflow.
+	 */
+	args->size = args->pitch;
+	args->size *= args->height;
+
+	mtk_gem = mtk_gem_create(dev, args->size, false);
+	if (IS_ERR(mtk_gem))
+		return PTR_ERR(mtk_gem);
+
+	/*
+	 * allocate a id of idr table where the obj is registered
+	 * and handle has the id what user can see.
+	 */
+	ret = drm_gem_handle_create(file_priv, &mtk_gem->base, &args->handle);
+	if (ret)
+		goto err_handle_create;
+
+	/* drop reference from allocate - handle holds it now. */
+	drm_gem_object_put(&mtk_gem->base);
+
+	return 0;
+
+err_handle_create:
+	mtk_gem_free_object(&mtk_gem->base);
+	return ret;
+}
+
+static int mtk_gem_object_mmap(struct drm_gem_object *obj,
+			       struct vm_area_struct *vma)
+
+{
+	int ret;
+	struct mtk_gem_obj *mtk_gem = to_mtk_gem_obj(obj);
+	struct mtk_drm_private *priv = obj->dev->dev_private;
+
+	/*
+	 * Set vm_pgoff (used as a fake buffer offset by DRM) to 0 and map the
+	 * whole buffer from the start.
+	 */
+	vma->vm_pgoff = 0;
+
+	/*
+	 * dma_alloc_attrs() allocated a struct page table for mtk_gem, so clear
+	 * VM_PFNMAP flag that was set by drm_gem_mmap_obj()/drm_gem_mmap().
+	 */
+	vm_flags_set(vma, VM_IO | VM_DONTEXPAND | VM_DONTDUMP);
+	vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
+	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
+
+	ret = dma_mmap_attrs(priv->dma_dev, vma, mtk_gem->cookie,
+			     mtk_gem->dma_addr, obj->size, mtk_gem->dma_attrs);
+
+	return ret;
+}
+
+/*
+ * Allocate a sg_table for this GEM object.
+ * Note: Both the table's contents, and the sg_table itself must be freed by
+ * the caller.
+ * Returns a pointer to the newly allocated sg_table, or an ERR_PTR() error.
+ */
+struct sg_table *mtk_gem_prime_get_sg_table(struct drm_gem_object *obj)
+{
+	struct mtk_gem_obj *mtk_gem = to_mtk_gem_obj(obj);
+	struct mtk_drm_private *priv = obj->dev->dev_private;
+	struct sg_table *sgt;
+	int ret;
+
+	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt)
+		return ERR_PTR(-ENOMEM);
+
+	ret = dma_get_sgtable_attrs(priv->dma_dev, sgt, mtk_gem->cookie,
+				    mtk_gem->dma_addr, obj->size,
+				    mtk_gem->dma_attrs);
+	if (ret) {
+		DRM_ERROR("failed to allocate sgt, %d\n", ret);
+		kfree(sgt);
+		return ERR_PTR(ret);
+	}
+
+	return sgt;
+}
+
+struct drm_gem_object *mtk_gem_prime_import_sg_table(struct drm_device *dev,
+			struct dma_buf_attachment *attach, struct sg_table *sg)
+{
+	struct mtk_gem_obj *mtk_gem;
+
+	/* check if the entries in the sg_table are contiguous */
+	if (drm_prime_get_contiguous_size(sg) < attach->dmabuf->size) {
+		DRM_ERROR("sg_table is not contiguous");
+		return ERR_PTR(-EINVAL);
+	}
+
+	mtk_gem = mtk_gem_init(dev, attach->dmabuf->size);
+	if (IS_ERR(mtk_gem))
+		return ERR_CAST(mtk_gem);
+
+	mtk_gem->dma_addr = sg_dma_address(sg->sgl);
+	mtk_gem->sg = sg;
+
+	return &mtk_gem->base;
+}
+
+int mtk_gem_prime_vmap(struct drm_gem_object *obj, struct iosys_map *map)
+{
+	struct mtk_gem_obj *mtk_gem = to_mtk_gem_obj(obj);
+	struct sg_table *sgt = NULL;
+	unsigned int npages;
+
+	if (mtk_gem->kvaddr)
+		goto out;
+
+	sgt = mtk_gem_prime_get_sg_table(obj);
+	if (IS_ERR(sgt))
+		return PTR_ERR(sgt);
+
+	npages = obj->size >> PAGE_SHIFT;
+	mtk_gem->pages = kcalloc(npages, sizeof(*mtk_gem->pages), GFP_KERNEL);
+	if (!mtk_gem->pages) {
+		sg_free_table(sgt);
+		kfree(sgt);
+		return -ENOMEM;
+	}
+
+	drm_prime_sg_to_page_array(sgt, mtk_gem->pages, npages);
+
+	mtk_gem->kvaddr = vmap(mtk_gem->pages, npages, VM_MAP,
+			       pgprot_writecombine(PAGE_KERNEL));
+	if (!mtk_gem->kvaddr) {
+		sg_free_table(sgt);
+		kfree(sgt);
+		kfree(mtk_gem->pages);
+		return -ENOMEM;
+	}
+	sg_free_table(sgt);
+	kfree(sgt);
+
+out:
+	iosys_map_set_vaddr(map, mtk_gem->kvaddr);
+
+	return 0;
+}
+
+void mtk_gem_prime_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
+{
+	struct mtk_gem_obj *mtk_gem = to_mtk_gem_obj(obj);
+	void *vaddr = map->vaddr;
+
+	if (!mtk_gem->pages)
+		return;
+
+	vunmap(vaddr);
+	mtk_gem->kvaddr = NULL;
+	kfree(mtk_gem->pages);
+}
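
The overflow comment in mtk_gem_dumb_create() above deserves a concrete illustration: pitch * height is a 32-bit-by-32-bit multiply, and assigning the result to the wider args->size does not widen the multiplication itself. Below is a standalone sketch of the two remedies, the promote-then-multiply idiom the driver uses, and an explicit overflow test shown here via the compiler builtin that the kernel's check_mul_overflow() wraps; the values are made up.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t pitch = 1u << 20, height = 1u << 13;	/* made-up values */
	uint64_t size;
	uint32_t narrow;

	/* Promote first, as the driver does with
	 * args->size = args->pitch; args->size *= args->height; */
	size = (uint64_t)pitch * height;
	printf("promoted multiply: %llu\n", (unsigned long long)size);

	/* A naive 32-bit multiply silently wraps instead: */
	printf("wrapped multiply:  %u\n", pitch * height);

	/* Or detect it explicitly, kernel check_mul_overflow() style. */
	if (__builtin_mul_overflow(pitch, height, &narrow))
		printf("32-bit multiply overflows\n");
	return 0;
}
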
diff --cc include/linux/fortify-string.h
index 85fc0e6f0f7f5,b24d62bad0b32..d658ae729a02a
--- a/include/linux/fortify-string.h
+++ b/include/linux/fortify-string.h
@@@ -738,10 -734,10 +738,11 @@@ __FORTIFY_INLINE void *kmemdup_noprof(c
  	if (__compiletime_lessthan(p_size, size))
  		__read_overflow();
  	if (p_size < size)
- 		fortify_panic(FORTIFY_FUNC_kmemdup, FORTIFY_READ, p_size, size, NULL);
+ 		fortify_panic(FORTIFY_FUNC_kmemdup, FORTIFY_READ, p_size, size,
+ 			      __real_kmemdup(p, 0, gfp));
  	return __real_kmemdup(p, size, gfp);
  }
+ #define kmemdup(...)	alloc_hooks(kmemdup_noprof(__VA_ARGS__))

  /**
   * strcpy - Copy a string into another string buffer
diff --cc include/linux/slab.h
index ebc20173cd4e2,4cc37ef22aaed..7247e217e21bd
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@@ -744,68 -773,42 +773,47 @@@ static inline __alloc_size(1, 2) void *
   * @size: how many bytes of memory are required.
   * @flags: the type of memory to allocate (see kmalloc).
   */
- static inline __alloc_size(1) void *kzalloc(size_t size, gfp_t flags)
+ static inline __alloc_size(1) void *kzalloc_noprof(size_t size, gfp_t flags)
  {
- 	return kmalloc(size, flags | __GFP_ZERO);
+ 	return kmalloc_noprof(size, flags | __GFP_ZERO);
  }
+ #define kzalloc(...)				alloc_hooks(kzalloc_noprof(__VA_ARGS__))
+ #define kzalloc_node(_size, _flags, _node)	kmalloc_node(_size, (_flags)|__GFP_ZERO, _node)

- /**
-  * kzalloc_node - allocate zeroed memory from a particular memory node.
-  * @size: how many bytes of memory are required.
-  * @flags: the type of memory to allocate (see kmalloc).
-  * @node: memory node from which to allocate
-  */
- static inline __alloc_size(1) void *kzalloc_node(size_t size, gfp_t flags, int node)
- {
- 	return kmalloc_node(size, flags | __GFP_ZERO, node);
- }
+ extern void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node) __alloc_size(1);
+ #define kvmalloc_node(...)			alloc_hooks(kvmalloc_node_noprof(__VA_ARGS__))

- extern void *kvmalloc_node(size_t size, gfp_t flags, int node) __alloc_size(1);
- static inline __alloc_size(1) void *kvmalloc(size_t size, gfp_t flags)
- {
- 	return kvmalloc_node(size, flags, NUMA_NO_NODE);
- }
- static inline __alloc_size(1) void *kvzalloc_node(size_t size, gfp_t flags, int node)
- {
- 	return kvmalloc_node(size, flags | __GFP_ZERO, node);
- }
- static inline __alloc_size(1) void *kvzalloc(size_t size, gfp_t flags)
- {
- 	return kvmalloc(size, flags | __GFP_ZERO);
- }
+ #define kvmalloc(_size, _flags)			kvmalloc_node(_size, _flags, NUMA_NO_NODE)
+ #define kvmalloc_noprof(_size, _flags)		kvmalloc_node_noprof(_size, _flags, NUMA_NO_NODE)
-#define kvzalloc(_size, _flags)			kvmalloc(_size, _flags|__GFP_ZERO)
++#define kvzalloc(_size, _flags)			kvmalloc(_size, (_flags)|__GFP_ZERO)
+
-#define kvzalloc_node(_size, _flags, _node)	kvmalloc_node(_size, _flags|__GFP_ZERO, _node)
++#define kvzalloc_node(_size, _flags, _node)	kvmalloc_node(_size, (_flags)|__GFP_ZERO, _node)

-static inline __alloc_size(1, 2) void *kvmalloc_array_noprof(size_t n, size_t size, gfp_t flags)
+static inline __alloc_size(1, 2) void *
-kvmalloc_array_node(size_t n, size_t size, gfp_t flags, int node)
++kvmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags, int node)
 {
 	size_t bytes;

 	if (unlikely(check_mul_overflow(n, size, &bytes)))
 		return NULL;

-	return kvmalloc_node(bytes, flags, node);
- 	return kvmalloc_node_noprof(bytes, flags, NUMA_NO_NODE);
++	return kvmalloc_node_noprof(bytes, flags, node);
 }

- static inline __alloc_size(1, 2) void *
- kvmalloc_array(size_t n, size_t size, gfp_t flags)
- {
- 	return kvmalloc_array_node(n, size, flags, NUMA_NO_NODE);
- }
-
- static inline __alloc_size(1, 2) void *
- kvcalloc_node(size_t n, size_t size, gfp_t flags, int node)
- {
- 	return kvmalloc_array_node(n, size, flags | __GFP_ZERO, node);
- }
++#define kvmalloc_array_noprof(...)		kvmalloc_array_node_noprof(__VA_ARGS__, NUMA_NO_NODE)
++#define kvcalloc_node_noprof(_n,_s,_f,_node)	kvmalloc_array_node_noprof(_n,_s,(_f)|__GFP_ZERO,_node)
++#define kvcalloc_noprof(...)			kvcalloc_node_noprof(__VA_ARGS__, NUMA_NO_NODE)
+
- static inline __alloc_size(1, 2) void *kvcalloc(size_t n, size_t size, gfp_t flags)
- {
- 	return kvmalloc_array(n, size, flags | __GFP_ZERO);
- }
+ #define kvmalloc_array(...)			alloc_hooks(kvmalloc_array_noprof(__VA_ARGS__))
-#define kvcalloc(_n, _size, _flags)		kvmalloc_array(_n, _size, _flags|__GFP_ZERO)
-#define kvcalloc_noprof(_n, _size, _flags)	kvmalloc_array_noprof(_n, _size, _flags|__GFP_ZERO)
++#define kvcalloc_node(...)			alloc_hooks(kvcalloc_node_noprof(__VA_ARGS__))
++#define kvcalloc(...)				alloc_hooks(kvcalloc_noprof(__VA_ARGS__))

- extern void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
+ extern void *kvrealloc_noprof(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
  		__realloc_size(3);
+ #define kvrealloc(...)				alloc_hooks(kvrealloc_noprof(__VA_ARGS__))

  extern void kvfree(const void *addr);
-DEFINE_FREE(kvfree, void *, if (_T) kvfree(_T))
+DEFINE_FREE(kvfree, void *, if (!IS_ERR_OR_NULL(_T)) kvfree(_T))

  extern void kvfree_sensitive(const void *addr, size_t len);
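
The slab.h hunk above is the mechanical shape of the "Memory allocation profiling" series from the changelog: each allocator gains a _noprof variant carrying the real implementation, and the old name becomes a macro routed through alloc_hooks(), which associates a static per-callsite tag that /proc/allocinfo later reports. Below is a self-contained userspace model of that split, using GNU C statement expressions as the kernel does; the alloc_tag struct here is a stand-in, not the kernel's actual interface (see include/linux/alloc_tag.h for that).

#include <stdio.h>
#include <stdlib.h>

/* The _noprof function carries the real implementation. */
static void *kmalloc_noprof(size_t size) { return malloc(size); }

/* Invented stand-in for the kernel's per-callsite codetag. */
struct alloc_tag { const char *file; int line; long calls; };

/* Each macro expansion gets its own static tag, so every callsite is
 * accounted separately -- the essence of what /proc/allocinfo reports. */
#define alloc_hooks(call)						\
({									\
	static struct alloc_tag _tag = { __FILE__, __LINE__, 0 };	\
	_tag.calls++;							\
	printf("%s:%d calls=%ld\n", _tag.file, _tag.line, _tag.calls);	\
	(call);								\
})

#define kmalloc(size)	alloc_hooks(kmalloc_noprof(size))

int main(void)
{
	free(kmalloc(64));	/* accounted to this file:line */
	free(kmalloc(128));	/* different line, so a separate tag */
	return 0;
}

This also explains the DEFINE_FREE() and __realloc_size() churn around it: once the public names are macros, every annotation that used to sit on the function has to follow the _noprof symbol.
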
diff --cc include/linux/string.h
index 10e5177bb49c0,793c27ad7c0d2..60168aa2af075
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@@ -284,11 -282,12 +284,13 @@@ extern void kfree_const(const void *x)
  extern char *kstrdup(const char *s, gfp_t gfp) __malloc;
  extern const char *kstrdup_const(const char *s, gfp_t gfp);
  extern char *kstrndup(const char *s, size_t len, gfp_t gfp);
- extern void *kmemdup(const void *src, size_t len, gfp_t gfp) __realloc_size(2);
+ extern void *kmemdup_noprof(const void *src, size_t len, gfp_t gfp) __realloc_size(2);
+ #define kmemdup(...)	alloc_hooks(kmemdup_noprof(__VA_ARGS__))
+
  extern void *kvmemdup(const void *src, size_t len, gfp_t gfp) __realloc_size(2);
  extern char *kmemdup_nul(const char *s, size_t len, gfp_t gfp);
-extern void *kmemdup_array(const void *src, size_t element_size, size_t count, gfp_t gfp);
+extern void *kmemdup_array(const void *src, size_t element_size, size_t count, gfp_t gfp)
+		__realloc_size(2, 3);

  /* lib/argv_split.c */
  extern char **argv_split(gfp_t gfp, const char *str, int *argcp);
diff --cc io_uring/memmap.c
index 523d982af2b05,0000000000000..4785d6af5fee9
mode 100644,000000..100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@@ -1,336 -1,0 +1,336 @@@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "memmap.h"
+#include "kbuf.h"
+
+static void *io_mem_alloc_compound(struct page **pages, int nr_pages,
+				   size_t size, gfp_t gfp)
+{
+	struct page *page;
+	int i, order;
+
+	order = get_order(size);
+	if (order > MAX_PAGE_ORDER)
+		return ERR_PTR(-ENOMEM);
+	else if (order)
+		gfp |= __GFP_COMP;
+
+	page = alloc_pages(gfp, order);
+	if (!page)
+		return ERR_PTR(-ENOMEM);
+
+	for (i = 0; i < nr_pages; i++)
+		pages[i] = page + i;
+
+	return page_address(page);
+}
+
+static void *io_mem_alloc_single(struct page **pages, int nr_pages, size_t size,
+				 gfp_t gfp)
+{
+	void *ret;
+	int i;
+
+	for (i = 0; i < nr_pages; i++) {
+		pages[i] = alloc_page(gfp);
+		if (!pages[i])
+			goto err;
+	}
+
+	ret = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
+	if (ret)
+		return ret;
+err:
+	while (i--)
+		put_page(pages[i]);
+	return ERR_PTR(-ENOMEM);
+}
+
+void *io_pages_map(struct page ***out_pages, unsigned short *npages,
+		   size_t size)
+{
+	gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN;
+	struct page **pages;
+	int nr_pages;
+	void *ret;
+
+	nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	pages = kvmalloc_array(nr_pages, sizeof(struct page *), gfp);
+	if (!pages)
+		return ERR_PTR(-ENOMEM);
+
+	ret = io_mem_alloc_compound(pages, nr_pages, size, gfp);
+	if (!IS_ERR(ret))
+		goto done;
+
+	ret = io_mem_alloc_single(pages, nr_pages, size, gfp);
+	if (!IS_ERR(ret)) {
+done:
+		*out_pages = pages;
+		*npages = nr_pages;
+		return ret;
+	}
+
+	kvfree(pages);
+	*out_pages = NULL;
+	*npages = 0;
+	return ret;
+}
+
+void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages,
+		    bool put_pages)
+{
+	bool do_vunmap = false;
+
+	if (!ptr)
+		return;
+
+	if (put_pages && *npages) {
+		struct page **to_free = *pages;
+		int i;
+
+		/*
+		 * Only did vmap for the non-compound multiple page case.
+		 * For the compound page, we just need to put the head.
+		 */
+		if (PageCompound(to_free[0]))
+			*npages = 1;
+		else if (*npages > 1)
+			do_vunmap = true;
+		for (i = 0; i < *npages; i++)
+			put_page(to_free[i]);
+	}
+	if (do_vunmap)
+		vunmap(ptr);
+	kvfree(*pages);
+	*pages = NULL;
+	*npages = 0;
+}
+
+void io_pages_free(struct page ***pages, int npages)
+{
+	struct page **page_array = *pages;
+
+	if (!page_array)
+		return;
+
+	unpin_user_pages(page_array, npages);
+	kvfree(page_array);
+	*pages = NULL;
+}
+
+struct page **io_pin_pages(unsigned long uaddr, unsigned long len, int *npages)
+{
+	unsigned long start, end, nr_pages;
+	struct page **pages;
+	int ret;
+
+	end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	start = uaddr >> PAGE_SHIFT;
+	nr_pages = end - start;
+	if (WARN_ON_ONCE(!nr_pages))
+		return ERR_PTR(-EINVAL);
+
+	pages = kvmalloc_array(nr_pages, sizeof(struct page *), GFP_KERNEL);
+	if (!pages)
+		return ERR_PTR(-ENOMEM);
+
+	ret = pin_user_pages_fast(uaddr, nr_pages, FOLL_WRITE | FOLL_LONGTERM,
+				  pages);
+	/* success, mapped all pages */
+	if (ret == nr_pages) {
+		*npages = nr_pages;
+		return pages;
+	}
+
+	/* partial map, or didn't map anything */
+	if (ret >= 0) {
+		/* if we did partial map, release any pages we did get */
+		if (ret)
+			unpin_user_pages(pages, ret);
+		ret = -EFAULT;
+	}
+	kvfree(pages);
+	return ERR_PTR(ret);
+}
+
+void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
+		     unsigned long uaddr, size_t size)
+{
+	struct page **page_array;
+	unsigned int nr_pages;
+	void *page_addr;
+
+	*npages = 0;
+
+	if (uaddr & (PAGE_SIZE - 1) || !size)
+		return ERR_PTR(-EINVAL);
+
+	nr_pages = 0;
+	page_array = io_pin_pages(uaddr, size, &nr_pages);
+	if (IS_ERR(page_array))
+		return page_array;
+
+	page_addr = vmap(page_array, nr_pages, VM_MAP, PAGE_KERNEL);
+	if (page_addr) {
+		*pages = page_array;
+		*npages = nr_pages;
+		return page_addr;
+	}
+
+	io_pages_free(&page_array, nr_pages);
+	return ERR_PTR(-ENOMEM);
+}
+
+static void *io_uring_validate_mmap_request(struct file *file, loff_t pgoff,
+					    size_t sz)
+{
+	struct io_ring_ctx *ctx = file->private_data;
+	loff_t offset = pgoff << PAGE_SHIFT;
+
+	switch ((pgoff << PAGE_SHIFT) & IORING_OFF_MMAP_MASK) {
+	case IORING_OFF_SQ_RING:
+	case IORING_OFF_CQ_RING:
+		/* Don't allow mmap if the ring was setup without it */
+		if (ctx->flags & IORING_SETUP_NO_MMAP)
+			return ERR_PTR(-EINVAL);
+		return ctx->rings;
+	case IORING_OFF_SQES:
+		/* Don't allow mmap if the ring was setup without it */
+		if (ctx->flags & IORING_SETUP_NO_MMAP)
+			return ERR_PTR(-EINVAL);
+		return ctx->sq_sqes;
+	case IORING_OFF_PBUF_RING: {
+		struct io_buffer_list *bl;
+		unsigned int bgid;
+		void *ptr;
+
+		bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
+		bl = io_pbuf_get_bl(ctx, bgid);
+		if (IS_ERR(bl))
+			return bl;
+		ptr = bl->buf_ring;
+		io_put_bl(ctx, bl);
+		return ptr;
+	}
+	}
+
+	return ERR_PTR(-EINVAL);
+}
+
+int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma,
+			struct page **pages, int npages)
+{
+	unsigned long nr_pages = npages;
+
+	vm_flags_set(vma, VM_DONTEXPAND);
+	return vm_insert_pages(vma, vma->vm_start, pages, &nr_pages);
+}
+
+#ifdef CONFIG_MMU
+
+__cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct io_ring_ctx *ctx = file->private_data;
+	size_t sz = vma->vm_end - vma->vm_start;
+	long offset = vma->vm_pgoff << PAGE_SHIFT;
+	void *ptr;
+
+	ptr = io_uring_validate_mmap_request(file, vma->vm_pgoff, sz);
+	if (IS_ERR(ptr))
+		return PTR_ERR(ptr);
+
+	switch (offset & IORING_OFF_MMAP_MASK) {
+	case IORING_OFF_SQ_RING:
+	case IORING_OFF_CQ_RING:
+		return io_uring_mmap_pages(ctx, vma, ctx->ring_pages,
+					   ctx->n_ring_pages);
+	case IORING_OFF_SQES:
+		return io_uring_mmap_pages(ctx, vma, ctx->sqe_pages,
+					   ctx->n_sqe_pages);
+	case IORING_OFF_PBUF_RING:
+		return io_pbuf_mmap(file, vma);
+	}
+
+	return -EINVAL;
+}
+
+unsigned long io_uring_get_unmapped_area(struct file *filp, unsigned long addr,
+					 unsigned long len, unsigned long pgoff,
+					 unsigned long flags)
+{
+	void *ptr;
+
+	/*
+	 * Do not allow to map to user-provided address to avoid breaking the
+	 * aliasing rules. Userspace is not able to guess the offset address of
+	 * kernel kmalloc()ed memory area.
+	 */
+	if (addr)
+		return -EINVAL;
+
+	ptr = io_uring_validate_mmap_request(filp, pgoff, len);
+	if (IS_ERR(ptr))
+		return -ENOMEM;
+
+	/*
+	 * Some architectures have strong cache aliasing requirements.
+	 * For such architectures we need a coherent mapping which aliases
+	 * kernel memory *and* userspace memory. To achieve that:
+	 * - use a NULL file pointer to reference physical memory, and
+	 * - use the kernel virtual address of the shared io_uring context
+	 *   (instead of the userspace-provided address, which has to be 0UL
+	 *   anyway).
+	 * - use the same pgoff which the get_unmapped_area() uses to
+	 *   calculate the page colouring.
+	 * For architectures without such aliasing requirements, the
+	 * architecture will return any suitable mapping because addr is 0.
+	 */
+	filp = NULL;
+	flags |= MAP_SHARED;
+	pgoff = 0;	/* has been translated to ptr above */
+#ifdef SHM_COLOUR
+	addr = (uintptr_t) ptr;
+	pgoff = addr >> PAGE_SHIFT;
+#else
+	addr = 0UL;
+#endif
- 	return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
++	return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
+}
+
+#else /* !CONFIG_MMU */
+
+int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	return is_nommu_shared_mapping(vma->vm_flags) ? 0 : -EINVAL;
+}
+
+unsigned int io_uring_nommu_mmap_capabilities(struct file *file)
+{
+	return NOMMU_MAP_DIRECT | NOMMU_MAP_READ | NOMMU_MAP_WRITE;
+}
+
+unsigned long io_uring_get_unmapped_area(struct file *file, unsigned long addr,
+					 unsigned long len, unsigned long pgoff,
+					 unsigned long flags)
+{
+	void *ptr;
+
+	ptr = io_uring_validate_mmap_request(file, pgoff, len);
+	if (IS_ERR(ptr))
+		return PTR_ERR(ptr);
+
+	return (unsigned long) ptr;
+}
+
+#endif /* !CONFIG_MMU */
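
io_uring_validate_mmap_request() above multiplexes all of io_uring's rings through a single file by packing a region type, and for IORING_OFF_PBUF_RING additionally a buffer-group id, into the mmap offset bits. Below is a compact model of that encode/decode scheme, with invented mask and shift values standing in for the real IORING_OFF_* constants.

#include <stdint.h>
#include <stdio.h>

#define MMAP_MASK	0xf8000000ULL	/* invented region mask */
#define OFF_SQ_RING	0x00000000ULL
#define OFF_PBUF	0x80000000ULL
#define PBUF_SHIFT	16		/* invented buffer-group-id shift */

int main(void)
{
	/* Userspace passes this as the mmap() offset argument. */
	uint64_t offset = OFF_PBUF | (42ULL << PBUF_SHIFT);

	/* The kernel side masks out the region, then decodes the id. */
	switch (offset & MMAP_MASK) {
	case OFF_SQ_RING:
		puts("SQ ring");
		break;
	case OFF_PBUF:
		printf("pbuf ring, bgid=%llu\n",
		       (unsigned long long)((offset & ~MMAP_MASK) >> PBUF_SHIFT));
		break;
	}
	return 0;
}
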
diff --cc kernel/module/main.c
index 91e185607d4b0,2d25eebc549db..d18a94b973e10
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@@ -56,8 -56,8 +56,9 @@@
  #include
  #include
  #include
+ #include
  #include
 +#include
  #include

  #include "internal.h"
@@@ -1188,50 -1198,32 +1189,54 @@@ void __weak module_arch_freeing_init(st
  {
  }

-static bool mod_mem_use_vmalloc(enum mod_mem_type type)
+static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
 {
- 	return IS_ENABLED(CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC) &&
- 		mod_mem_type_is_core_data(type);
-}
+	unsigned int size = PAGE_ALIGN(mod->mem[type].size);
+	enum execmem_type execmem_type;
+	void *ptr;

-static void *module_memory_alloc(unsigned int size, enum mod_mem_type type)
-{
-	if (mod_mem_use_vmalloc(type))
-		return vzalloc(size);
-	return module_alloc(size);
+	mod->mem[type].size = size;
+
+	if (mod_mem_type_is_data(type))
+		execmem_type = EXECMEM_MODULE_DATA;
+	else
+		execmem_type = EXECMEM_MODULE_TEXT;
+
+	ptr = execmem_alloc(execmem_type, size);
+	if (!ptr)
+		return -ENOMEM;
+
+	/*
+	 * The pointer to these blocks of memory are stored on the module
+	 * structure and we keep that around so long as the module is
+	 * around. We only free that memory when we unload the module.
+	 * Just mark them as not being a leak then. The .init* ELF
+	 * sections *do* get freed after boot so we *could* treat them
+	 * slightly differently with kmemleak_ignore() and only grey
+	 * them out as they work as typical memory allocations which
+	 * *do* eventually get freed, but let's just keep things simple
+	 * and avoid *any* false positives.
+	 */
+	kmemleak_not_leak(ptr);
+
+	memset(ptr, 0, size);
+	mod->mem[type].base = ptr;
+
+	return 0;
 }

- static void module_memory_free(struct module *mod, enum mod_mem_type type)
-static void module_memory_free(void *ptr, enum mod_mem_type type,
++static void module_memory_free(struct module *mod, enum mod_mem_type type,
+			       bool unload_codetags)
 {
+	void *ptr = mod->mem[type].base;
+
+	if (!unload_codetags && mod_mem_type_is_core_data(type))
+		return;
+
- 	if (mod_mem_use_vmalloc(type))
- 		vfree(ptr);
- 	else
- 		module_memfree(ptr);
+	execmem_free(ptr);
 }

- static void free_mod_mem(struct module *mod)
+ static void free_mod_mem(struct module *mod, bool unload_codetags)
  {
  	for_each_mod_mem_type(type) {
  		struct module_memory *mod_mem = &mod->mem[type];
@@@ -1242,12 -1234,13 +1247,12 @@@
  		/* Free lock-classes; relies on the preceding sync_rcu(). */
  		lockdep_free_key_range(mod_mem->base, mod_mem->size);
  		if (mod_mem->size)
- 			module_memory_free(mod, type);
-			module_memory_free(mod_mem->base, type,
-					   unload_codetags);
++			module_memory_free(mod, type, unload_codetags);
  	}

  	/* MOD_DATA hosts mod, so free it at last */
  	lockdep_free_key_range(mod->mem[MOD_DATA].base, mod->mem[MOD_DATA].size);
- 	module_memory_free(mod, MOD_DATA);
-	module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA, unload_codetags);
++	module_memory_free(mod, MOD_DATA, unload_codetags);
  }

  /* Free a module, remove from lists, etc. */
@@@ -2287,7 -2309,7 +2299,7 @@@ static int move_module(struct module *m
  	return 0;
  out_enomem:
  	for (t--; t >= 0; t--)
- 		module_memory_free(mod, t);
-		module_memory_free(mod->mem[t].base, t, true);
++		module_memory_free(mod, t, true);
  	return ret;
  }
diff --cc tools/testing/selftests/lib.mk
index 33c24ceddfb71,1dae4a02957f9..3023e0e2f58f6
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@@ -52,24 -44,19 +52,33 @@@ endi
  selfdir = $(realpath $(dir $(filter %/lib.mk,$(MAKEFILE_LIST))))
  top_srcdir = $(selfdir)/../../..

 +# msg: emit succinct information message describing current building step
 +# $1 - generic step name (e.g., CC, LINK, etc);
 +# $2 - optional "flavor" specifier; if provided, will be emitted as [flavor];
 +# $3 - target (assumed to be file); only file name will be emitted;
 +# $4 - optional extra arg, emitted as-is, if provided.
 +ifeq ($(V),1)
 +Q =
 +msg =
 +else
 +Q = @
 +msg = @printf '  %-8s%s %s%s\n' "$(1)" "$(if $(2), [$(2)])" "$(notdir $(3))" "$(if $(4), $(4))";
 +MAKEFLAGS += --no-print-directory
 +endif
 +
  ifeq ($(KHDR_INCLUDES),)
 -KHDR_INCLUDES := -isystem $(top_srcdir)/usr/include
 +KHDR_INCLUDES := -D_GNU_SOURCE -isystem $(top_srcdir)/usr/include
  endif

+ # In order to use newer items that haven't yet been added to the user's system
+ # header files, add $(TOOLS_INCLUDES) to the compiler invocation in each
+ # each selftest.
+ # You may need to add files to that location, or to refresh an existing file. In
+ # order to do that, run "make headers" from $(top_srcdir), then copy the
+ # header file that you want from $(top_srcdir)/usr/include/... , to the matching
+ # subdir in $(TOOLS_INCLUDE).
+ TOOLS_INCLUDES := -isystem $(top_srcdir)/tools/include/uapi
+ 
  # The following are built by lib.mk common compile rules.
  # TEST_CUSTOM_PROGS should be used by tests that require
  # custom build rule and prevent common build rule use.