mm/madvise: don't perform madvise VMA walk for MADV_POPULATE_(READ|WRITE)
authorDavid Hildenbrand <david@redhat.com>
Thu, 14 Mar 2024 16:13:00 +0000 (17:13 +0100)
committerAndrew Morton <akpm@linux-foundation.org>
Fri, 26 Apr 2024 03:55:43 +0000 (20:55 -0700)
commitfa9fcd8bb6e4901e5ed0c7972547403d4c8dfec8
tree2bd0ccfc77d9cd3d2994abedee668703025bebcc
parent91b71e78b8e478e1a9af36b15ff1d38810572866
mm/madvise: don't perform madvise VMA walk for MADV_POPULATE_(READ|WRITE)

We changed faultin_page_range() to no longer consume a VMA, because
faultin_page_range() might internally release the mm lock to lookup
the VMA again -- required to cleanly handle VM_FAULT_RETRY. But
independent of that, __get_user_pages() will always lookup the VMA
itself.

Now that we let __get_user_pages() just handle VMA checks in a way that
is suitable for MADV_POPULATE_(READ|WRITE), the VMA walk in madvise()
is just overhead. So let's just call madvise_populate()
on the full range instead.

There is one change in behavior: madvise_walk_vmas() would skip any VMA
holes, and if everything succeeded, it would return -ENOMEM after
processing all VMAs.

However, for MADV_POPULATE_(READ|WRITE) it's unlikely for the caller to
notice any difference: -ENOMEM might either indicate that there were VMA
holes or that populating page tables failed because there was not enough
memory. So it's unlikely that user space will notice the difference, and
that special handling likely only makes sense for some other madvise()
actions.

Further, we'd already fail with -ENOMEM early in the past if looking up the
VMA after dropping the MM lock failed because of concurrent VMA
modifications. So let's just keep it simple and avoid the madvise VMA
walk, and consistently fail early if we find a VMA hole.

Link: https://lkml.kernel.org/r/20240314161300.382526-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/madvise.c