Oleg Nesterov [Mon, 23 Oct 2023 15:33:43 +0000 (17:33 +0200)]
do_io_accounting: use __for_each_thread()
Rather than while_each_thread() which should be avoided when possible.
This makes the code more clear and allows the next change.
Link: https://lkml.kernel.org/r/20231023153343.GA4629@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jia Rui [Wed, 18 Oct 2023 19:18:11 +0000 (03:18 +0800)]
ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error()
The BUG_ON() at ocfs2_num_free_extents() handles the error that
l_tree_deepth of leaf extent block just read form disk is invalid. This
error is mostly caused by file system metadata corruption on the disk.
There is no need to call BUG_ON() to handle such errors. We can return
error code, since the caller can deal with errors from
ocfs2_num_free_extents(). Also, we should make the file system read-only
to avoid the damage from expanding.
Therefore, BUG_ON() is removed and ocfs2_error() is called instead.
Link: https://lkml.kernel.org/r/20231018191811.412458-1-jindui71@gmail.com
Signed-off-by: Jia Rui <jindui71@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yuanheng Zhang [Wed, 11 Oct 2023 16:32:16 +0000 (00:32 +0800)]
ocfs2: fix a typo in a comment
Fix spelling typo in comment.
Link: https://lkml.kernel.org/r/20231011163216.29446-1-yuanhengzhang1214@gmail.com
Signed-off-by: Yuanheng Zhang <yuanhengzhang1214@gmail.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Hu Haowen [Fri, 13 Oct 2023 13:28:32 +0000 (21:28 +0800)]
scripts/show_delta: add __main__ judgement before main code
When doing Python programming it is a nice convention to insert the if
statement `if __name__ == "__main__":` before any main code that does
actual functionalities to ensure the code will be executed only as a
script rather than as an imported module. Hence attach the missing
judgement to show_delta.
Link: https://lkml.kernel.org/r/20231013132832.165768-1-2023002089@link.tyut.edu.cn
Signed-off-by: Hu Haowen <2023002089@link.tyut.edu.cn>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nicolas Schier <n.schier@avm.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexey Dobriyan [Wed, 11 Oct 2023 16:55:00 +0000 (19:55 +0300)]
treewide: mark stuff as __ro_after_init
__read_mostly predates __ro_after_init. Many variables which are marked
__read_mostly should have been __ro_after_init from day 1.
Also, mark some stuff as "const" and "__init" while I'm at it.
[akpm@linux-foundation.org: revert sysctl_nr_open_min, sysctl_nr_open_max changes due to arm warning]
[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/4f6bb9c0-abba-4ee4-a7aa-89265e886817@p183
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Artem Chernyshev [Mon, 9 Oct 2023 14:11:11 +0000 (17:11 +0300)]
fs: ocfs2: check status values
Test return values before overwriting.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Link: https://lkml.kernel.org/r/20231009141111.149858-1-artem.chernyshev@red-soft.ru
Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Swarup Laxman Kotiaklapudi [Mon, 9 Oct 2023 11:07:14 +0000 (14:07 +0300)]
proc: test /proc/${pid}/statm
My original comment lied, output can be "0 A A B 0 0 0\n"
(see comment in the code).
I don't quite understand why
get_mm_counter(mm, MM_FILEPAGES) + get_mm_counter(mm, MM_SHMEMPAGES)
can stay positive but get_mm_counter(mm, MM_ANONPAGES) is always 0 after
everything is unmapped but that's just me.
[adobriyan@gmail.com: more or less rewritten]
Link: https://lkml.kernel.org/r/0721ca69-7bb4-40aa-8d01-0c5f91e5f363@p183
Signed-off-by: Swarup Laxman Kotiaklapudi <swarupkotikalapudi@gmail.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Laight [Thu, 5 Oct 2023 11:39:54 +0000 (11:39 +0000)]
compiler.h: move __is_constexpr() to compiler.h
Prior to
f747e6667ebb2 __is_constexpr() was in its only user minmax.h.
That commit moved it to const.h - but that file just defines ULL(x) and
UL(x) so that constants can be defined for .S and .c files.
So apart from the word 'const' it wasn't really a good location. Instead
move the definition to compiler.h just before the similar
is_signed_type() and is_unsigned_type().
This may not be a good long-term home, but the three definitions belong
together.
Link: https://lkml.kernel.org/r/2a6680bbe2e84459816a113730426782@AcuMS.aculab.com
Signed-off-by: David Laight <david.laight@aculab.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kees Cook [Fri, 22 Sep 2023 17:52:20 +0000 (10:52 -0700)]
gcov: annotate struct gcov_iterator with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct gcov_iterator.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Link: https://lkml.kernel.org/r/20230922175220.work.327-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Tom Rix <trix@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joe Perches [Thu, 5 Oct 2023 21:35:17 +0000 (14:35 -0700)]
get_maintainer: add --keywords-in-file option
There were some recent attempts [1] [2] to make the K: field less noisy
and its behavior more obvious. Ultimately, a shift in the default
behavior and an associated command line flag is the best choice.
Currently, K: will match keywords found in both patches and files.
Matching content from entire files is (while documented) not obvious
behavior and is usually not wanted by maintainers.
Now only patch content will be matched against unless --keywords-in-file
is also provided as an argument to get_maintainer.
Add the actual keyword matched to the role or rolestats as well.
For instance given the diff below that removes clang:
: diff --git a/drivers/hid/bpf/entrypoints/README b/drivers/hid/bpf/entrypoints/README
: index
147e0d41509f..
f88eb19e8ef2 100644
: --- a/drivers/hid/bpf/entrypoints/README
: +++ b/drivers/hid/bpf/entrypoints/README
: @@ -1,4 +1,4 @@
: WARNING:
: If you change "entrypoints.bpf.c" do "make -j" in this directory to rebuild "entrypoints.skel.h".
: -Make sure to have clang 10 installed.
: +Make sure to have 10 installed.
: See Documentation/bpf/bpf_devel_QA.rst
The new role/rolestats output includes ":Keyword:\b(?i:clang|llvm)\b"
$ git diff drivers/hid/bpf/entrypoints/README | .scripts/get_maintainer.pl
Jiri Kosina <jikos@kernel.org> (maintainer:HID CORE LAYER,commit_signer:1/1=100%)
Benjamin Tissoires <benjamin.tissoires@redhat.com> (maintainer:HID CORE LAYER,commit_signer:1/1=100%,authored:1/1=100%,added_lines:4/4=100%)
Nathan Chancellor <nathan@kernel.org> (supporter:CLANG/LLVM BUILD SUPPORT:Keyword:\b(?i:clang|llvm)\b)
Nick Desaulniers <ndesaulniers@google.com> (supporter:CLANG/LLVM BUILD SUPPORT:Keyword:\b(?i:clang|llvm)\b)
Tom Rix <trix@redhat.com> (reviewer:CLANG/LLVM BUILD SUPPORT:Keyword:\b(?i:clang|llvm)\b)
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (commit_signer:1/1=100%)
linux-input@vger.kernel.org (open list:HID CORE LAYER)
linux-kernel@vger.kernel.org (open list)
llvm@lists.linux.dev (open list:CLANG/LLVM BUILD SUPPORT:Keyword:\b(?i:clang|llvm)\b)
Link: https://lore.kernel.org/r/20231004-get_maintainer_change_k-v1-1-ac7ced18306a@google.com
Link: https://lore.kernel.org/all/20230928-get_maintainer_add_d-v2-0-8acb3f394571@google.com
Link: https://lore.kernel.org/all/3dca40b677dd2fef979a5a581a2db91df2c21801.camel@perches.com
Original-patch-by: Justin Stitt <justinstitt@google.com>
Link: https://lkml.kernel.org/r/01fe46f0c58aa8baf92156ae2bdccfb2bf0cb48e.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Justin Stitt <justinstitt@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexey Dobriyan [Fri, 29 Sep 2023 16:31:41 +0000 (19:31 +0300)]
proc: save LOC by using while loop
Use while loop instead of infinite loop with "break;".
Also move some variable to the inner scope where they belong.
Link: https://lkml.kernel.org/r/82c8f8e7-8ded-46ca-8857-e60b991d6205@p183
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexey Dobriyan [Fri, 29 Sep 2023 16:30:18 +0000 (19:30 +0300)]
proc: use initializer for clearing some buffers
Save LOC by using dark magic of initialisation instead of memset().
Those buffer aren't passed to userspace directly so padding is not
an issue.
Link: https://lkml.kernel.org/r/3821d3a2-6e10-4629-b0d5-9519d828ab72@p183
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Laight [Mon, 18 Sep 2023 08:19:25 +0000 (08:19 +0000)]
minmax: relax check to allow comparison between unsigned arguments and signed constants
Allow (for example) min(unsigned_var, 20).
The opposite min(signed_var, 20u) is still errored.
Since a comparison between signed and unsigned never makes the unsigned
value negative it is only necessary to adjust the __types_ok() test.
Link: https://lkml.kernel.org/r/633b64e2f39e46bb8234809c5595b8c7@AcuMS.aculab.com
Signed-off-by: David Laight <david.laight@aculab.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Laight [Mon, 18 Sep 2023 08:18:40 +0000 (08:18 +0000)]
minmax: allow comparisons of 'int' against 'unsigned char/short'
Since 'unsigned char/short' get promoted to 'signed int' it is safe to
compare them against an 'int' value.
Link: https://lkml.kernel.org/r/8732ef5f809c47c28a7be47c938b28d4@AcuMS.aculab.com
Signed-off-by: David Laight <david.laight@aculab.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Laight [Mon, 18 Sep 2023 08:17:57 +0000 (08:17 +0000)]
minmax: fix indentation of __cmp_once() and __clamp_once()
Remove the extra indentation and align continuation markers.
Link: https://lkml.kernel.org/r/bed41317a05c498ea0209eafbcab45a5@AcuMS.aculab.com
Signed-off-by: David Laight <david.laight@aculab.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Laight [Mon, 18 Sep 2023 08:17:15 +0000 (08:17 +0000)]
minmax: allow min()/max()/clamp() if the arguments have the same signedness.
The type-check in min()/max() is there to stop unexpected results if a
negative value gets converted to a large unsigned value. However it also
rejects 'unsigned int' v 'unsigned long' compares which are common and
never problematc.
Replace the 'same type' check with a 'same signedness' check.
The new test isn't itself a compile time error, so use static_assert() to
report the error and give a meaningful error message.
Due to the way builtin_choose_expr() works detecting the error in the
'non-constant' side (where static_assert() can be used) also detects
errors when the arguments are constant.
Link: https://lkml.kernel.org/r/fe7e6c542e094bfca655abcd323c1c98@AcuMS.aculab.com
Signed-off-by: David Laight <david.laight@aculab.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Laight [Mon, 18 Sep 2023 08:16:30 +0000 (08:16 +0000)]
minmax: add umin(a, b) and umax(a, b)
Patch series "minmax: Relax type checks in min() and max()", v4.
The min() (etc) functions in minmax.h require that the arguments have
exactly the same types.
However when the type check fails, rather than look at the types and fix
the type of a variable/constant, everyone seems to jump on min_t(). In
reality min_t() ought to be rare - when something unusual is being done,
not normality.
The orginal min() (added in 2.4.9) replaced several inline functions and
included the type - so matched the implicit casting of the function call.
This was renamed min_t() in 2.4.10 and the current min() added. There is
no actual indication that the conversion of negatve values to large
unsigned values has ever been an actual problem.
A quick grep shows 5734 min() and 4597 min_t(). Having the casts on
almost half of the calls shows that something is clearly wrong.
If the wrong type is picked (and it is far too easy to pick the type of
the result instead of the larger input) then significant bits can get
discarded.
Pretty much the worst example is in the derived clamp_val(), consider:
unsigned char x = 200u;
y = clamp_val(x, 10u, 300u);
I also suspect that many of the min_t(u16, ...) are actually wrong. For
example copy_data() in printk_ringbuffer.c contains:
data_size = min_t(u16, buf_size, len);
Here buf_size is 'unsigned int' and len 'u16', pass a 64k buffer (can you
prove that doesn't happen?) and no data is returned. Apparantly it did -
and has since been fixed.
The only reason that most of the min_t() are 'fine' is that pretty much
all the values in the kernel are between 0 and INT_MAX.
Patch 1 adds umin(), this uses integer promotions to convert both
arguments to 'unsigned long long'. It can be used to compare a signed
type that is known to contain a non-negative value with an unsigned type.
The compiler typically optimises it all away. Added first so that it can
be referred to in patch 2.
Patch 2 replaces the 'same type' check with a 'same signedness' one. This
makes min(unsigned_int_var, sizeof()) be ok. The error message is also
improved and will contain the expanded form of both arguments (useful for
seeing how constants are defined).
Patch 3 just fixes some whitespace.
Patch 4 allows comparisons of 'unsigned char' and 'unsigned short' to
signed types. The integer promotion rules convert them both to 'signed
int' prior to the comparison so they can never cause a negative value be
converted to a large positive one.
Patch 5 (rewritted for v4) allows comparisons of unsigned values against
non-negative constant integer expressions. This makes
min(unsigned_int_var, 4) be ok.
The only common case that is still errored is the comparison of signed
values against unsigned constant integer expressions below __INT_MAX__.
Typcally min(int_val, sizeof (foo)), the real fix for this is casting the
constant: min(int_var, (int)sizeof (foo)).
With all the patches applied pretty much all the min_t() could be replaced
by min(), and most of the rest by umin(). However they all need careful
inspection due to code like:
sz = min_t(unsigned char, sz - 1, LIM - 1) + 1;
which converts 0 to LIM.
This patch (of 6):
umin() and umax() can be used when min()/max() errors a signed v unsigned
compare when the signed value is known to be non-negative.
Unlike min_t(some_unsigned_type, a, b) umin() will never mask off high
bits if an inappropriate type is selected.
The '+ 0u + 0ul + 0ull' may look strange.
The '+ 0u' is needed for 'signed int' on 64bit systems.
The '+ 0ul' is needed for 'signed long' on 32bit systems.
The '+ 0ull' is needed for 'signed long long'.
Link: https://lkml.kernel.org/r/b97faef60ad24922b530241c5d7c933c@AcuMS.aculab.com
Link: https://lkml.kernel.org/r/41d93ca827a248698ec64bf57e0c05a5@AcuMS.aculab.com
Signed-off-by: David Laight <david.laight@aculab.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Li kunyu [Tue, 26 Sep 2023 02:24:10 +0000 (10:24 +0800)]
kernel/signal: remove unnecessary NULL values from ucounts
ucounts is assigned first, so it does not need to initialize the
assignment.
Link: https://lkml.kernel.org/r/20230926022410.4280-1-kunyu@nfschina.com
Signed-off-by: Li kunyu <kunyu@nfschina.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kees Cook [Fri, 22 Sep 2023 17:49:30 +0000 (10:49 -0700)]
ocfs2: annotate struct ocfs2_replay_map with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct ocfs2_replay_map.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Link: https://lkml.kernel.org/r/20230922174925.work.293-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Tom Rix <trix@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Christophe JAILLET [Sat, 16 Sep 2023 15:50:11 +0000 (17:50 +0200)]
kstrtox: remove strtobool()
The conversion from strtobool() to kstrtobool() is completed. So
strtobool() can now be removed.
Link: https://lkml.kernel.org/r/87e3cc2547df174cd5af1fadbf866be4ef9e8e45.1694878151.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexey Dobriyan [Sat, 16 Sep 2023 18:21:31 +0000 (21:21 +0300)]
extract and use FILE_LINE macro
Extract nifty FILE_LINE useful for printk style debugging:
printk("%s\n", FILE_LINE);
It should not be used en mass probably because __FILE__ string literals
can be merged while FILE_LINE's won't. But for debugging it is what
the doctor ordered.
Don't add leading and trailing underscores, they're painful to type.
Trust me, I've tried both versions.
Link: https://lkml.kernel.org/r/ebf12ac4-5a61-4b12-b8b0-1253eb371332@p183
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baoquan He [Thu, 14 Sep 2023 03:31:42 +0000 (11:31 +0800)]
crash_core.c: remove unneeded functions
So far, nobody calls functions parse_crashkernel_high() and
parse_crashkernel_low(), remove both of them.
Link: https://lkml.kernel.org/r/20230914033142.676708-10-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jiahao <chenjiahao16@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baoquan He [Thu, 14 Sep 2023 03:31:41 +0000 (11:31 +0800)]
riscv: kdump: use generic interface to simplify crashkernel reservation
With the help of newly changed function parse_crashkernel() and generic
reserve_crashkernel_generic(), crashkernel reservation can be simplified
by steps:
1) Add a new header file <asm/crash_core.h>, and define CRASH_ALIGN,
CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX and
DEFAULT_CRASH_KERNEL_LOW_SIZE in <asm/crash_core.h>;
2) Add arch_reserve_crashkernel() to call parse_crashkernel() and
reserve_crashkernel_generic();
3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in
arch/riscv/Kconfig.
The old reserve_crashkernel_low() and reserve_crashkernel() can be
removed.
[chenjiahao16@huawei.com: fix crashkernel reserving problem on RISC-V]
Link: https://lkml.kernel.org/r/20230925024333.730964-1-chenjiahao16@huawei.com
Link: https://lkml.kernel.org/r/20230914033142.676708-9-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jiahao <chenjiahao16@huawei.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baoquan He [Thu, 14 Sep 2023 03:31:40 +0000 (11:31 +0800)]
arm64: kdump: use generic interface to simplify crashkernel reservation
With the help of newly changed function parse_crashkernel() and generic
reserve_crashkernel_generic(), crashkernel reservation can be simplified
by steps:
1) Add a new header file <asm/crash_core.h>, and define CRASH_ALIGN,
CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX and
DEFAULT_CRASH_KERNEL_LOW_SIZE in <asm/crash_core.h>;
2) Add arch_reserve_crashkernel() to call parse_crashkernel() and
reserve_crashkernel_generic();
3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in
arch/arm64/Kconfig.
The old reserve_crashkernel_low() and reserve_crashkernel() can be
removed.
Link: https://lkml.kernel.org/r/20230914033142.676708-8-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jiahao <chenjiahao16@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baoquan He [Thu, 14 Sep 2023 03:31:39 +0000 (11:31 +0800)]
x86: kdump: use generic interface to simplify crashkernel reservation code
With the help of newly changed function parse_crashkernel() and generic
reserve_crashkernel_generic(), crashkernel reservation can be simplified
by steps:
1) Add a new header file <asm/crash_core.h>, and define CRASH_ALIGN,
CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX and
DEFAULT_CRASH_KERNEL_LOW_SIZE in <asm/crash_core.h>;
2) Add arch_reserve_crashkernel() to call parse_crashkernel() and
reserve_crashkernel_generic(), and do the ARCH specific work if
needed.
3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in
arch/x86/Kconfig.
When adding DEFAULT_CRASH_KERNEL_LOW_SIZE, add crash_low_size_default() to
calculate crashkernel low memory because x86_64 has special requirement.
The old reserve_crashkernel_low() and reserve_crashkernel() can be
removed.
[bhe@redhat.com: move crash_low_size_default() code into <asm/crash_core.h>]
Link: https://lkml.kernel.org/r/ZQpeAjOmuMJBFw1/@MiWiFi-R3L-srv
Link: https://lkml.kernel.org/r/20230914033142.676708-7-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jiahao <chenjiahao16@huawei.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baoquan He [Thu, 14 Sep 2023 03:31:38 +0000 (11:31 +0800)]
crash_core: move crashk_*res definition into crash_core.c
Both crashk_res and crashk_low_res are used to mark the reserved
crashkernel regions in iomem_resource tree. And later the generic
crashkernel resrvation will be added into crash_core.c. So move
crashk_res and crashk_low_res definition into crash_core.c to avoid
compiling error if CONFIG_CRASH_CORE=on while CONFIG_KEXEC_CORE is unset.
Meanwhile include <asm/crash_core.h> in <linux/crash_core.h> if generic
reservation is needed. In that case, <asm/crash_core.h> need be added by
ARCH. In asm/crash_core.h, ARCH can provide its own macro definitions to
override macros in <linux/crash_core.h> if needed. Wrap the including
into CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION ifdeffery scope to
avoid compiling error in other ARCH-es which don't take the generic
reservation way yet.
Link: https://lkml.kernel.org/r/20230914033142.676708-6-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jiahao <chenjiahao16@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baoquan He [Thu, 14 Sep 2023 03:31:37 +0000 (11:31 +0800)]
crash_core: add generic function to do reservation
In architecture like x86_64, arm64 and riscv, they have vast virtual
address space and usually have huge physical memory RAM. Their
crashkernel reservation doesn't have to be limited under 4G RAM, but can
be extended to the whole physical memory via crashkernel=,high support.
Now add function reserve_crashkernel_generic() to reserve crashkernel
memory if users specify any case of kernel pamameters, like
crashkernel=xM[@offset] or crashkernel=,high|low.
This is preparation to simplify code of crashkernel=,high support in
architecutures.
Link: https://lkml.kernel.org/r/20230914033142.676708-5-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jiahao <chenjiahao16@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baoquan He [Thu, 14 Sep 2023 03:31:36 +0000 (11:31 +0800)]
crash_core: change parse_crashkernel() to support crashkernel=,high|low parsing
Now parse_crashkernel() is a real entry point for all kinds of crahskernel
parsing on any architecture.
And wrap the crahskernel=,high|low handling inside
CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION ifdeffery scope.
Link: https://lkml.kernel.org/r/20230914033142.676708-4-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jiahao <chenjiahao16@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baoquan He [Thu, 14 Sep 2023 03:31:35 +0000 (11:31 +0800)]
crash_core: change the prototype of function parse_crashkernel()
Add two parameters 'low_size' and 'high' to function parse_crashkernel(),
later crashkernel=,high|low parsing will be added. Make adjustments in
all call sites of parse_crashkernel() in arch.
Link: https://lkml.kernel.org/r/20230914033142.676708-3-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jiahao <chenjiahao16@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baoquan He [Thu, 14 Sep 2023 03:31:34 +0000 (11:31 +0800)]
crash_core.c: remove unnecessary parameter of function
Patch series "kdump: use generic functions to simplify crashkernel
reservation in arch", v3.
In the current arm64, crashkernel=,high support has been finished after
several rounds of posting and careful reviewing. The code in arm64 which
parses crashkernel kernel parameters firstly, then reserve memory can be a
good example for other ARCH to refer to.
Whereas in x86_64, the code mixing crashkernel parameter parsing and
memory reserving is twisted, and looks messy. Refactoring the code to
make it more readable maintainable is necessary.
Here, firstly abstract the crashkernel parameter parsing code into
parse_crashkernel() to make it be able to parse crashkernel=,high|low.
Then abstract the crashkernel memory reserving code into a generic
function reserve_crashkernel_generic(). Finally, in ARCH which
crashkernel=,high support is needed, a simple arch_reserve_crashkernel()
can be added to call above two functions. This can remove the duplicated
implmentation code in each ARCH, like arm64, x86_64 and riscv.
crashkernel=512M,high
crashkernel=512M,high crashkernel=256M,low
crashkernel=512M,high crashkernel=0M,low
crashkernel=0M,high crashkernel=256M,low
crashkernel=512M
crashkernel=512M@0x4f000000
crashkernel=1G-4G:256M,4G-64G:320M,64G-:576M
crashkernel=0M
This patch (of 9):
In all call sites of __parse_crashkernel(), the parameter 'name' is
hardcoded as "crashkernel=". So remove the unnecessary parameter 'name',
add local varibale 'name' inside __parse_crashkernel() instead.
Link: https://lkml.kernel.org/r/20230914033142.676708-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20230914033142.676708-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jiahao <chenjiahao16@huawei.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Azeem Shaikh [Thu, 31 Aug 2023 19:38:27 +0000 (19:38 +0000)]
fs: ocfs2: replace strlcpy with sysfs_emit
strlcpy() reads the entire source buffer first. This read may exceed the
destination size limit. This is both inefficient and can lead to linear
read overflows if a source string is not NUL-terminated [1]. In an effort
to remove strlcpy() completely [2], replace strlcpy() here with
sysfs_emit().
Direct replacement is safe here since its ok for `kernel_param_ops.get()`
to return -errno [3].
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
[3] https://elixir.bootlin.com/linux/v6.5/source/include/linux/moduleparam.h#L52
Link: https://lkml.kernel.org/r/20230831193827.1528867-1-azeemshaikh38@gmail.com
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Shevchenko [Tue, 12 Sep 2023 09:23:55 +0000 (12:23 +0300)]
minmax: fix header inclusions
BUILD_BUG_ON*() macros are defined in build_bug.h. Include it. Replace
compiler_types.h by compiler.h, which provides the former, to have a
definition of the __UNIQUE_ID().
Link: https://lkml.kernel.org/r/20230912092355.79280-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Herve Codina <herve.codina@bootlin.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rong Tao [Mon, 11 Sep 2023 14:55:09 +0000 (22:55 +0800)]
pid: pid_ns_ctl_handler: remove useless comment
commit
95846ecf9dac("pid: replace pid bitmap implementation with IDR API")
removes 'last_pid' element, and use the idr_get_cursor-idr_set_cursor pair
to set the value of idr, so useless comments should be removed.
Link: https://lkml.kernel.org/r/tencent_157A2A1CAF19A3F5885F0687426159A19708@qq.com
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Shevchenko [Mon, 11 Sep 2023 15:49:13 +0000 (18:49 +0300)]
minmax: deduplicate __unconst_integer_typeof()
It appears that compiler_types.h already have an implementation of the
__unconst_integer_typeof() called __unqual_scalar_typeof(). Use it
instead of the copy.
Link: https://lkml.kernel.org/r/20230911154913.4176033-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andreas Gruenbacher [Thu, 7 Sep 2023 23:40:48 +0000 (01:40 +0200)]
kthread: add kthread_stop_put
Add a kthread_stop_put() helper that stops a thread and puts its task
struct. Use it to replace the various instances of kthread_stop()
followed by put_task_struct().
Remove the kthread_stop_put() macro in usbip that is similar but doesn't
return the result of kthread_stop().
[agruenba@redhat.com: fix kerneldoc comment]
Link: https://lkml.kernel.org/r/20230911111730.2565537-1-agruenba@redhat.com
[akpm@linux-foundation.org: document kthread_stop_put()'s argument]
Link: https://lkml.kernel.org/r/20230907234048.2499820-1-agruenba@redhat.com
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Sat, 9 Sep 2023 21:49:51 +0000 (23:49 +0200)]
taskstats: fill_stats_for_tgid: use for_each_thread()
do/while_each_thread should be avoided when possible.
Plus I _think_ this change allows to avoid lock_task_sighand() but I am
not sure, I forgot everything about taskstats. In any case, this code
does not look right in that the same thread can be accounted twice:
taskstats_exit() can account the exiting thread in signal->stats and drop
->siglock but this thread is still on the thread-group list, so
lock_task_sighand() can't help.
Link: https://lkml.kernel.org/r/20230909214951.GA24274@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Sat, 9 Sep 2023 17:26:29 +0000 (19:26 +0200)]
getrusage: use __for_each_thread()
do/while_each_thread should be avoided when possible.
Plus this change allows to avoid lock_task_sighand(), we can use rcu
and/or sig->stats_lock instead.
Link: https://lkml.kernel.org/r/20230909172629.GA20454@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Sat, 9 Sep 2023 17:25:54 +0000 (19:25 +0200)]
getrusage: add the "signal_struct *sig" local variable
No functional changes, cleanup/preparation.
Link: https://lkml.kernel.org/r/20230909172554.GA20441@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Sat, 9 Sep 2023 16:45:37 +0000 (18:45 +0200)]
signal: complete_signal: use __for_each_thread()
do/while_each_thread should be avoided when possible.
Link: https://lkml.kernel.org/r/20230909164537.GA11633@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Sat, 9 Sep 2023 16:45:01 +0000 (18:45 +0200)]
fs/proc: do_task_stat: use __for_each_thread()
do/while_each_thread should be avoided when possible.
Link: https://lkml.kernel.org/r/20230909164501.GA11581@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Xingui Yang [Tue, 5 Sep 2023 02:48:35 +0000 (02:48 +0000)]
scsi: qla2xxx: use DEFINE_SHOW_STORE_ATTRIBUTE() helper for debugfs
Use DEFINE_SHOW_STORE_ATTRIBUTE() helper for read-write file to reduce some
duplicated code.
Link: https://lkml.kernel.org/r/20230905024835.43219-4-yangxingui@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Co-developed-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Animesh Manna <animesh.manna@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Felipe Balbi <felipe.balbi@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Himanshu Madhani <himanshu.madhani@cavium.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Xiang Chen <chenxiang66@hisilicon.com>
Cc: Zeng Tao <prime.zeng@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Xingui Yang [Tue, 5 Sep 2023 02:48:34 +0000 (02:48 +0000)]
scsi: hisi_sas: use DEFINE_SHOW_STORE_ATTRIBUTE() helper for debugfs
Use DEFINE_SHOW_STORE_ATTRIBUTE() helper for read-write file to reduce some
duplicated code.
Link: https://lkml.kernel.org/r/20230905024835.43219-3-yangxingui@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Co-developed-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Animesh Manna <animesh.manna@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Felipe Balbi <felipe.balbi@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Himanshu Madhani <himanshu.madhani@cavium.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Xiang Chen <chenxiang66@hisilicon.com>
Cc: Zeng Tao <prime.zeng@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Xingui Yang [Tue, 5 Sep 2023 02:48:33 +0000 (02:48 +0000)]
seq_file: add helper macro to define attribute for rw file
Patch series "Add helper macro DEFINE_SHOW_STORE_ATTRIBUTE() at
seq_file.c", v6.
We already own DEFINE_SHOW_ATTRIBUTE() helper macro for defining attribute
for read-only file, but we found many of drivers also want a helper macro
for read-write file too.
So we add this helper macro to reduce duplicated code.
This patch (of 3):
We already own DEFINE_SHOW_ATTRIBUTE() helper macro for defining attribute
for read-only file, but many of drivers want a helper macro for read-write
file too.
So we add DEFINE_SHOW_STORE_ATTRIBUTE() helper to reduce duplicated code.
Link: https://lkml.kernel.org/r/20230905024835.43219-1-yangxingui@huawei.com
Link: https://lkml.kernel.org/r/20230905024835.43219-2-yangxingui@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Co-developed-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Animesh Manna <animesh.manna@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Felipe Balbi <felipe.balbi@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Himanshu Madhani <himanshu.madhani@cavium.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Xiang Chen <chenxiang66@hisilicon.com>
Cc: Zeng Tao <prime.zeng@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Uros Bizjak [Mon, 4 Sep 2023 15:21:01 +0000 (17:21 +0200)]
panic: use atomic_try_cmpxchg in panic() and nmi_panic()
Use atomic_try_cmpxchg instead of atomic_cmpxchg (*ptr, old, new) == old
in panic() and nmi_panic(). x86 CMPXCHG instruction returns success in ZF
flag, so this change saves a compare after cmpxchg (and related move
instruction in front of cmpxchg).
Also, rename cpu variable to this_cpu in nmi_panic() and try to unify
logic flow between panic() and nmi_panic().
No functional change intended.
[ubizjak@gmail.com: clean up if/else block]
Link: https://lkml.kernel.org/r/20230906191200.68707-1-ubizjak@gmail.com
Link: https://lkml.kernel.org/r/20230904152230.9227-1-ubizjak@gmail.com
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Wed, 23 Aug 2023 17:14:55 +0000 (19:14 +0200)]
__kill_pgrp_info: simplify the calculation of return value
No need to calculate/check the "success" variable, we can kill it and update
retval in the main loop unless it is zero.
Link: https://lkml.kernel.org/r/20230823171455.GA12188@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Sat, 26 Aug 2023 11:14:09 +0000 (13:14 +0200)]
kill task_struct->thread_group
The last user was removed by the previous patch.
Link: https://lkml.kernel.org/r/20230826111409.GA23243@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Sat, 26 Aug 2023 11:14:06 +0000 (13:14 +0200)]
change thread_group_empty() to use task_struct->thread_node
Patch series "kill task_struct->thread_group".
This patch (of 2):
It could use list_is_singular() but this way it is cheaper. Plus the
thread_group_leader() check makes it clear that thread_group_empty() can
only return true if p is a group leader. This was not immediately obvious
before this patch.
task_struct->thread_group no longer has users, it can die.
Link: https://lkml.kernel.org/r/20230826111200.GA22982@redhat.com
Link: https://lkml.kernel.org/r/20230826111406.GA23238@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 24 Aug 2023 14:32:01 +0000 (16:32 +0200)]
change next_thread() to use __next_thread() ?: group_leader
This relies on fact that group leader is always the 1st entry in the
signal->thread_head list.
With or without this change, if the lockless next_thread(last_thread)
races with exec it can return the old or the new leader.
We are almost ready to kill task->thread_group, after this change its
only user is thread_group_empty().
Link: https://lkml.kernel.org/r/20230824143201.GB31222@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 24 Aug 2023 14:31:42 +0000 (16:31 +0200)]
introduce __next_thread(), fix next_tid() vs exec() race
Patch series "introduce __next_thread(), change next_thread()".
After commit
dce8f8ed1de1 ("document while_each_thread(), change
first_tid() to use for_each_thread()") + this series
1. We have only one lockless user of next_thread(), task_group_seq_get_next().
I think it should be changed too.
2. We have only one user of task_struct->thread_group, thread_group_empty().
The next patches will change thread_group_empty() and kill ->thread_group.
This patch (of 2):
next_tid(start) does:
rcu_read_lock();
if (pid_alive(start)) {
pos = next_thread(start);
if (thread_group_leader(pos))
pos = NULL;
else
get_task_struct(pos);
it should return pos = NULL when next_thread() wraps to the 1st thread
in the thread group, group leader, and the thread_group_leader() check
tries to detect this case.
But this can race with exec. To simplify, suppose we have a main thread
M and a single sub-thread T, next_tid(T) should return NULL.
Now suppose that T execs. If next_tid(T) is called after T changes the
leadership and before it does release_task() which removes the old leader
from list, then next_thread() returns M and thread_group_leader(M) = F.
Lockless use of next_thread() should be avoided. After this change only
task_group_seq_get_next() does this, and I believe it should be changed
as well.
Link: https://lkml.kernel.org/r/20230824143112.GA31208@redhat.com
Link: https://lkml.kernel.org/r/20230824143142.GA31222@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yuanheng Zhang [Mon, 28 Aug 2023 05:17:41 +0000 (13:17 +0800)]
ocfs2: correct range->len in ocfs2_trim_fs()
global bitmap is a cluster allocator,so after we traverse the global
bitmap and finished the fstrim,the trimmed range should be 'trimmed *
clustersize'.otherwise,the trimmed range printed by 'fstrim -v' is not as
expected.
Link: https://lkml.kernel.org/r/20230828051741.204577-1-yuanhengzhang1214@gmail.com
Signed-off-by: Yuanheng Zhang <yuanhengzhang1214@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Nick Desaulniers [Thu, 31 Aug 2023 16:33:40 +0000 (09:33 -0700)]
compiler.h: unify __UNIQUE_ID
commit
6f33d58794ef ("__UNIQUE_ID()")
added a fallback definition of __UNIQUE_ID because gcc 4.2 and older did
not support __COUNTER__.
Also, this commit is effectively a revert of
commit
b41c29b0527c ("Kbuild: provide a __UNIQUE_ID for clang")
which mentions clang 2.6+ supporting __COUNTER__.
Documentation/process/changes.rst currently lists the minimum supported
version of these compilers as:
- gcc: 5.1
- clang: 11.0.0
It should be safe to say that __COUNTER__ is well supported by this
point.
Link: https://lkml.kernel.org/r/20230831-unique_id-v1-1-28bacd18eb1d@google.com
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Michal rarek <mmarek@suse.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Paul Russel <rusty@rustcorp.com.au>
Cc: Tom Rix <trix@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Costa Shulyupin [Fri, 25 Aug 2023 01:30:57 +0000 (04:30 +0300)]
docs: fix link s390/zfcpdump.rst
After move of Documentation/s390 to Documentation/arch/s390
Link: https://lkml.kernel.org/r/20230825013102.1487979-1-costa.shul@redhat.com
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Eric DeVolder <eric.devolder@oracle.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Sun, 1 Oct 2023 21:15:13 +0000 (14:15 -0700)]
Linux 6.6-rc4
Linus Torvalds [Sun, 1 Oct 2023 20:48:46 +0000 (13:48 -0700)]
Merge tag 'kbuild-fixes-v6.6-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix the module compression with xz so the in-kernel decompressor
works
- Document a kconfig idiom to express an optional dependency between
modules
- Make modpost, when W=1 is given, detect broken drivers that reference
.exit.* sections
- Remove unused code
* tag 'kbuild-fixes-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: remove stale code for 'source' symlink in packaging scripts
modpost: Don't let "driver"s reference .exit.*
vmlinux.lds.h: remove unused CPU_KEEP and CPU_DISCARD macros
modpost: add missing else to the "of" check
Documentation: kbuild: explain handling optional dependencies
kbuild: Use CRC32 and a 1MiB dictionary for XZ compressed modules
Linus Torvalds [Sun, 1 Oct 2023 20:33:25 +0000 (13:33 -0700)]
Merge tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Fourteen hotfixes, eleven of which are cc:stable. The remainder
pertain to issues which were introduced after 6.5"
* tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
Crash: add lock to serialize crash hotplug handling
selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error
mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()
mm, memcg: reconsider kmem.limit_in_bytes deprecation
mm: zswap: fix potential memory corruption on duplicate store
arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
mm: hugetlb: add huge page size param to set_huge_pte_at()
maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states
maple_tree: add mas_is_active() to detect in-tree walks
nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
mm: abstract moving to the next PFN
mm: report success more often from filemap_map_folio_range()
fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
Linus Torvalds [Sun, 1 Oct 2023 19:50:04 +0000 (12:50 -0700)]
Merge tag 'char-misc-6.6-rc4' of git://git./linux/kernel/git/gregkh/char-misc
Pull misc driver fix from Greg KH:
"Here is a single, much requested, fix for a set of misc drivers to
resolve a much reported regression in the -rc series that has also
propagated back to the stable releases. Sorry for the delay, lots of
conference travel for a few weeks put me very far behind in patch
wrangling.
It has been reported by many to resolve the reported problem, and has
been in linux-next with no reported issues"
* tag 'char-misc-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
misc: rtsx: Fix some platforms can not boot and move the l1ss judgment to probe
Linus Torvalds [Sun, 1 Oct 2023 19:44:45 +0000 (12:44 -0700)]
Merge tag 'tty-6.6-rc4' of git://git./linux/kernel/git/gregkh/tty
Pull tty / serial driver fixes from Greg KH:
"Here are two tty/serial driver fixes for 6.6-rc4 that resolve some
reported regressions:
- revert a n_gsm change that ended up causing problems
- 8250_port fix for irq data
both have been in linux-next for over a week with no reported
problems"
* tag 'tty-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "tty: n_gsm: fix UAF in gsm_cleanup_mux"
serial: 8250_port: Check IRQ data before use
Linus Torvalds [Sun, 1 Oct 2023 16:50:58 +0000 (09:50 -0700)]
Merge tag 'x86-urgent-2023-10-01' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes: a kerneldoc build warning fix, add SRSO mitigation for
AMD-derived Hygon processors, and fix a SGX kernel crash in the page
fault handler that can trigger when ksgxd races to reclaim the SECS
special page, by making the SECS page unswappable"
* tag 'x86-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Resolves SECS reclaim vs. page fault for EAUG race
x86/srso: Add SRSO mitigation for Hygon processors
x86/kgdb: Fix a kerneldoc warning when build with W=1
Linus Torvalds [Sun, 1 Oct 2023 16:41:58 +0000 (09:41 -0700)]
Merge tag 'timers-urgent-2023-10-01' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"Fix a spurious kernel warning during CPU hotplug events that may
trigger when timer/hrtimer softirqs are pending, which are otherwise
hotplug-safe and don't merit a warning"
* tag 'timers-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Tag (hr)timer softirq as hotplug safe
Linus Torvalds [Sun, 1 Oct 2023 16:38:05 +0000 (09:38 -0700)]
Merge tag 'sched-urgent-2023-10-01' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix a RT tasks related lockup/live-lock during CPU offlining"
* tag 'sched-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/rt: Fix live lock between select_fallback_rq() and RT push
Linus Torvalds [Sun, 1 Oct 2023 16:34:53 +0000 (09:34 -0700)]
Merge tag 'perf-urgent-2023-10-01' of git://git./linux/kernel/git/tip/tip
Pull perf event fixes from Ingo Molnar:
"Misc fixes: work around an AMD microcode bug on certain models, and
fix kexec kernel PMI handlers on AMD systems that get loaded on older
kernels that have an unexpected register state"
* tag 'perf-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd: Do not WARN() on every IRQ
perf/x86/amd/core: Fix overflow reset on hotplug
Masahiro Yamada [Sun, 1 Oct 2023 14:03:39 +0000 (23:03 +0900)]
kbuild: remove stale code for 'source' symlink in packaging scripts
Since commit
d8131c2965d5 ("kbuild: remove $(MODLIB)/source symlink"),
modules_install does not create the 'source' symlink.
Remove the stale code from builddeb and kernel.spec.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Uwe Kleine-König [Sat, 30 Sep 2023 16:52:04 +0000 (18:52 +0200)]
modpost: Don't let "driver"s reference .exit.*
Drivers must not reference functions marked with __exit as these likely
are not available when the code is built-in.
There are few creative offenders uncovered for example in ARCH=amd64
allmodconfig builds. So only trigger the section mismatch warning for
W=1 builds.
The dual rule that drivers must not reference .init.* is implemented
since commit
0db252452378 ("modpost: don't allow *driver to reference
.init.*") which however missed that .exit.* should be handled in the
same way.
Thanks to Masahiro Yamada and Arnd Bergmann who gave valuable hints to
find this improvement.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Masahiro Yamada [Sat, 30 Sep 2023 07:13:35 +0000 (16:13 +0900)]
vmlinux.lds.h: remove unused CPU_KEEP and CPU_DISCARD macros
Remove the left-over of commit
e24f6628811e ("modpost: remove all
traces of cpuinit/cpuexit sections").
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Mauricio Faria de Oliveira [Thu, 28 Sep 2023 20:28:07 +0000 (17:28 -0300)]
modpost: add missing else to the "of" check
Without this 'else' statement, an "usb" name goes into two handlers:
the first/previous 'if' statement _AND_ the for-loop over 'devtable',
but the latter is useless as it has no 'usb' device_id entry anyway.
Tested with allmodconfig before/after patch; no changes to *.mod.c:
git checkout v6.6-rc3
make -j$(nproc) allmodconfig
make -j$(nproc) olddefconfig
make -j$(nproc)
find . -name '*.mod.c' | cpio -pd /tmp/before
# apply patch
make -j$(nproc)
find . -name '*.mod.c' | cpio -pd /tmp/after
diff -r /tmp/before/ /tmp/after/
# no difference
Fixes: acbef7b76629 ("modpost: fix module autoloading for OF devices with generic compatible property")
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Linus Torvalds [Sun, 1 Oct 2023 01:41:37 +0000 (18:41 -0700)]
Merge tag 'soc-fixes-6.6' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"These are the latest bug fixes that have come up in the soc tree. Most
of these are fairly minor. Most notably, the majority of changes this
time are not for dts files as usual.
- Updates to the addresses of the broadcom and aspeed entries in the
MAINTAINERS file.
- Defconfig updates to address a regression on samsung and a build
warning from an unknown Kconfig symbol
- Build fixes for the StrongARM and Uniphier platforms
- Code fixes for SCMI and FF-A firmware drivers, both of which had a
simple bug that resulted in invalid data, and a lesser fix for the
optee firmware driver
- Multiple fixes for the recently added loongson/loongarch "guts" soc
driver
- Devicetree fixes for RISC-V on the startfive platform, addressing
issues with NOR flash, usb and uart.
- Multiple fixes for NXP i.MX8/i.MX9 dts files, fixing problems with
clock, gpio, hdmi settings and the Makefile
- Bug fixes for i.MX firmware code and the OCOTP soc driver
- Multiple fixes for the TI sysc bus driver
- Minor dts updates for TI omap dts files, to address boot time
warnings and errors"
* tag 'soc-fixes-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (35 commits)
MAINTAINERS: Fix Florian Fainelli's email address
arm64: defconfig: enable syscon-poweroff driver
ARM: locomo: fix locomolcd_power declaration
soc: loongson: loongson2_guts: Remove unneeded semicolon
soc: loongson: loongson2_guts: Convert to devm_platform_ioremap_resource()
soc: loongson: loongson_pm2: Populate children syscon nodes
dt-bindings: soc: loongson,ls2k-pmc: Allow syscon-reboot/syscon-poweroff as child
soc: loongson: loongson_pm2: Drop useless of_device_id compatible
dt-bindings: soc: loongson,ls2k-pmc: Use fallbacks for ls2k-pmc compatible
soc: loongson: loongson_pm2: Add dependency for INPUT
arm64: defconfig: remove CONFIG_COMMON_CLK_NPCM8XX=y
ARM: uniphier: fix cache kernel-doc warnings
MAINTAINERS: aspeed: Update Andrew's email address
MAINTAINERS: aspeed: Update git tree URL
firmware: arm_ffa: Don't set the memory region attributes for MEM_LEND
arm64: dts: imx: Add imx8mm-prt8mm.dtb to build
arm64: dts: imx8mm-evk: Fix hdmi@3d node
soc: imx8m: Enable OCOTP clock for imx8mm before reading registers
arm64: dts: imx8mp-beacon-kit: Fix audio_pll2 clock
arm64: dts: imx8mp: Fix SDMA2/3 clocks
...
Linus Torvalds [Sun, 1 Oct 2023 01:19:02 +0000 (18:19 -0700)]
Merge tag 'trace-v6.6-rc3' of git://git./linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Make sure 32-bit applications using user events have aligned access
when running on a 64-bit kernel.
- Add cond_resched in the loop that handles converting enums in
print_fmt string is trace events.
- Fix premature wake ups of polling processes in the tracing ring
buffer. When a task polls waiting for a percentage of the ring buffer
to be filled, the writer still will wake it up at every event. Add
the polling's percentage to the "shortest_full" list to tell the
writer when to wake it up.
- For eventfs dir lookups on dynamic events, an event system's only
event could be removed, leaving its dentry with no children. This is
totally legitimate. But in eventfs_release() it must not access the
children array, as it is only allocated when the dentry has children.
* tag 'trace-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Test for dentries array allocated in eventfs_release()
tracing/user_events: Align set_bit() address for all archs
tracing: relax trace_event_eval_update() execution with cond_resched()
ring-buffer: Update "shortest_full" in polling
Steven Rostedt (Google) [Sat, 30 Sep 2023 13:01:06 +0000 (09:01 -0400)]
eventfs: Test for dentries array allocated in eventfs_release()
The dcache_dir_open_wrapper() could be called when a dynamic event is
being deleted leaving a dentry with no children. In this case the
dlist->dentries array will never be allocated. This needs to be checked
for in eventfs_release(), otherwise it will trigger a NULL pointer
dereference.
Link: https://lore.kernel.org/linux-trace-kernel/20230930090106.1c3164e9@rorschach.local.home
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: ef36b4f92868 ("eventfs: Remember what dentries were created on dir open")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Mon, 25 Sep 2023 23:08:28 +0000 (23:08 +0000)]
tracing/user_events: Align set_bit() address for all archs
All architectures should use a long aligned address passed to set_bit().
User processes can pass either a 32-bit or 64-bit sized value to be
updated when tracing is enabled when on a 64-bit kernel. Both cases are
ensured to be naturally aligned, however, that is not enough. The
address must be long aligned without affecting checks on the value
within the user process which require different adjustments for the bit
for little and big endian CPUs.
Add a compat flag to user_event_enabler that indicates when a 32-bit
value is being used on a 64-bit kernel. Long align addresses and correct
the bit to be used by set_bit() to account for this alignment. Ensure
compat flags are copied during forks and used during deletion clears.
Link: https://lore.kernel.org/linux-trace-kernel/20230925230829.341-2-beaub@linux.microsoft.com
Link: https://lore.kernel.org/linux-trace-kernel/20230914131102.179100-1-cleger@rivosinc.com/
Cc: stable@vger.kernel.org
Fixes: 7235759084a4 ("tracing/user_events: Use remote writes for event enablement")
Reported-by: Clément Léger <cleger@rivosinc.com>
Suggested-by: Clément Léger <cleger@rivosinc.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Clément Léger [Fri, 29 Sep 2023 19:16:37 +0000 (21:16 +0200)]
tracing: relax trace_event_eval_update() execution with cond_resched()
When kernel is compiled without preemption, the eval_map_work_func()
(which calls trace_event_eval_update()) will not be preempted up to its
complete execution. This can actually cause a problem since if another
CPU call stop_machine(), the call will have to wait for the
eval_map_work_func() function to finish executing in the workqueue
before being able to be scheduled. This problem was observe on a SMP
system at boot time, when the CPU calling the initcalls executed
clocksource_done_booting() which in the end calls stop_machine(). We
observed a 1 second delay because one CPU was executing
eval_map_work_func() and was not preempted by the stop_machine() task.
Adding a call to cond_resched() in trace_event_eval_update() allows
other tasks to be executed and thus continue working asynchronously
like before without blocking any pending task at boot time.
Link: https://lore.kernel.org/linux-trace-kernel/20230929191637.416931-1-cleger@rivosinc.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt (Google) [Fri, 29 Sep 2023 22:01:13 +0000 (18:01 -0400)]
ring-buffer: Update "shortest_full" in polling
It was discovered that the ring buffer polling was incorrectly stating
that read would not block, but that's because polling did not take into
account that reads will block if the "buffer-percent" was set. Instead,
the ring buffer polling would say reads would not block if there was any
data in the ring buffer. This was incorrect behavior from a user space
point of view. This was fixed by commit
42fb0a1e84ff by having the polling
code check if the ring buffer had more data than what the user specified
"buffer percent" had.
The problem now is that the polling code did not register itself to the
writer that it wanted to wait for a specific "full" value of the ring
buffer. The result was that the writer would wake the polling waiter
whenever there was a new event. The polling waiter would then wake up, see
that there's not enough data in the ring buffer to notify user space and
then go back to sleep. The next event would wake it up again.
Before the polling fix was added, the code would wake up around 100 times
for a hackbench 30 benchmark. After the "fix", due to the constant waking
of the writer, it would wake up over 11,0000 times! It would never leave
the kernel, so the user space behavior was still "correct", but this
definitely is not the desired effect.
To fix this, have the polling code add what it's waiting for to the
"shortest_full" variable, to tell the writer not to wake it up if the
buffer is not as full as it expects to be.
Note, after this fix, it appears that the waiter is now woken up around 2x
the times it was before (~200). This is a tremendous improvement from the
11,000 times, but I will need to spend some time to see why polling is
more aggressive in its wakeups than the read blocking code.
Link: https://lore.kernel.org/linux-trace-kernel/20230929180113.01c2cae3@rorschach.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: 42fb0a1e84ff ("tracing/ring-buffer: Have polling block on watermark")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Tested-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Linus Torvalds [Sat, 30 Sep 2023 18:07:26 +0000 (11:07 -0700)]
Merge tag 'dma-mapping-6.6-2023-09-30' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- fix the narea calculation in swiotlb initialization (Ross Lagerwall)
- fix the check whether a device has used swiotlb (Petr Tesarik)
* tag 'dma-mapping-6.6-2023-09-30' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: fix the check whether a device has used software IO TLB
swiotlb: use the calculated number of areas
Linus Torvalds [Sat, 30 Sep 2023 18:01:38 +0000 (11:01 -0700)]
Merge tag 'iomap-6.6-fixes-4' of git://git./fs/xfs/xfs-linux
Pull iomap fixes from Darrick Wong:
- Handle a race between writing and shrinking block devices by
returning EIO
- Fix a typo in a comment
* tag 'iomap-6.6-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: Spelling s/preceeding/preceding/g
iomap: add a workaround for racy i_size updates on block devices
Linus Torvalds [Sat, 30 Sep 2023 17:07:33 +0000 (10:07 -0700)]
Merge tag 'i2c-for-6.6-rc4' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Usual business: a driver fix, a DT fix, a minor core fix"
* tag 'i2c-for-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: npcm7xx: Fix callback completion ordering
i2c: mux: Avoid potential false error message in i2c_mux_add_adapter
dt-bindings: i2c: mxs: Pass ref and 'unevaluatedProperties: false'
Linus Torvalds [Sat, 30 Sep 2023 16:59:37 +0000 (09:59 -0700)]
Merge tag 'acpi-6.6-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix a possible NULL pointer dereference in the error path of
acpi_video_bus_add() resulting from recent changes (Dinghao Liu)"
* tag 'acpi-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: video: Fix NULL pointer dereference in acpi_video_bus_add()
Linus Torvalds [Sat, 30 Sep 2023 16:53:09 +0000 (09:53 -0700)]
Merge tag 'powerpc-6.6-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix arch_stack_walk_reliable(), used by live patching
- Fix powerpc selftests to work with run_kselftest.sh
Thanks to Joe Lawrence and Petr Mladek.
* tag 'powerpc-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Fix emit_tests to work with run_kselftest.sh
powerpc/stacktrace: Fix arch_stack_walk_reliable()
Linus Torvalds [Sat, 30 Sep 2023 16:44:48 +0000 (09:44 -0700)]
Merge tag 'nfsd-6.6-2' of git://git./linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- Fix NFSv4 READ corner case
* tag 'nfsd-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Fix zero NFSv4 READ results when RQ_SPLICE_OK is not set
Linus Torvalds [Sat, 30 Sep 2023 16:39:23 +0000 (09:39 -0700)]
Merge tag '6.6-rc3-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fix from Steve French:
"Fix for password freeing potential oops (also for stable)"
* tag '6.6-rc3-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
fs/smb/client: Reset password pointer to NULL
Baoquan He [Tue, 26 Sep 2023 12:09:05 +0000 (20:09 +0800)]
Crash: add lock to serialize crash hotplug handling
Eric reported that handling corresponding crash hotplug event can be
failed easily when many memory hotplug event are notified in a short
period. They failed because failing to take __kexec_lock.
=======
[ 78.714569] Fallback order for Node 0: 0
[ 78.714575] Built 1 zonelists, mobility grouping on. Total pages:
1817886
[ 78.717133] Policy zone: Normal
[ 78.724423] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[ 78.727207] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[ 80.056643] PEFILE: Unsigned PE binary
=======
The memory hotplug events are notified very quickly and very many, while
the handling of crash hotplug is much slower relatively. So the atomic
variable __kexec_lock and kexec_trylock() can't guarantee the
serialization of crash hotplug handling.
Here, add a new mutex lock __crash_hotplug_lock to serialize crash hotplug
handling specifically. This doesn't impact the usage of __kexec_lock.
Link: https://lkml.kernel.org/r/20230926120905.392903-1-bhe@redhat.com
Fixes: 247262756121 ("crash: add generic infrastructure for crash hotplug support")
Signed-off-by: Baoquan He <bhe@redhat.com>
Tested-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Juntong Deng [Tue, 26 Sep 2023 18:19:44 +0000 (02:19 +0800)]
selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error
According to the awk manual, the -e option does not need to be specified
in front of 'program' (unless you need to mix program-file).
The redundant -e option can cause error when users use awk tools other
than gawk (for example, mawk does not support the -e option).
Error Example:
awk: not an option: -e
Link: https://lkml.kernel.org/r/VI1P193MB075228810591AF2FDD7D42C599C3A@VI1P193MB0752.EURP193.PROD.OUTLOOK.COM
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yang Shi [Wed, 20 Sep 2023 22:32:42 +0000 (15:32 -0700)]
mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
When calling mbind() with MPOL_MF_{MOVE|MOVEALL} | MPOL_MF_STRICT, kernel
should attempt to migrate all existing pages, and return -EIO if there is
misplaced or unmovable page. Then commit
6f4576e3687b ("mempolicy: apply
page table walker on queue_pages_range()") messed up the return value and
didn't break VMA scan early ianymore when MPOL_MF_STRICT alone. The
return value problem was fixed by commit
a7f40cfe3b7a ("mm: mempolicy:
make mbind() return -EIO when MPOL_MF_STRICT is specified"), but it broke
the VMA walk early if unmovable page is met, it may cause some pages are
not migrated as expected.
The code should conceptually do:
if (MPOL_MF_MOVE|MOVEALL)
scan all vmas
try to migrate the existing pages
return success
else if (MPOL_MF_MOVE* | MPOL_MF_STRICT)
scan all vmas
try to migrate the existing pages
return -EIO if unmovable or migration failed
else /* MPOL_MF_STRICT alone */
break early if meets unmovable and don't call mbind_range() at all
else /* none of those flags */
check the ranges in test_walk, EFAULT without mbind_range() if discontig.
Fixed the behavior.
Link: https://lkml.kernel.org/r/20230920223242.3425775-1-yang@os.amperecomputing.com
Fixes: a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [4.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jinjie Ruan [Mon, 25 Sep 2023 07:20:59 +0000 (15:20 +0800)]
mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()
When CONFIG_DAMON_VADDR_KUNIT_TEST=y and making CONFIG_DEBUG_KMEMLEAK=y
and CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y, the below memory leak is detected.
Since commit
9f86d624292c ("mm/damon/vaddr-test: remove unnecessary
variables"), the damon_destroy_ctx() is removed, but still call
damon_new_target() and damon_new_region(), the damon_region which is
allocated by kmem_cache_alloc() in damon_new_region() and the damon_target
which is allocated by kmalloc in damon_new_target() are not freed. And
the damon_region which is allocated in damon_new_region() in
damon_set_regions() is also not freed.
So use damon_destroy_target to free all the damon_regions and damon_target.
unreferenced object 0xffff888107c9a940 (size 64):
comm "kunit_try_catch", pid 1069, jiffies
4294670592 (age 732.761s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 06 00 00 00 6b 6b 6b 6b ............kkkk
60 c7 9c 07 81 88 ff ff f8 cb 9c 07 81 88 ff ff `...............
backtrace:
[<
ffffffff817e0167>] kmalloc_trace+0x27/0xa0
[<
ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
[<
ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
[<
ffffffff819c82be>] damon_test_apply_three_regions1+0x21e/0x260
[<
ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<
ffffffff81237cf6>] kthread+0x2b6/0x380
[<
ffffffff81097add>] ret_from_fork+0x2d/0x70
[<
ffffffff81003791>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff8881079cc740 (size 56):
comm "kunit_try_catch", pid 1069, jiffies
4294670592 (age 732.761s)
hex dump (first 32 bytes):
05 00 00 00 00 00 00 00 14 00 00 00 00 00 00 00 ................
6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b kkkkkkkk....kkkk
backtrace:
[<
ffffffff819bc492>] damon_new_region+0x22/0x1c0
[<
ffffffff819c7d91>] damon_do_test_apply_three_regions.constprop.0+0xd1/0x3e0
[<
ffffffff819c82be>] damon_test_apply_three_regions1+0x21e/0x260
[<
ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<
ffffffff81237cf6>] kthread+0x2b6/0x380
[<
ffffffff81097add>] ret_from_fork+0x2d/0x70
[<
ffffffff81003791>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888107c9ac40 (size 64):
comm "kunit_try_catch", pid 1071, jiffies
4294670595 (age 732.843s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 06 00 00 00 6b 6b 6b 6b ............kkkk
a0 cc 9c 07 81 88 ff ff 78 a1 76 07 81 88 ff ff ........x.v.....
backtrace:
[<
ffffffff817e0167>] kmalloc_trace+0x27/0xa0
[<
ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
[<
ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
[<
ffffffff819c851e>] damon_test_apply_three_regions2+0x21e/0x260
[<
ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<
ffffffff81237cf6>] kthread+0x2b6/0x380
[<
ffffffff81097add>] ret_from_fork+0x2d/0x70
[<
ffffffff81003791>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff8881079ccc80 (size 56):
comm "kunit_try_catch", pid 1071, jiffies
4294670595 (age 732.843s)
hex dump (first 32 bytes):
05 00 00 00 00 00 00 00 14 00 00 00 00 00 00 00 ................
6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b kkkkkkkk....kkkk
backtrace:
[<
ffffffff819bc492>] damon_new_region+0x22/0x1c0
[<
ffffffff819c7d91>] damon_do_test_apply_three_regions.constprop.0+0xd1/0x3e0
[<
ffffffff819c851e>] damon_test_apply_three_regions2+0x21e/0x260
[<
ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<
ffffffff81237cf6>] kthread+0x2b6/0x380
[<
ffffffff81097add>] ret_from_fork+0x2d/0x70
[<
ffffffff81003791>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888107c9af40 (size 64):
comm "kunit_try_catch", pid 1073, jiffies
4294670597 (age 733.011s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 06 00 00 00 6b 6b 6b 6b ............kkkk
20 a2 76 07 81 88 ff ff b8 a6 76 07 81 88 ff ff .v.......v.....
backtrace:
[<
ffffffff817e0167>] kmalloc_trace+0x27/0xa0
[<
ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
[<
ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
[<
ffffffff819c877e>] damon_test_apply_three_regions3+0x21e/0x260
[<
ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<
ffffffff81237cf6>] kthread+0x2b6/0x380
[<
ffffffff81097add>] ret_from_fork+0x2d/0x70
[<
ffffffff81003791>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810776a200 (size 56):
comm "kunit_try_catch", pid 1073, jiffies
4294670597 (age 733.011s)
hex dump (first 32 bytes):
05 00 00 00 00 00 00 00 14 00 00 00 00 00 00 00 ................
6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b kkkkkkkk....kkkk
backtrace:
[<
ffffffff819bc492>] damon_new_region+0x22/0x1c0
[<
ffffffff819c7d91>] damon_do_test_apply_three_regions.constprop.0+0xd1/0x3e0
[<
ffffffff819c877e>] damon_test_apply_three_regions3+0x21e/0x260
[<
ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<
ffffffff81237cf6>] kthread+0x2b6/0x380
[<
ffffffff81097add>] ret_from_fork+0x2d/0x70
[<
ffffffff81003791>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810776a740 (size 56):
comm "kunit_try_catch", pid 1073, jiffies
4294670597 (age 733.025s)
hex dump (first 32 bytes):
3d 00 00 00 00 00 00 00 3f 00 00 00 00 00 00 00 =.......?.......
6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b kkkkkkkk....kkkk
backtrace:
[<
ffffffff819bc492>] damon_new_region+0x22/0x1c0
[<
ffffffff819bfcc2>] damon_set_regions+0x4c2/0x8e0
[<
ffffffff819c7dbb>] damon_do_test_apply_three_regions.constprop.0+0xfb/0x3e0
[<
ffffffff819c877e>] damon_test_apply_three_regions3+0x21e/0x260
[<
ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<
ffffffff81237cf6>] kthread+0x2b6/0x380
[<
ffffffff81097add>] ret_from_fork+0x2d/0x70
[<
ffffffff81003791>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888108038240 (size 64):
comm "kunit_try_catch", pid 1075, jiffies
4294670600 (age 733.022s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 03 00 00 00 6b 6b 6b 6b ............kkkk
48 ad 76 07 81 88 ff ff 98 ae 76 07 81 88 ff ff H.v.......v.....
backtrace:
[<
ffffffff817e0167>] kmalloc_trace+0x27/0xa0
[<
ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
[<
ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
[<
ffffffff819c898d>] damon_test_apply_three_regions4+0x1cd/0x210
[<
ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<
ffffffff81237cf6>] kthread+0x2b6/0x380
[<
ffffffff81097add>] ret_from_fork+0x2d/0x70
[<
ffffffff81003791>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810776ad28 (size 56):
comm "kunit_try_catch", pid 1075, jiffies
4294670600 (age 733.022s)
hex dump (first 32 bytes):
05 00 00 00 00 00 00 00 07 00 00 00 00 00 00 00 ................
6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b kkkkkkkk....kkkk
backtrace:
[<
ffffffff819bc492>] damon_new_region+0x22/0x1c0
[<
ffffffff819bfcc2>] damon_set_regions+0x4c2/0x8e0
[<
ffffffff819c7dbb>] damon_do_test_apply_three_regions.constprop.0+0xfb/0x3e0
[<
ffffffff819c898d>] damon_test_apply_three_regions4+0x1cd/0x210
[<
ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<
ffffffff81237cf6>] kthread+0x2b6/0x380
[<
ffffffff81097add>] ret_from_fork+0x2d/0x70
[<
ffffffff81003791>] ret_from_fork_asm+0x11/0x20
Link: https://lkml.kernel.org/r/20230925072100.3725620-1-ruanjinjie@huawei.com
Fixes: 9f86d624292c ("mm/damon/vaddr-test: remove unnecessary variables")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Michal Hocko [Thu, 21 Sep 2023 07:38:29 +0000 (09:38 +0200)]
mm, memcg: reconsider kmem.limit_in_bytes deprecation
This reverts commits
86327e8eb94c ("memcg: drop kmem.limit_in_bytes") and
partially reverts
58056f77502f ("memcg, kmem: further deprecate
kmem.limit_in_bytes") which have incrementally removed support for the
kernel memory accounting hard limit. Unfortunately it has turned out that
there is still userspace depending on the existence of
memory.kmem.limit_in_bytes [1]. The underlying functionality is not
really required but the non-existent file just confuses the userspace
which fails in the result. The patch to fix this on the userspace side
has been submitted but it is hard to predict how it will propagate through
the maze of 3rd party consumers of the software.
Now, reverting alone
86327e8eb94c is not an option because there is
another set of userspace which cannot cope with ENOTSUPP returned when
writing to the file. Therefore we have to go and revisit
58056f77502f as
well. There are two ways to go ahead. Either we give up on the
deprecation and fully revert
58056f77502f as well or we can keep
kmem.limit_in_bytes but make the write a noop and warn about the fact.
This should work for both known breaking workloads which depend on the
existence but do not depend on the hard limit enforcement.
Note to backporters to stable trees.
a8c49af3be5f ("memcg: add per-memcg
total kernel memory stat") introduced in 4.18 has added memcg_account_kmem
so the accounting is not done by obj_cgroup_charge_pages directly for v1
anymore. Prior kernels need to add it explicitly (thanks to Johannes for
pointing this out).
[akpm@linux-foundation.org: fix build - remove unused local]
Link: http://lkml.kernel.org/r/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Link: https://lkml.kernel.org/r/ZRE5VJozPZt9bRPy@dhcp22.suse.cz
Fixes: 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
Fixes: 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Domenico Cerasuolo [Fri, 22 Sep 2023 17:22:11 +0000 (19:22 +0200)]
mm: zswap: fix potential memory corruption on duplicate store
While stress-testing zswap a memory corruption was happening when writing
back pages. __frontswap_store used to check for duplicate entries before
attempting to store a page in zswap, this was because if the store fails
the old entry isn't removed from the tree. This change removes duplicate
entries in zswap_store before the actual attempt.
[cerasuolodomenico@gmail.com: add a warning and a comment, per Johannes]
Link: https://lkml.kernel.org/r/20230925130002.1929369-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230922172211.1704917-1-cerasuolodomenico@gmail.com
Fixes: 42c06a0e8ebe ("mm: kill frontswap")
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ryan Roberts [Fri, 22 Sep 2023 11:58:04 +0000 (12:58 +0100)]
arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
When called with a swap entry that does not embed a PFN (e.g.
PTE_MARKER_POISONED or PTE_MARKER_UFFD_WP), the previous implementation of
set_huge_pte_at() would either cause a BUG() to fire (if CONFIG_DEBUG_VM
is enabled) or cause a dereference of an invalid address and subsequent
panic.
arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries.
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written.
It previously did this by grabbing the folio out of the pte and querying
its size.
However, there are cases when the pte being set is actually a swap entry.
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries. And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.
But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN. And this causes the code to go
bang. The triggering case is for the uffd poison test, commit
99aa77215ad0 ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit
8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7. Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.
Arguably, the root cause is really due to commit
18f3962953e4 ("mm:
hugetlb: kill set_huge_swap_pte_at()"), which aimed to simplify the
interface to the core code by removing set_huge_swap_pte_at() (which took
a page size parameter) and replacing it with calls to set_huge_pte_at()
where the size was inferred from the folio, as descibed above. While that
commit didn't break anything at the time, it did break the interface
because it couldn't handle swap entries without PFNs. And since then new
callers have come along which rely on this working. But given the
brokeness is only observable after commit
8a13897fb0da ("mm: userfaultfd:
support UFFDIO_POISON for hugetlbfs"), that one gets the Fixes tag.
Now that we have modified the set_huge_pte_at() interface to pass the huge
page size in the previous patch, we can trivially fix this issue.
Link: https://lkml.kernel.org/r/20230922115804.2043771-3-ryan.roberts@arm.com
Fixes: 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ryan Roberts [Fri, 22 Sep 2023 11:58:03 +0000 (12:58 +0100)]
mm: hugetlb: add huge page size param to set_huge_pte_at()
Patch series "Fix set_huge_pte_at() panic on arm64", v2.
This series fixes a bug in arm64's implementation of set_huge_pte_at(),
which can result in an unprivileged user causing a kernel panic. The
problem was triggered when running the new uffd poison mm selftest for
HUGETLB memory. This test (and the uffd poison feature) was merged for
v6.5-rc7.
Ideally, I'd like to get this fix in for v6.6 and I've cc'ed stable
(correctly this time) to get it backported to v6.5, where the issue first
showed up.
Description of Bug
==================
arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries.
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written.
It previously did this by grabbing the folio out of the pte and querying
its size.
However, there are cases when the pte being set is actually a swap entry.
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries. And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.
But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN. And this causes the code to go
bang. The triggering case is for the uffd poison test, commit
99aa77215ad0 ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit
8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7. Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.
If CONFIG_DEBUG_VM is enabled, we do at least get a BUG(), but otherwise,
it will dereference a bad pointer in page_folio():
static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
{
VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));
return page_folio(pfn_to_page(swp_offset_pfn(entry)));
}
Fix
===
The simplest fix would have been to revert the dodgy cleanup commit
18f3962953e4 ("mm: hugetlb: kill set_huge_swap_pte_at()"), but since
things have moved on, this would have required an audit of all the new
set_huge_pte_at() call sites to see if they should be converted to
set_huge_swap_pte_at(). As per the original intent of the change, it
would also leave us open to future bugs when people invariably get it
wrong and call the wrong helper.
So instead, I've added a huge page size parameter to set_huge_pte_at().
This means that the arm64 code has the size in all cases. It's a bigger
change, due to needing to touch the arches that implement the function,
but it is entirely mechanical, so in my view, low risk.
I've compile-tested all touched arches; arm64, parisc, powerpc, riscv,
s390, sparc (and additionally x86_64). I've additionally booted and run
mm selftests against arm64, where I observe the uffd poison test is fixed,
and there are no other regressions.
This patch (of 2):
In order to fix a bug, arm64 needs to be told the size of the huge page
for which the pte is being set in set_huge_pte_at(). Provide for this by
adding an `unsigned long sz` parameter to the function. This follows the
same pattern as huge_pte_clear().
This commit makes the required interface modifications to the core mm as
well as all arches that implement this function (arm64, parisc, powerpc,
riscv, s390, sparc). The actual arm64 bug will be fixed in a separate
commit.
No behavioral changes intended.
Link: https://lkml.kernel.org/r/20230922115804.2043771-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20230922115804.2043771-2-ryan.roberts@arm.com
Fixes: 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx]
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> [vmalloc change]
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam R. Howlett [Thu, 21 Sep 2023 18:12:36 +0000 (14:12 -0400)]
maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states
When updating the maple tree iterator to avoid rewalks, an issue was
introduced when shifting beyond the limits. This can be seen by trying to
go to the previous address of 0, which would set the maple node to
MAS_NONE and keep the range as the last entry.
Subsequent calls to mas_find() would then search upwards from mas->last
and skip the value at mas->index/mas->last. This showed up as a bug in
mprotect which skips the actual VMA at the current range after attempting
to go to the previous VMA from 0.
Since MAS_NONE may already be set when searching for a value that isn't
contained within a node, changing the handling of MAS_NONE in mas_find()
would make the code more complicated and error prone. Furthermore, there
was no way to tell which limit was hit, and thus which action to take
(next or the entry at the current range).
This solution is to add two states to track what happened with the
previous iterator action. This allows for the expected behaviour of the
next command to return the correct item (either the item at the range
requested, or the next/previous).
Tests are also added and updated accordingly.
Link: https://lkml.kernel.org/r/20230921181236.509072-3-Liam.Howlett@oracle.com
Link: https://gist.github.com/heatd/85d2971fae1501b55b6ea401fbbe485b
Link: https://lore.kernel.org/linux-mm/20230921181236.509072-1-Liam.Howlett@oracle.com/
Fixes: 39193685d585 ("maple_tree: try harder to keep active node with mas_prev()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Pedro Falcato <pedro.falcato@gmail.com>
Closes: https://gist.github.com/heatd/85d2971fae1501b55b6ea401fbbe485b
Closes: https://bugs.archlinux.org/task/79656
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam R. Howlett [Thu, 21 Sep 2023 18:12:35 +0000 (14:12 -0400)]
maple_tree: add mas_is_active() to detect in-tree walks
Patch series "maple_tree: Fix mas_prev() state regression".
Pedro Falcato retported an mprotect regression [1] which was bisected back
to the iterator changes for maple tree. Root cause analysis showed the
mas_prev() running off the end of the VMA space (previous from 0) followed
by mas_find(), would skip the first value.
This patchset introduces maple state underflow/overflow so the sequence of
calls on the maple state will return what the user expects.
Users who encounter this bug may see mprotect(), userfaultfd_register(),
and mlock() fail on VMAs mapped with address 0.
This patch (of 2):
Instead of constantly checking each possibility of the maple state,
create a fast path that will skip over checking unlikely states.
Link: https://lkml.kernel.org/r/20230921181236.509072-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230921181236.509072-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pan Bian [Thu, 21 Sep 2023 14:17:31 +0000 (23:17 +0900)]
nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
In nilfs_gccache_submit_read_data(), brelse(bh) is called to drop the
reference count of bh when the call to nilfs_dat_translate() fails. If
the reference count hits 0 and its owner page gets unlocked, bh may be
freed. However, bh->b_page is dereferenced to put the page after that,
which may result in a use-after-free bug. This patch moves the release
operation after unlocking and putting the page.
NOTE: The function in question is only called in GC, and in combination
with current userland tools, address translation using DAT does not occur
in that function, so the code path that causes this issue will not be
executed. However, it is possible to run that code path by intentionally
modifying the userland GC library or by calling the GC ioctl directly.
[konishi.ryusuke@gmail.com: NOTE added to the commit log]
Link: https://lkml.kernel.org/r/1543201709-53191-1-git-send-email-bianpan2016@163.com
Link: https://lkml.kernel.org/r/20230921141731.10073-1-konishi.ryusuke@gmail.com
Fixes: a3d93f709e89 ("nilfs2: block cache for garbage collection")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reported-by: Ferry Meng <mengferry@linux.alibaba.com>
Closes: https://lkml.kernel.org/r/20230818092022.111054-1-mengferry@linux.alibaba.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 20 Sep 2023 04:09:58 +0000 (05:09 +0100)]
mm: abstract moving to the next PFN
In order to fix the L1TF vulnerability, x86 can invert the PTE bits for
PROT_NONE VMAs, which means we cannot move from one PTE to the next by
adding 1 to the PFN field of the PTE. This results in the BUG reported at
[1].
Abstract advancing the PTE to the next PFN through a pte_next_pfn()
function/macro.
Link: https://lkml.kernel.org/r/20230920040958.866520-1-willy@infradead.org
Fixes: bcc6cc832573 ("mm: add default definition of set_ptes()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: syzbot+55cc72f8cc3a549119df@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000d099fa0604f03351@google.com [1]
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 20 Sep 2023 03:53:35 +0000 (04:53 +0100)]
mm: report success more often from filemap_map_folio_range()
Even though we had successfully mapped the relevant page, we would rarely
return success from filemap_map_folio_range(). That leads to falling back
from the VMA lock path to the mmap_lock path, which is a speed &
scalability issue. Found by inspection.
Link: https://lkml.kernel.org/r/20230920035336.854212-1-willy@infradead.org
Fixes: 617c28ecab22 ("filemap: batch PTE mappings")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Greg Ungerer [Thu, 7 Sep 2023 01:18:08 +0000 (11:18 +1000)]
fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
The elf-fdpic loader hard sets the process personality to either
PER_LINUX_FDPIC for true elf-fdpic binaries or to PER_LINUX for normal ELF
binaries (in this case they would be constant displacement compiled with
-pie for example). The problem with that is that it will lose any other
bits that may be in the ELF header personality (such as the "bug
emulation" bits).
On the ARM architecture the ADDR_LIMIT_32BIT flag is used to signify a
normal 32bit binary - as opposed to a legacy 26bit address binary. This
matters since start_thread() will set the ARM CPSR register as required
based on this flag. If the elf-fdpic loader loses this bit the process
will be mis-configured and crash out pretty quickly.
Modify elf-fdpic loader personality setting so that it preserves the upper
three bytes by using the SET_PERSONALITY macro to set it. This macro in
the generic case sets PER_LINUX and preserves the upper bytes.
Architectures can override this for their specific use case, and ARM does
exactly this.
The problem shows up quite easily running under qemu using the ARM
architecture, but not necessarily on all types of real ARM hardware. If
the underlying ARM processor does not support the legacy 26-bit addressing
mode then everything will work as expected.
Link: https://lkml.kernel.org/r/20230907011808.2985083-1-gerg@kernel.org
Fixes: 1bde925d23547 ("fs/binfmt_elf_fdpic.c: provide NOMMU loader for regular ELF binaries")
Signed-off-by: Greg Ungerer <gerg@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Fri, 29 Sep 2023 23:51:38 +0000 (16:51 -0700)]
Merge tag '6.6-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
"Two SMB3 server fixes for null pointer dereferences:
- invalid SMB3 request case (fixes issue found in testing the read
compound patch)
- iovec error case in response processing"
* tag '6.6-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: check iov vector index in ksmbd_conn_write()
ksmbd: return invalid parameter error response if smb2 request is invalid
Linus Torvalds [Fri, 29 Sep 2023 23:46:24 +0000 (16:46 -0700)]
Merge tag 'ceph-for-6.6-rc4' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A series that fixes an involved 'double watch error' deadlock in RBD
marked for stable and two cleanups"
* tag 'ceph-for-6.6-rc4' of https://github.com/ceph/ceph-client:
rbd: take header_rwsem in rbd_dev_refresh() only when updating
rbd: decouple parent info read-in from updating rbd_dev
rbd: decouple header read-in from updating rbd_dev->header
rbd: move rbd_dev_refresh() definition
Revert "ceph: make members in struct ceph_mds_request_args_ext a union"
ceph: remove unnecessary check for NULL in parse_longname()
Linus Torvalds [Fri, 29 Sep 2023 23:41:25 +0000 (16:41 -0700)]
Merge tag 'xfs-6.6-fixes-2' of git://git./fs/xfs/xfs-linux
Pull xfs fix from Chandan Babu:
- fix for commit
68b957f64fca ("xfs: load uncached unlinked inodes into
memory on demand") which address review comments provided by Dave
Chinner
* tag 'xfs-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix reloading entire unlinked bucket lists
Uwe Kleine-König [Thu, 28 Sep 2023 07:06:52 +0000 (09:06 +0200)]
MAINTAINERS: Fix Florian Fainelli's email address
Commit
31345a0f5901 ("MAINTAINERS: Replace my email address") added 13
instances of ...@broadcom.com and one of only ...@broadcom. I didn't
double check if Broadcom really owns that TLD, but git send-email
doesn't accept it, so add ".com" to that one bogous(?) instance.
Fixes: 31345a0f5901 ("MAINTAINERS: Replace my email address")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Linus Torvalds [Fri, 29 Sep 2023 20:38:34 +0000 (13:38 -0700)]
Merge tag 'ata-6.6-rc4' of git://git./linux/kernel/git/dlemoal/libata
Pull ATA fixes from Damien Le Moal:
"A larger than usual set of fixes for 6.6-rc4 due to the unexpected
number of fixes needed to address ATA disks suspend/resume issues.
In more detail:
- Add missing additionalProperties on child nodes to the pata-common
DT bindings (Rob)
- Fix handling of the REPORT SUPPORTED OPERATION CODES command to
ignore reserved bits (Niklas)
- Increase port multiplier soft reset timeout to accomodate slow
devices and avoid issues on wakeup (Matthias)
- A couple of minor code fixes to avoid compilation warnings in
libata-core and libata-eh (me)
- Many patches from me to address suspend/resume issues, and in
particular a potential deadlock on resume due to the SCSI disk
driver resume operation not being synchronized with libata EH port
resume handling.
This is addressed by changing the scsi disk driver disk start/stop
control to allow libata to execute disk suspend (spin down) and
resume (spin up) on its own during system suspend/resume. Runtime
suspend/resume control remains with the SCSI disk driver.
Other fixes include:
- Fix libata power management request issuing to avoid races
- Establish a link between ATA ports and SCSI devices to order PM
operations
- Fix device removal to avoid issues with driver rmmod removal
- Fix synchronization of libata device rescan and SCSI disk resume
operation
- Remove libsas PM operations as suspend/resume is handled
directly by the sas controller resume
- Fix the SCSI disk driver to not issue commands to suspended
disks, thus avoiding potential system lock-up on resume"
* tag 'ata-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata-eh: Fix compilation warning in ata_eh_link_report()
ata: libata-core: Fix compilation warning in ata_dev_config_ncq()
scsi: sd: Do not issue commands to suspended disks on shutdown
ata: libata-core: Do not register PM operations for SAS ports
ata: libata-scsi: Fix delayed scsi_rescan_device() execution
scsi: Do not attempt to rescan suspended devices
ata: libata-scsi: Disable scsi device manage_system_start_stop
scsi: sd: Differentiate system and runtime start/stop management
ata: libata-scsi: link ata port and scsi device
ata: libata-core: Fix port and device removal
ata: libata-core: Fix ata_port_request_pm() locking
ata: libata-sata: increase PMP SRST timeout to 10s
ata: libata-scsi: ignore reserved bits for REPORT SUPPORTED OPERATION CODES
dt-bindings: ata: pata-common: Add missing additionalProperties on child nodes
Linus Torvalds [Fri, 29 Sep 2023 20:28:49 +0000 (13:28 -0700)]
Merge tag 'block-6.6-2023-09-28' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Just two minor comment / documentation fixes for the block side"
* tag 'block-6.6-2023-09-28' of git://git.kernel.dk/linux:
block: fix kernel-doc for disk_force_media_change()
block: correct stale comment in rq_qos_wait
Linus Torvalds [Fri, 29 Sep 2023 19:56:34 +0000 (12:56 -0700)]
Merge tag 'io_uring-6.6-2023-09-28' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
"A single fix going to stable for the IORING_OP_LINKAT flag handling"
* tag 'io_uring-6.6-2023-09-28' of git://git.kernel.dk/linux:
io_uring/fs: remove sqe->rw_flags checking from LINKAT
Linus Torvalds [Fri, 29 Sep 2023 19:10:12 +0000 (12:10 -0700)]
Merge tag 'slab-fixes-for-6.6-rc4' of git://git./linux/kernel/git/vbabka/slab
Pull slab fixes from Vlastimil Babka:
- stable fix to prevent list corruption when destroying caches with
leftover objects (Rafael Aquini)
- fix for a gotcha in kmalloc_size_roundup() when calling it with too
high size, discovered when recently a networking call site had to be
fixed for a different issue (David Laight)
* tag 'slab-fixes-for-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
slab: kmalloc_size_roundup() must not return 0 for non-zero size
mm/slab_common: fix slab_caches list corruption after kmem_cache_destroy()