Chengming Zhou [Sun, 20 Feb 2022 05:14:25 +0000 (13:14 +0800)]
 
sched/cpuacct: Optimize away RCU read lock
Since cpuacct_charge() is called from the scheduler's update_curr(),
the rq lock must already be held, so the RCU read lock can be
optimized away.
Do the same in its wrapper cgroup_account_cputime(), but we can't
use lockdep_assert_rq_held() there, because it is defined in
kernel/sched/sched.h.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220220051426.5274-2-zhouchengming@bytedance.com
Chengming Zhou [Sun, 20 Feb 2022 05:14:24 +0000 (13:14 +0800)]
 
sched/cpuacct: Fix charge percpu cpuusage
The cpuacct_account_field() is always called by the current task
itself, so it's ok to use __this_cpu_add() to charge the tick time.
But cpuacct_charge() may be called by update_curr() in load_balance()
on a random CPU, different from the CPU on which the task is running.
So __this_cpu_add() will charge that cputime to a random incorrect CPU.
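To illustrate the difference described above, here is a minimal userspace sketch, not the kernel code. A plain array stands in for the kernel's per-CPU storage, and `struct cpuacct_sketch`, `struct task_sketch`, `charge_this_cpu()` and `charge_task_cpu()` are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4  /* hypothetical CPU count for the sketch */

/* Stand-in for the per-CPU cpuusage counters of one cpuacct group. */
struct cpuacct_sketch {
	uint64_t cpuusage[NR_CPUS];
};

/* Stand-in for a task; only the CPU it runs on matters here. */
struct task_sketch {
	int cpu;
};

/* Problematic pattern: charge whichever CPU happens to execute the call. */
static void charge_this_cpu(struct cpuacct_sketch *ca, int executing_cpu,
			    uint64_t cputime)
{
	assert(executing_cpu >= 0 && executing_cpu < NR_CPUS);
	ca->cpuusage[executing_cpu] += cputime;
}

/* Safer pattern: charge the CPU the task is actually running on. */
static void charge_task_cpu(struct cpuacct_sketch *ca,
			    const struct task_sketch *tsk, uint64_t cputime)
{
	assert(tsk->cpu >= 0 && tsk->cpu < NR_CPUS);
	ca->cpuusage[tsk->cpu] += cputime;
}
```

If a task on CPU 2 is updated from CPU 0, the first helper adds the time to slot 0 while the second adds it to slot 2, which is the behaviour the fix is after.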
Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code")
Reported-by: Minye Zhu <zhuminye@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220220051426.5274-1-zhouchengming@bytedance.com
Ingo Molnar [Mon, 21 Feb 2022 10:53:51 +0000 (11:53 +0100)]
 
Merge tag 'v5.17-rc5' into sched/core, to resolve conflicts
New conflicts in sched/core due to the following upstream fixes:
  44585f7bc0cb ("psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n")
  a06247c6804f ("psi: Fix uaf issue when psi trigger is destroyed while being polled")
Conflicts:
	include/linux/psi_types.h
	kernel/sched/psi.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sun, 20 Feb 2022 21:07:20 +0000 (13:07 -0800)]
 
Linux 5.17-rc5
Linus Torvalds [Sun, 20 Feb 2022 20:50:50 +0000 (12:50 -0800)]
 
Merge tag 'locking_urgent_for_v5.17_rc5' of git://git./linux/kernel/git/tip/tip
Pull locking fix from Borislav Petkov:
 "Fix a NULL ptr dereference when dumping lockdep chains through
  /proc/lockdep_chains"
* tag 'locking_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep: Correct lock_classes index mapping
Linus Torvalds [Sun, 20 Feb 2022 20:46:21 +0000 (12:46 -0800)]
 
Merge tag 'x86_urgent_for_v5.17_rc5' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
 - Fix the ptrace regset xfpregs_set() callback to behave according to
   the ABI
 - Handle poisoned pages properly in the SGX reclaimer code
* tag 'x86_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ptrace: Fix xfpregs_set()'s incorrect xmm clearing
  x86/sgx: Fix missing poison handling in reclaimer
Linus Torvalds [Sun, 20 Feb 2022 20:40:20 +0000 (12:40 -0800)]
 
Merge tag 'sched_urgent_for_v5.17_rc5' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
 "Fix task exposure order when forking tasks"
* tag 'sched_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix yet more sched_fork() races
Linus Torvalds [Sun, 20 Feb 2022 20:04:14 +0000 (12:04 -0800)]
 
Merge tag 'edac_urgent_for_v5.17_rc5' of git://git./linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:
 "Fix a long-standing struct alignment bug in the EDAC struct allocation
  code"
* tag 'edac_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC: Fix calculation of returned address and next offset in edac_align_ptr()
Linus Torvalds [Sun, 20 Feb 2022 19:51:49 +0000 (11:51 -0800)]
 
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Three fixes, all in drivers.
  The ufs and qedi fixes are minor; the lpfc one is a bit bigger because
  it involves adding a heuristic to detect and deal with common but not
  standards compliant behaviour"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Fix divide by zero in ufshcd_map_queues()
  scsi: lpfc: Fix pt2pt NVMe PRLI reject LOGO loop
  scsi: qedi: Fix ABBA deadlock in qedi_process_tmf_resp() and qedi_process_cmd_cleanup_resp()
Linus Torvalds [Sun, 20 Feb 2022 19:30:18 +0000 (11:30 -0800)]
 
Merge tag 'dmaengine-fix-5.17' of git://git./linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
 "A bunch of driver fixes for:
   - ptdma error handling in init
   - lock fix in at_hdmac
   - error path and error num fix for sh dma
   - pm balance fix for stm32"
* tag 'dmaengine-fix-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dmaengine: shdma: Fix runtime PM imbalance on error
  dmaengine: sh: rcar-dmac: Check for error num after dma_set_max_seg_size
  dmaengine: stm32-dmamux: Fix PM disable depth imbalance in stm32_dmamux_probe
  dmaengine: sh: rcar-dmac: Check for error num after setting mask
  dmaengine: at_xdmac: Fix missing unlock in at_xdmac_tasklet()
  dmaengine: ptdma: Fix the error handling path in pt_core_init()
Linus Torvalds [Sun, 20 Feb 2022 19:23:48 +0000 (11:23 -0800)]
 
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "Some driver updates, a MAINTAINERS fix, and additions to COMPILE_TEST
  (so we won't miss build problems again)"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: remove duplicate entry for i2c-qcom-geni
  i2c: brcmstb: fix support for DSL and CM variants
  i2c: qup: allow COMPILE_TEST
  i2c: imx: allow COMPILE_TEST
  i2c: cadence: allow COMPILE_TEST
  i2c: qcom-cci: don't put a device tree node before i2c_add_adapter()
  i2c: qcom-cci: don't delete an unregistered adapter
  i2c: bcm2835: Avoid clock stretching timeouts
Linus Torvalds [Sun, 20 Feb 2022 19:15:46 +0000 (11:15 -0800)]
 
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
 - a fix for Synaptics touchpads in RMI4 mode failing to suspend/resume
   properly because I2C client devices are now being suspended and
   resumed asynchronously which changed the ordering
 - a change to make sure we do not set right and middle buttons
   capabilities on touchpads that are "buttonpads" (i.e. do not have
   separate physical buttons)
 - a change to zinitix touchscreen driver adding more compatible
   strings/IDs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: psmouse - set up dependency between PS/2 and SMBus companions
  Input: zinitix - add new compatible strings
  Input: clear BTN_RIGHT/MIDDLE on buttonpads
Linus Torvalds [Sun, 20 Feb 2022 19:07:46 +0000 (11:07 -0800)]
 
Merge tag 'for-v5.17-rc' of git://git./linux/kernel/git/sre/linux-power-supply
Pull power supply fixes from Sebastian Reichel:
 "Three regression fixes for the 5.17 cycle:
   - build warning fix for power-supply documentation
   - pointer size fix in cw2015 battery driver
   - OOM handling in bq256xx charger driver"
* tag 'for-v5.17-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
  power: supply: bq256xx: Handle OOM correctly
  power: supply: core: fix application of sizeof to pointer
  power: supply: fix table problem in sysfs-class-power
Linus Torvalds [Sun, 20 Feb 2022 19:01:47 +0000 (11:01 -0800)]
 
Merge tag 'fs.mount_setattr.v5.17-rc4' of git://git./linux/kernel/git/brauner/linux
Pull mount_setattr test/doc fixes from Christian Brauner:
 "This contains a fix for one of the selftests for the mount_setattr
  syscall to create idmapped mounts, an entry for idmapped mounts for
  maintainers, and missing kernel documentation for the helper we split
  out some time ago to get and yield write access to a mount when
  changing mount properties"
* tag 'fs.mount_setattr.v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  fs: add kernel doc for mnt_{hold,unhold}_writers()
  MAINTAINERS: add entry for idmapped mounts
  tests: fix idmapped mount_setattr test
Linus Torvalds [Sun, 20 Feb 2022 18:55:05 +0000 (10:55 -0800)]
 
Merge tag 'pidfd.v5.17-rc4' of git://git./linux/kernel/git/brauner/linux
Pull pidfd fix from Christian Brauner:
 "This fixes a problem reported by lockdep when installing a pidfd via
  fd_install() with siglock and the tasklist write lock held in
  copy_process() when calling clone()/clone3() with CLONE_PIDFD.
  Originally a pidfd was created prior to holding any of these locks but
  this required a call to ksys_close(). So quite some time ago in
  6fd2fe494b17 ("copy_process(): don't use ksys_close() on cleanups") we
  switched to a get_unused_fd_flags() + fd_install() model.
  As part of that we moved fd_install() as late as possible. This was
  done for two main reasons. First, because we needed to ensure that we
  call fd_install() past the point of no return as once that's called
  the fd is live in the task's file table. Second, because we tried to
  ensure that the fd is visible in /proc/<pid>/fd/<pidfd> right when the
  task is visible.
  This fix moves the fd_install() to an even later point which means
  that a task will be visible in proc while the pidfd isn't yet under
  /proc/<pid>/fd/<pidfd>.
  While this is a user visible change it's very unlikely that this will
  have any impact. Nobody should be relying on that and if they do we
  need to come up with something better but again, it's doubtful this is
  relevant"
* tag 'pidfd.v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  copy_process(): Move fd_install() out of sighand->siglock critical section
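The get_unused_fd_flags() + fd_install() model mentioned in the pull message above is a reserve-then-publish pattern. The following is a simplified userspace sketch of that idea, assuming a tiny fixed-size table; `struct fd_table_sketch`, `reserve_fd()`, `release_fd()`, `install_fd()` and `fd_is_visible()` are hypothetical names and do not mirror the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 8  /* hypothetical table size for the sketch */

struct fd_table_sketch {
	const void *file[TABLE_SIZE];       /* non-NULL once the fd is live */
	unsigned char reserved[TABLE_SIZE]; /* slot number has been handed out */
};

/* Phase 1: reserve the lowest free slot; nothing is visible yet. */
static int reserve_fd(struct fd_table_sketch *t)
{
	for (int fd = 0; fd < TABLE_SIZE; fd++) {
		if (!t->reserved[fd]) {
			t->reserved[fd] = 1;
			return fd;
		}
	}
	return -1; /* table full */
}

/* Error path: give the slot back; only valid before install_fd(). */
static void release_fd(struct fd_table_sketch *t, int fd)
{
	assert(fd >= 0 && fd < TABLE_SIZE && t->file[fd] == NULL);
	t->reserved[fd] = 0;
}

/* Phase 2: publish the file; from here on there is no undo. */
static void install_fd(struct fd_table_sketch *t, int fd, const void *file)
{
	assert(fd >= 0 && fd < TABLE_SIZE && t->reserved[fd] && file != NULL);
	t->file[fd] = file;
}

/* An fd is observable by others only after it has been installed. */
static int fd_is_visible(const struct fd_table_sketch *t, int fd)
{
	return fd >= 0 && fd < TABLE_SIZE && t->file[fd] != NULL;
}
```

Because a reserved slot can still be released, every step that may fail belongs between the two phases, and the publish step goes after the point of no return.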
Linus Torvalds [Sun, 20 Feb 2022 18:44:11 +0000 (10:44 -0800)]
 
Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git./linux/kernel/git/ebiederm/user-namespace
Pull ucounts fixes from Eric Biederman:
 "Michal Koutný recently found some bugs in the enforcement of
  RLIMIT_NPROC in the recent ucount rlimit implementation.
  In this set of patches I have developed a very conservative approach
  changing only what is necessary to fix the bugs that I can see
  clearly. Cleanups and anything that is making the code more consistent
  can follow after we have the code working as it has historically.
  The problem is not so much inconsistencies (although those exist) but
  that it is very difficult to figure out what the code should be doing
  in the case of RLIMIT_NPROC.
  All other rlimits are only enforced where the resource is acquired
  (allocated). RLIMIT_NPROC by necessity needs to be enforced in an
  additional location, and our current implementation stumbled its way
  into that implementation"
* 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ucounts: Handle wrapping in is_ucounts_overlimit
  ucounts: Move RLIMIT_NPROC handling after set_user
  ucounts: Base set_cred_ucounts changes on the real user
  ucounts: Enforce RLIMIT_NPROC not RLIMIT_NPROC+1
  rlimit: Fix RLIMIT_NPROC enforcement failure caused by capability calls in set_user
Wolfram Sang [Fri, 18 Feb 2022 10:49:04 +0000 (11:49 +0100)]
 
MAINTAINERS: remove duplicate entry for i2c-qcom-geni
The driver is already covered in the ARM/QUALCOMM section. Also, Akash
Asthana's email now bounces and Mukesh Savaliya has never responded to
mails regarding this driver.
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Mark Rutland [Mon, 14 Feb 2022 16:52:16 +0000 (16:52 +0000)]
 
arm64: Support PREEMPT_DYNAMIC
This patch enables support for PREEMPT_DYNAMIC on arm64, allowing the
preemption model to be chosen at boot time.
Specifically, this patch selects HAVE_PREEMPT_DYNAMIC_KEY, so that each
preemption function is an out-of-line call with an early return
depending upon a static key. This leaves almost all the codegen up to
the compiler, and side-steps a number of pain points with static calls
(e.g. interaction with CFI schemes). This should have no worse overhead
than using non-inline static calls, as those use out-of-line trampolines
with early returns.
For example, the dynamic_cond_resched() wrapper looks as follows when
enabled. When disabled, the first `B` is replaced with a `NOP`,
resulting in an early return.
| <dynamic_cond_resched>:
|        bti     c
|        b       <dynamic_cond_resched+0x10>     // or `nop`
|        mov     w0, #0x0
|        ret
|        mrs     x0, sp_el0
|        ldr     x0, [x0, #8]
|        cbnz    x0, <dynamic_cond_resched+0x8>
|        paciasp
|        stp     x29, x30, [sp, #-16]!
|        mov     x29, sp
|        bl      <preempt_schedule_common>
|        mov     w0, #0x1
|        ldp     x29, x30, [sp], #16
|        autiasp
|        ret
... compared to the regular form of the function:
| <__cond_resched>:
|        bti     c
|        mrs     x0, sp_el0
|        ldr     x1, [x0, #8]
|        cbz     x1, <__cond_resched+0x18>
|        mov     w0, #0x0
|        ret
|        paciasp
|        stp     x29, x30, [sp, #-16]!
|        mov     x29, sp
|        bl      <preempt_schedule_common>
|        mov     w0, #0x1
|        ldp     x29, x30, [sp], #16
|        autiasp
|        ret
Since arm64 does not yet use the generic entry code, we must define our
own `sk_dynamic_irqentry_exit_cond_resched`, which will be
enabled/disabled by the common code in kernel/sched/core.c. All other
preemption functions and associated static keys are defined there.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-8-mark.rutland@arm.com
Mark Rutland [Mon, 14 Feb 2022 16:52:15 +0000 (16:52 +0000)]
 
arm64: entry: Centralize preemption decision
For historical reasons, the decision of whether or not to preempt is
spread across arm64_preempt_schedule_irq() and __el1_irq(), and it would
be clearer if this were all in one place.
Also, arm64_preempt_schedule_irq() calls lockdep_assert_irqs_disabled(),
but this is redundant, as we have a subsequent identical assertion in
__exit_to_kernel_mode(), and preempt_schedule_irq() will
BUG_ON(!irqs_disabled()) anyway.
This patch removes the redundant assertion and centralizes the
preemption decision making within arm64_preempt_schedule_irq().
Other than the slight change to assertion behaviour, there should be no
functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-7-mark.rutland@arm.com
Mark Rutland [Mon, 14 Feb 2022 16:52:14 +0000 (16:52 +0000)]
 
sched/preempt: Add PREEMPT_DYNAMIC using static keys
Where an architecture selects HAVE_STATIC_CALL but not
HAVE_STATIC_CALL_INLINE, each static call has an out-of-line trampoline
which will either branch to a callee or return to the caller.
On such architectures, a number of constraints can conspire to make
those trampolines more complicated and potentially less useful than we'd
like. For example:
* Hardware and software control flow integrity schemes can require the
  addition of "landing pad" instructions (e.g. `BTI` for arm64), which
  will also be present at the "real" callee.
* Limited branch ranges can require that trampolines generate or load an
  address into a register and perform an indirect branch (or at least
  have a slow path that does so). This loses some of the benefits of
  having a direct branch.
* Interaction with SW CFI schemes can be complicated and fragile, e.g.
  requiring that we can recognise idiomatic codegen and remove
  indirections, at least until clang provides more helpful mechanisms
  for dealing with this.
For PREEMPT_DYNAMIC, we don't need the full power of static calls, as we
really only need to enable/disable specific preemption functions. We can
achieve the same effect without a number of the pain points above by
using static keys to fold early returns into the preemption functions
themselves rather than in an out-of-line trampoline, effectively
inlining the trampoline into the start of the function.
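The idea of folding the early return into the function itself can be sketched in plain C. This is an illustration under stated assumptions rather than the kernel implementation: a `bool` stands in for the static key (the kernel patches a branch instruction instead of testing a variable), and `sk_cond_resched_sketch`, `need_resched_flag`, `schedule_calls`, `cond_resched_body()` and `dynamic_cond_resched_sketch()` are hypothetical names.

```c
#include <stdbool.h>

/* Stand-in for a static key; the real thing is a patched branch, not a load. */
static bool sk_cond_resched_sketch = true;

/* Hypothetical stand-ins for scheduler state. */
static int need_resched_flag;
static int schedule_calls;

/* The regular function body: reschedule only when it is needed. */
static int cond_resched_body(void)
{
	if (!need_resched_flag)
		return 0;
	need_resched_flag = 0;
	schedule_calls++; /* stands in for calling into the scheduler */
	return 1;
}

/* Wrapper with the "trampoline" folded into the start of the function. */
static int dynamic_cond_resched_sketch(void)
{
	if (!sk_cond_resched_sketch)
		return 0; /* disabled: early return, the body is never reached */
	return cond_resched_body();
}
```

With the flag cleared, the wrapper returns 0 without touching scheduler state; with it set, the wrapper behaves like the regular function.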
For arm64, this results in good code generation. For example, the
dynamic_cond_resched() wrapper looks as follows when enabled. When
disabled, the first `B` is replaced with a `NOP`, resulting in an early
return.
| <dynamic_cond_resched>:
|        bti     c
|        b       <dynamic_cond_resched+0x10>     // or `nop`
|        mov     w0, #0x0
|        ret
|        mrs     x0, sp_el0
|        ldr     x0, [x0, #8]
|        cbnz    x0, <dynamic_cond_resched+0x8>
|        paciasp
|        stp     x29, x30, [sp, #-16]!
|        mov     x29, sp
|        bl      <preempt_schedule_common>
|        mov     w0, #0x1
|        ldp     x29, x30, [sp], #16
|        autiasp
|        ret
... compared to the regular form of the function:
| <__cond_resched>:
|        bti     c
|        mrs     x0, sp_el0
|        ldr     x1, [x0, #8]
|        cbz     x1, <__cond_resched+0x18>
|        mov     w0, #0x0
|        ret
|        paciasp
|        stp     x29, x30, [sp, #-16]!
|        mov     x29, sp
|        bl      <preempt_schedule_common>
|        mov     w0, #0x1
|        ldp     x29, x30, [sp], #16
|        autiasp
|        ret
Any architecture which implements static keys should be able to use this
to implement PREEMPT_DYNAMIC with similar cost to non-inlined static
calls. Since this is likely to have greater overhead than (inlined)
static calls, PREEMPT_DYNAMIC is only defaulted to enabled when
HAVE_PREEMPT_DYNAMIC_CALL is selected.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-6-mark.rutland@arm.com
Mark Rutland [Mon, 14 Feb 2022 16:52:13 +0000 (16:52 +0000)]
 
sched/preempt: Decouple HAVE_PREEMPT_DYNAMIC from GENERIC_ENTRY
Now that the enabled/disabled states for the preemption functions are
declared alongside their definitions, the core PREEMPT_DYNAMIC logic is
no longer tied to GENERIC_ENTRY, and can safely be selected so long as
an architecture provides enabled/disabled states for
irqentry_exit_cond_resched().
Make it possible to select HAVE_PREEMPT_DYNAMIC without GENERIC_ENTRY.
For existing users of HAVE_PREEMPT_DYNAMIC there should be no functional
change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-5-mark.rutland@arm.com
Mark Rutland [Mon, 14 Feb 2022 16:52:12 +0000 (16:52 +0000)]
 
sched/preempt: Simplify irqentry_exit_cond_resched() callers
Currently callers of irqentry_exit_cond_resched() need to be aware of
whether the function should be indirected via a static call, leading to
ugly ifdeffery in callers.
Save them the hassle with a static inline wrapper that does the right
thing. The raw_irqentry_exit_cond_resched() will also be useful in
subsequent patches which will add conditional wrappers for preemption
functions.
Note: in arch/x86/entry/common.c, xen_pv_evtchn_do_upcall() always calls
irqentry_exit_cond_resched() directly, even when PREEMPT_DYNAMIC is in
use. I believe this is a latent bug (which this patch corrects), but I'm
not entirely certain this wasn't deliberate.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-4-mark.rutland@arm.com
Mark Rutland [Mon, 14 Feb 2022 16:52:11 +0000 (16:52 +0000)]
 
sched/preempt: Refactor sched_dynamic_update()
Currently sched_dynamic_update needs to open-code the enabled/disabled
function names for each preemption model it supports, when in practice
this is a boolean enabled/disabled state for each function.
Make this clearer and avoid repetition by defining the enabled/disabled
states at the function definition, and using helper macros to perform the
static_call_update(). Where x86 currently overrides the enabled
function, it is made to provide both the enabled and disabled states for
consistency, with defaults provided by the core code otherwise.
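A rough userspace sketch of this pattern follows. It is not the kernel code: function pointers stand in for static calls, all `*_real`, `*_target` and `*_sketch` names are hypothetical, and the per-mode choices are shown purely for illustration.

```c
typedef int (*preempt_fn)(void);

/* Hypothetical preemption functions plus a stub that always returns 0. */
static int cond_resched_real(void)  { return 1; }
static int might_resched_real(void) { return 1; }
static int return0_stub(void)       { return 0; }

/* Enabled/disabled states declared next to each function definition. */
#define cond_resched_dynamic_enabled	cond_resched_real
#define cond_resched_dynamic_disabled	return0_stub
#define might_resched_dynamic_enabled	might_resched_real
#define might_resched_dynamic_disabled	return0_stub

/* Function pointers standing in for the static call targets. */
static preempt_fn cond_resched_target  = cond_resched_real;
static preempt_fn might_resched_target = might_resched_real;

/* Helpers: the update site only says "enable" or "disable". */
#define preempt_dynamic_enable(f)	(f##_target = f##_dynamic_enabled)
#define preempt_dynamic_disable(f)	(f##_target = f##_dynamic_disabled)

enum preempt_mode_sketch { MODE_NONE, MODE_VOLUNTARY, MODE_FULL };

static void sched_dynamic_update_sketch(enum preempt_mode_sketch mode)
{
	switch (mode) {
	case MODE_NONE:
		preempt_dynamic_enable(cond_resched);
		preempt_dynamic_disable(might_resched);
		break;
	case MODE_VOLUNTARY:
		preempt_dynamic_enable(cond_resched);
		preempt_dynamic_enable(might_resched);
		break;
	case MODE_FULL:
		preempt_dynamic_disable(cond_resched);
		preempt_dynamic_disable(might_resched);
		break;
	}
}
```

The update function no longer names the concrete enabled or disabled implementation; that knowledge lives beside each function's definition, so a different backend can redefine the two helper macros without touching the switch.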
In subsequent patches this will allow us to support PREEMPT_DYNAMIC
without static calls.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-3-mark.rutland@arm.com
Mark Rutland [Mon, 14 Feb 2022 16:52:10 +0000 (16:52 +0000)]
 
sched/preempt: Move PREEMPT_DYNAMIC logic later
The PREEMPT_DYNAMIC logic in kernel/sched/core.c patches static calls
for a bunch of preemption functions. While most are defined prior to
this, the definition of cond_resched() is later in the file, and so we
only have its declarations from include/linux/sched.h.
In subsequent patches we'd like to define some macros alongside the
definition of each of the preemption functions, which we can use within
sched_dynamic_update(). For this to be possible, the PREEMPT_DYNAMIC
logic needs to be placed after the various preemption functions.
As a preparatory step, this patch moves the PREEMPT_DYNAMIC logic after
the various preemption functions, with no other changes -- this is
purely a move.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-2-mark.rutland@arm.com
Peter Zijlstra [Mon, 14 Feb 2022 09:16:57 +0000 (10:16 +0100)]
 
sched: Fix yet more sched_fork() races
Where commit 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an
invalid sched_task_group") fixed a fork race vs cgroup, it opened up a
race vs syscalls by not placing the task on the runqueue before it
gets exposed through the pidhash.
Commit 13765de8148f ("sched/fair: Fix fault in reweight_entity") is
trying to fix a single instance of this; instead, fix the whole class
of issues, effectively reverting that commit.
Fixes: 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Tested-by: Zhang Qiao <zhangqiao22@huawei.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/YgoeCbwj5mbCR0qA@hirez.programming.kicks-ass.net
Linus Torvalds [Sat, 19 Feb 2022 00:24:44 +0000 (16:24 -0800)]
 
Merge tag 'nfs-for-5.17-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client bugfixes from Anna Schumaker:
 - Fix unnecessary changeattr revalidations
 - Fix resolving symlinks during directory lookups
 - Don't report writeback errors in nfs_getattr()
* tag 'nfs-for-5.17-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Do not report writeback errors in nfs_getattr()
  NFS: LOOKUP_DIRECTORY is also ok with symlinks
  NFS: Remove an incorrect revalidation in nfs4_update_changeattr_locked()
Linus Torvalds [Sat, 19 Feb 2022 00:19:14 +0000 (16:19 -0800)]
 
Merge tag 'acpi-5.17-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
 "These make an excess warning message go away and fix a recently
  introduced boot failure on a vintage machine.
  Specifics:
   - Change the log level of the "table not found" message in
     acpi_table_parse_entries_array() to debug to prevent it from
     showing up in the logs unnecessarily (Dan Williams)
   - Add a C-state limit quirk for 32-bit ThinkPad T40 to prevent it
     from crashing on boot after recent changes in the ACPI processor
     driver (Woody Suwalski)"
* tag 'acpi-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40
  ACPI: tables: Quiet ACPI table not found warning
Linus Torvalds [Sat, 19 Feb 2022 00:14:13 +0000 (16:14 -0800)]
 
Merge tag 'riscv-for-linus-5.17-rc5' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
 "A set of three fixes, all aimed at fixing some fallout from the recent
  sparse hart ID support"
* tag 'riscv-for-linus-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Fix IPI/RFENCE hmask on non-monotonic hartid ordering
  RISC-V: Fix handling of empty cpu masks
  RISC-V: Fix hartid mask handling for hartid 31 and up
Dmitry Torokhov [Tue, 15 Feb 2022 21:32:26 +0000 (13:32 -0800)]
 
Input: psmouse - set up dependency between PS/2 and SMBus companions
When we switch from emulated PS/2 to native (RMI4 or Elan) protocols, we
create SMBus companion devices that are attached to I2C/SMBus controllers.
However, when suspending and resuming, we also need to make sure that we
take into account the PS/2 device they are associated with, so that PS/2
device is suspended after the companion and resumed before it, otherwise
companions will not work properly. Before I2C devices were marked for
asynchronous suspend/resume, this ordering happened naturally, but now we
need to enforce it by establishing device links, with PS/2 devices being
suppliers and SMBus companions being consumers.
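The ordering constraint described above (consumers suspend before their suppliers, and resume in the reverse order) can be sketched as a small userspace model. This is only an illustration and does not use the kernel's device link API; `struct dev_sketch`, `suspend_order()` and `MAX_DEVS` are hypothetical.

```c
#include <assert.h>

#define MAX_DEVS 8  /* hypothetical upper bound for the sketch */

struct dev_sketch {
	const char *name;
	int supplier; /* index of the supplier device, or -1 for none */
};

/*
 * Fill order[] with a suspend order in which every consumer comes before
 * its supplier. Returns the number of devices ordered; this is less than
 * n if the links form a cycle.
 */
static int suspend_order(const struct dev_sketch *devs, int n, int *order)
{
	int done[MAX_DEVS] = { 0 };
	int count = 0;
	int progressed = 1;

	assert(n >= 0 && n <= MAX_DEVS);
	while (count < n && progressed) {
		progressed = 0;
		for (int i = 0; i < n; i++) {
			int busy = 0;

			if (done[i])
				continue;
			/* A supplier waits until all its consumers are suspended. */
			for (int j = 0; j < n; j++)
				if (!done[j] && devs[j].supplier == i)
					busy = 1;
			if (!busy) {
				done[i] = 1;
				order[count++] = i;
				progressed = 1;
			}
		}
	}
	return count;
}
```

For a PS/2 device at index 0 and an SMBus companion at index 1 whose supplier is 0, the companion is placed first in the suspend order; walking the same array backwards gives the resume order, with the PS/2 device first.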
Fixes: 172d931910e1 ("i2c: enable async suspend/resume on i2c client devices")
Reported-and-tested-by: Hugh Dickins <hughd@google.com>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Link: https://lore.kernel.org/r/89456fcd-a113-4c82-4b10-a9bcaefac68f@google.com
Link: https://lore.kernel.org/r/YgwQN8ynO88CPMju@google.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Rafael J. Wysocki [Fri, 18 Feb 2022 18:36:36 +0000 (19:36 +0100)]
 
Merge branch 'acpi-processor'
Merge fix for a recent boot lockup regression on 32-bit ThinkPad T40.
* acpi-processor:
  ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40
Linus Torvalds [Fri, 18 Feb 2022 17:33:23 +0000 (09:33 -0800)]
 
Merge tag 'mtd/fixes-for-5.17-rc5' of git://git./linux/kernel/git/mtd/linux
Pull MTD fixes from Miquel Raynal:
 "MTD changes:
   - Qcom:
      - Don't print error message on -EPROBE_DEFER
      - Fix kernel panic on skipped partition
      - Fix missing free for pparts in cleanup
   - phram: Prevent divide by zero bug in phram_setup()
  Raw NAND controller changes:
   - ingenic: Fix missing put_device in ingenic_ecc_get
   - qcom: Fix clock sequencing in qcom_nandc_probe()
   - omap2: Prevent invalid configuration and build error
   - gpmi: Don't leak PM reference in error path
   - brcmnand: Fix incorrect sub-page ECC status"
* tag 'mtd/fixes-for-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd: rawnand: brcmnand: Fixed incorrect sub-page ECC status
  mtd: rawnand: gpmi: don't leak PM reference in error path
  mtd: phram: Prevent divide by zero bug in phram_setup()
  mtd: rawnand: omap2: Prevent invalid configuration and build error
  mtd: parsers: qcom: Fix missing free for pparts in cleanup
  mtd: parsers: qcom: Fix kernel panic on skipped partition
  mtd: parsers: qcom: Don't print error message on -EPROBE_DEFER
  mtd: rawnand: qcom: Fix clock sequencing in qcom_nandc_probe()
  mtd: rawnand: ingenic: Fix missing put_device in ingenic_ecc_get
Linus Torvalds [Fri, 18 Feb 2022 17:27:10 +0000 (09:27 -0800)]
 
Merge tag 'block-5.17-2022-02-17' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 - Surprise removal fix (Christoph)
 - Ensure that pages are zeroed before being submitted for userspace IO
   (Haimin)
 - Fix blk-wbt accounting issue with BFQ (Laibin)
 - Use bsize for discard granularity in loop (Ming)
 - Fix missing zone handling in blk_complete_request() (Pankaj)
* tag 'block-5.17-2022-02-17' of git://git.kernel.dk/linux-block:
  block/wbt: fix negative inflight counter when remove scsi device
  block: fix surprise removal for drivers calling blk_set_queue_dying
  block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern
  block: loop:use kstatfs.f_bsize of backing file to set discard granularity
  block: Add handling for zone append command in blk_complete_request
Linus Torvalds [Fri, 18 Feb 2022 17:20:52 +0000 (09:20 -0800)]
 
Merge tag 'sound-5.17-rc5' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "A collection of small patches, mostly for old and new regressions and
  device-specific fixes.
   - Regression fixes regarding ALSA core SG-buffer helpers
   - Regression fix for Realtek HD-audio mutex deadlock
   - Regression fix for USB-audio PM resume error
   - More coverage of ASoC core control API notification fixes
   - Old regression fixes for HD-audio probe mask
   - Fixes for ASoC Realtek codec work handling
   - Other device-specific quirks / fixes"
* tag 'sound-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
  ASoC: intel: skylake: Set max DMA segment size
  ASoC: SOF: hda: Set max DMA segment size
  ALSA: hda: Set max DMA segment size
  ALSA: hda/realtek: Fix deadlock by COEF mutex
  ALSA: usb-audio: Don't abort resume upon errors
  ALSA: hda: Fix missing codec probe on Shenker Dock 15
  ALSA: hda: Fix regression on forced probe mask option
  ALSA: hda/realtek: Add quirk for Legion Y9000X 2019
  ALSA: usb-audio: revert to IMPLICIT_FB_FIXED_DEV for M-Audio FastTrack Ultra
  ASoC: wm_adsp: Correct control read size when parsing compressed buffer
  ASoC: qcom: Actually clear DMA interrupt register for HDMI
  ALSA: memalloc: invalidate SG pages before sync
  ALSA: memalloc: Fix dma_need_sync() checks
  MAINTAINERS: update cros_ec_codec maintainers
  ASoC: rt5682: do not block workqueue if card is unbound
  ASoC: rt5668: do not block workqueue if card is unbound
  ASoC: rt5682s: do not block workqueue if card is unbound
  ASoC: tas2770: Insert post reset delay
  ASoC: Revert "ASoC: mediatek: Check for error clk pointer"
  ASoC: amd: acp: Set gpio_spkr_en to None for max speaker amplifer in machine driver
  ...
Linus Torvalds [Fri, 18 Feb 2022 17:14:19 +0000 (09:14 -0800)]
 
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
 "Fix wrong branch label in the EL2 GICv3 initialisation code"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Correct wrong label in macro __init_el2_gicv3
Linus Torvalds [Fri, 18 Feb 2022 17:10:14 +0000 (09:10 -0800)]
 
Merge tag 'powerpc-5.17-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
 - Fix boot failure on 603 with DEBUG_PAGEALLOC and KFENCE
 - Fix 32-bit build with newer binutils that rejects 'ptesync' etc
Thanks to Anders Roxell, Christophe Leroy, and Maxime Bizon.
* tag 'powerpc-5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/lib/sstep: fix 'ptesync' build error
  powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE
Linus Torvalds [Fri, 18 Feb 2022 17:04:27 +0000 (09:04 -0800)]
 
Merge tag '5.17-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Six small smb3 client fixes, three for stable:
   - fix for snapshot mount option
   - two ACL related fixes
   - use after free race fix
   - fix for confusing warning message logged with older dialects"
* tag '5.17-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix confusing unneeded warning message on smb2.1 and earlier
  cifs: modefromsids must add an ACE for authenticated users
  cifs: fix double free race when mount fails in cifs_get_root()
  cifs: do not use uninitialized data in the owner/group sid
  cifs: fix set of group SID via NTSD xattrs
  smb3: fix snapshot mount option
Andy Lutomirski [Mon, 14 Feb 2022 12:05:49 +0000 (13:05 +0100)]
 
x86/ptrace: Fix xfpregs_set()'s incorrect xmm clearing
xfpregs_set() handles 32-bit REGSET_XFP and 64-bit REGSET_FP. The actual
code treats these regsets as modern FX state (i.e. the beginning part of
XSTATE). The declarations of the regsets thought they were the legacy
i387 format. The code thought they were the 32-bit (no xmm8..15) variant
of XSTATE and, for good measure, made the high bits disappear by zeroing
the wrong part of the buffer. The latter broke ptrace, and everything
else confused anyone trying to understand the code. In particular, the
nonsense definitions of the regsets confused me when I wrote this code.
Clean this all up. Change the declarations to match reality (which
shouldn't change the generated code, let alone the ABI) and fix
xfpregs_set() to clear the correct bits and to only do so for 32-bit
callers.
Fixes: 6164331d15f7 ("x86/fpu: Rewrite xfpregs_set()")
Reported-by: Luís Ferreira <contact@lsferreira.net>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215524
Link: https://lore.kernel.org/r/YgpFnZpF01WwR8wU@zn.tnic
Rafał Miłecki [Tue, 15 Feb 2022 07:27:35 +0000 (08:27 +0100)]
 
i2c: brcmstb: fix support for DSL and CM variants
DSL and CM (Cable Modem) support 8 B max transfer size and have a custom
DT binding for that reason. This driver was checking for a wrong
"compatible" however which resulted in an incorrect setup.
Fixes: e2e5a2c61837 ("i2c: brcmstb: Adding support for CM and DSL SoCs")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Linus Torvalds [Thu, 17 Feb 2022 23:21:42 +0000 (15:21 -0800)]
 
Merge tag 'linux-kselftest-fixes-5.17-rc5' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
 "Fixes to ftrace, exec, and seccomp tests build, run-time and install
  bugs. These bugs are in the way of running the tests"
* tag 'linux-kselftest-fixes-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/ftrace: Do not trace do_softirq because of PREEMPT_RT
  selftests/seccomp: Fix seccomp failure by adding missing headers
  selftests/exec: Add non-regular to TEST_GEN_PROGS
Linus Torvalds [Thu, 17 Feb 2022 21:11:46 +0000 (13:11 -0800)]
 
Merge tag 'drm-fixes-2022-02-18' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
 "Regular fixes for rc5, nothing really stands out, mostly some amdgpu
  and i915 fixes with mediatek, radeon and some misc fixes.
  cma-helper:
   - set VM_DONTEXPAND
  atomic:
   - error handling fix
  mediatek:
   - fix probe defer loop with external bridge
  amdgpu:
   - Stable pstate clock fixes for Dimgrey Cavefish and Beige Goby
   - S0ix SDMA fix
   - Yellow Carp GPU reset fix
  radeon:
   - Backlight fix for iMac 12,1
  i915:
   - GVT kerneldoc cleanup.
   - GVT Kconfig should depend on X86
   - Prevent out of range access in SWSCI display code
   - Fix mbus join and dbuf slice config lookup
   - Fix inverted priority selection in the TTM backend
   - Fix FBC plane end Y offset check"
* tag 'drm-fixes-2022-02-18' of git://anongit.freedesktop.org/drm/drm:
  drm/atomic: Don't pollute crtc_state->mode_blob with error pointers
  drm/radeon: Fix backlight control on iMac 12,1
  drm/amd/pm: correct the sequence of sending gpu reset msg
  drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.
  drm/amd/pm: correct UMD pstate clocks for Dimgrey Cavefish and Beige Goby
  drm/i915/fbc: Fix the plane end Y offset check
  drm/i915/opregion: check port number bounds for SWSCI display power state
  drm/i915/ttm: tweak priority hint selection
  drm/i915: Fix mbus join config lookup
  drm/i915: Fix dbuf slice config lookup
  drm/cma-helper: Set VM_DONTEXPAND for mmap
  drm/mediatek: mtk_dsi: Avoid EPROBE_DEFER loop with external bridge
  drm/i915/gvt: Make DRM_I915_GVT depend on X86
  drm/i915/gvt: clean up kernel-doc in gtt.c
Dave Airlie [Thu, 17 Feb 2022 19:44:44 +0000 (05:44 +1000)]
 
Merge tag 'drm-intel-fixes-2022-02-17' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- GVT kerneldoc cleanup. (Randy Dunlap)
- GVT Kconfig should depend on X86. (Siva Mullati)
- Prevent out of range access in SWSCI display code. (Jani Nikula)
- Fix mbus join and dbuf slice config lookup. (Ville Syrjälä)
- Fix inverted priority selection in the TTM backend. (Matthew Auld)
- Fix FBC plane end Y offset check. (Ville Syrjälä)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Yg4lA6k8+xp8u3aB@tursulin-mobl2
Dave Airlie [Thu, 17 Feb 2022 19:39:53 +0000 (05:39 +1000)]
 
Merge tag 'drm-misc-fixes-2022-02-17' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
 * drm/cma-helper: Set VM_DONTEXPAND
 * drm/atomic: Fix error handling in drm_atomic_set_mode_for_crtc()
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/Yg4mzQALMX69UmA3@linux-uq9g
Linus Torvalds [Thu, 17 Feb 2022 19:33:59 +0000 (11:33 -0800)]
 
Merge tag 'net-5.17-rc5' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless and netfilter.
  Current release - regressions:
   - dsa: lantiq_gswip: fix use after free in gswip_remove()
   - smc: avoid overwriting the copies of clcsock callback functions
  Current release - new code bugs:
   - iwlwifi:
      - fix use-after-free when no FW is present
      - mei: fix the pskb_may_pull check in ipv4
      - mei: retry mapping the shared area
      - mvm: don't feed the hardware RFKILL into iwlmei
  Previous releases - regressions:
   - ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()
   - tipc: fix wrong publisher node address in link publications
   - iwlwifi: mvm: don't send SAR GEO command for 3160 devices, avoid FW
     assertion
   - bgmac: make idm and nicpm resource optional again
   - atl1c: fix tx timeout after link flap
  Previous releases - always broken:
   - vsock: remove vsock from connected table when connect is
     interrupted by a signal
   - ping: change destination interface checks to match raw sockets
   - crypto: af_alg - get rid of alg_memory_allocated to avoid confusing
     semantics (and null-deref) after SO_RESERVE_MEM was added
   - ipv6: make exclusive flowlabel checks per-netns
   - bonding: force carrier update when releasing slave
   - sched: limit TC_ACT_REPEAT loops
   - bridge: multicast: notify switchdev driver whenever MC processing
     gets disabled because of max entries reached
   - wifi: brcmfmac: fix crash in brcm_alt_fw_path when WLAN not found
   - iwlwifi: fix locking when "HW not ready"
   - phy: mediatek: remove PHY mode check on MT7531
   - dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN
   - dsa: lan9303:
      - fix polarity of reset during probe
      - fix accelerated VLAN handling"
* tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
  bonding: force carrier update when releasing slave
  nfp: flower: netdev offload check for ip6gretap
  ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
  ipv4: fix data races in fib_alias_hw_flags_set
  net: dsa: lan9303: add VLAN IDs to master device
  net: dsa: lan9303: handle hwaccel VLAN tags
  vsock: remove vsock from connected table when connect is interrupted by a signal
  Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
  ping: fix the dif and sdif check in ping_lookup
  net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
  net: sched: limit TC_ACT_REPEAT loops
  tipc: fix wrong notification node addresses
  net: dsa: lantiq_gswip: fix use after free in gswip_remove()
  ipv6: per-netns exclusive flowlabel checks
  net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled
  CDC-NCM: avoid overflow in sanity checking
  mctp: fix use after free
  net: mscc: ocelot: fix use-after-free in ocelot_vlan_del()
  bonding: fix data-races around agg_select_timer
  dpaa2-eth: Initialize mutex used in one step timestamping path
  ...
Zhang Changzhong [Wed, 16 Feb 2022 14:18:08 +0000 (22:18 +0800)]
 
bonding: force carrier update when releasing slave
In __bond_release_one(), bond_set_carrier() is only called when bond
device has no slave. Therefore, if we remove the up slave from a master
with two slaves and keep the down slave, the master will remain up.
Fix this by moving bond_set_carrier() out of if (!bond_has_slaves(bond))
statement.
Reproducer:
$ insmod bonding.ko mode=0 miimon=100 max_bonds=2
$ ifconfig bond0 up
$ ifenslave bond0 eth0 eth1
$ ifconfig eth0 down
$ ifenslave -d bond0 eth1
$ cat /proc/net/bonding/bond0
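The change in control flow can be modelled in user space. This is a minimal
sketch, not the driver code: struct toy_bond, toy_set_carrier() and the two
release functions are hypothetical stand-ins for the bond device,
bond_set_carrier() and __bond_release_one(), and the carrier rule is assumed
to be simply "at least one remaining slave is up".

```c
#include <stdbool.h>

#define TOY_MAX_SLAVES 4

/* Hypothetical stand-in for a bond device: only link state is modelled. */
struct toy_bond {
	bool slave_present[TOY_MAX_SLAVES];
	bool slave_up[TOY_MAX_SLAVES];
	bool carrier;
};

static bool toy_has_slaves(const struct toy_bond *bond)
{
	for (int i = 0; i < TOY_MAX_SLAVES; i++)
		if (bond->slave_present[i])
			return true;
	return false;
}

/* Assumed rule: carrier is on if at least one remaining slave is up. */
static void toy_set_carrier(struct toy_bond *bond)
{
	bond->carrier = false;
	for (int i = 0; i < TOY_MAX_SLAVES; i++)
		if (bond->slave_present[i] && bond->slave_up[i])
			bond->carrier = true;
}

/* Old flow: carrier is only recomputed when the last slave goes away. */
static void toy_release_old(struct toy_bond *bond, int idx)
{
	bond->slave_present[idx] = false;
	if (!toy_has_slaves(bond))
		toy_set_carrier(bond);
}

/* Fixed flow: carrier is recomputed on every release. */
static void toy_release_fixed(struct toy_bond *bond, int idx)
{
	bond->slave_present[idx] = false;
	toy_set_carrier(bond);
}
```

With two slaves, one down and one up, releasing the up slave leaves carrier
set in the old flow and clears it in the fixed flow.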
Fixes: ff59c4563a8d ("[PATCH] bonding: support carrier state for master")
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/1645021088-38370-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reinette Chatre [Wed, 2 Feb 2022 19:41:12 +0000 (11:41 -0800)]
 
x86/sgx: Fix missing poison handling in reclaimer
The SGX reclaimer code lacks page poison handling in its main
free path. This can lead to avoidable machine checks if a
poisoned page is freed and reallocated instead of being
isolated.
A troublesome scenario is:
 1. Machine check (#MC) occurs (asynchronous, !MF_ACTION_REQUIRED)
 2. arch_memory_failure() is eventually called
 3. (SGX) page->poison set to 1
 4. Page is reclaimed
 5. Page added to normal free lists by sgx_reclaim_pages()
    ^ This is the bug (poison pages should be isolated on the
    sgx_poison_page_list instead)
 6. Page is reallocated by some innocent enclave, a second (synchronous)
    in-kernel #MC is induced, probably during EADD instruction.
    ^ This is the fallout from the bug
(6) is unfortunate and can be avoided by replacing the open coded
enclave page freeing code in the reclaimer with sgx_free_epc_page()
to obtain support for poison page handling that includes placing the
poisoned page on the correct list.
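The difference between the two free paths can be sketched in user space.
This is an illustration under assumptions, not the SGX code: struct
toy_epc_page, struct toy_page_list and both free functions are hypothetical,
and only the poison flag and list placement are modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an EPC page; only the fields used here. */
struct toy_epc_page {
	bool poison;
	struct toy_epc_page *next;
};

struct toy_page_list {
	struct toy_epc_page *head;
	size_t count;
};

static void toy_list_push(struct toy_page_list *list, struct toy_epc_page *page)
{
	page->next = list->head;
	list->head = page;
	list->count++;
}

/* Open-coded free, as in step 5 above: the poison flag is ignored. */
static void toy_free_open_coded(struct toy_page_list *free_list,
				struct toy_epc_page *page)
{
	toy_list_push(free_list, page);
}

/* Poison-aware free: a poisoned page is isolated instead of being reused. */
static void toy_free_poison_aware(struct toy_page_list *free_list,
				  struct toy_page_list *poison_list,
				  struct toy_epc_page *page)
{
	if (page->poison)
		toy_list_push(poison_list, page);
	else
		toy_list_push(free_list, page);
}
```

Routing every free through the poison-aware helper is what keeps a poisoned
page from being handed to a later allocation.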
Fixes: d6d261bded8a ("x86/sgx: Add new sgx_epc_page flag bit to mark free pages")
Fixes: 992801ae9243 ("x86/sgx: Initial poison handling for dirty and free pages")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/dcc95eb2aaefb042527ac50d0a50738c7c160dac.1643830353.git.reinette.chatre@intel.com
Luis Chamberlain [Tue, 15 Feb 2022 02:08:28 +0000 (18:08 -0800)]
 
fs/file_table: fix adding missing kmemleak_not_leak()
Commit b42bc9a3c511 ("Fix regression due to "fs: move binfmt_misc sysctl
to its own file") fixed a regression, however it failed to add a
kmemleak_not_leak().
Fixes: b42bc9a3c511 ("Fix regression due to "fs: move binfmt_misc sysctl to its own file")
Reported-by: Tong Zhang <ztong0001@gmail.com>
Cc: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 17 Feb 2022 18:06:09 +0000 (10:06 -0800)]
 
Merge tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git./linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
 - Fix corrupt inject files when only last branch option is enabled with
   ARM CoreSight ETM
 - Fix use-after-free for realloc(..., 0) in libsubcmd, found by gcc 12
 - Defer freeing string after possible strlen() on it in the BPF loader,
   found by gcc 12
 - Avoid early exit in 'perf trace' due SIGCHLD from non-workload
   processes
 - Fix arm64 perf_event_attr 'perf test's wrt --call-graph
   initialization
 - Fix libperf 32-bit build for 'perf test' wrt uint64_t printf
 - Fix perf_cpu_map__for_each_cpu macro in libperf, providing access to
   the CPU iterator
 - Sync linux/perf_event.h UAPI with the kernel sources
 - Update Jiri Olsa's email address in MAINTAINERS
* tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf bpf: Defer freeing string after possible strlen() on it
  perf test: Fix arm64 perf_event_attr tests wrt --call-graph initialization
  libsubcmd: Fix use-after-free for realloc(..., 0)
  libperf: Fix perf_cpu_map__for_each_cpu macro
  perf cs-etm: Fix corrupt inject files when only last branch option is enabled
  perf cs-etm: No-op refactor of synth opt usage
  libperf: Fix 32-bit build for tests uint64_t printf
  tools headers UAPI: Sync linux/perf_event.h with the kernel sources
  perf trace: Avoid early exit due SIGCHLD from non-workload processes
  MAINTAINERS: Update Jiri's email address
Linus Torvalds [Thu, 17 Feb 2022 17:54:00 +0000 (09:54 -0800)]
 
Merge tag 'modules-5.17-rc5' of git://git./linux/kernel/git/mcgrof/linux
Pull module fix from Luis Chamberlain:
 "Fixes module decompression when CONFIG_SYSFS=n
  The only fix trickled down for v5.17-rc cycle so far is the fix for
  module decompression when CONFIG_SYSFS=n. This was reported through
  0-day"
* tag 'modules-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  module: fix building with sysfs disabled
Danie du Toit [Thu, 17 Feb 2022 12:48:20 +0000 (14:48 +0200)]
 
nfp: flower: netdev offload check for ip6gretap
IPv6 GRE tunnels are not being offloaded, this is caused by a missing
netdev offload check. The functionality of IPv6 GRE tunnel offloading
was previously added but this check was not included. Adding the
ip6gretap check allows IPv6 GRE tunnels to be offloaded correctly.
Fixes: f7536ffb0986 ("nfp: flower: Allow ipv6gretap interface for offloading")
Signed-off-by: Danie du Toit <danie.dutoit@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220217124820.40436-1-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 16 Feb 2022 17:32:17 +0000 (09:32 -0800)]
 
ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
Because fib6_info_hw_flags_set() is called without any synchronization,
all accesses to fi->offload, fi->trap and fi->offload_failed
need some basic protection like READ_ONCE()/WRITE_ONCE().
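As an illustration of the kind of annotation meant here, the sketch below
uses user-space approximations of the macros. TOY_READ_ONCE(),
TOY_WRITE_ONCE(), struct toy_route and both helpers are hypothetical; the
real macros live in the kernel headers and do more checking, and __typeof__
is a GCC/Clang extension.

```c
#include <stdint.h>

/*
 * User-space approximations for illustration only: each macro forces
 * exactly one volatile access to the named object.
 */
#define TOY_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define TOY_WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical route entry with byte-sized flags. */
struct toy_route {
	uint8_t offload;
	uint8_t trap;
	uint8_t offload_failed;
};

/* Writer side: each flag is stored with a single marked access. */
static void toy_hw_flags_set(struct toy_route *rt, uint8_t offload,
			     uint8_t trap, uint8_t offload_failed)
{
	TOY_WRITE_ONCE(rt->offload, offload);
	TOY_WRITE_ONCE(rt->trap, trap);
	TOY_WRITE_ONCE(rt->offload_failed, offload_failed);
}

/* Reader side: a lockless reader loads the flag with a marked access. */
static int toy_is_offloaded(const struct toy_route *rt)
{
	return TOY_READ_ONCE(rt->offload) != 0;
}
```

The marked accesses do not add ordering or mutual exclusion; they only keep
the compiler from tearing, fusing or re-reading the individual loads and
stores.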
BUG: KCSAN: data-race in fib6_info_hw_flags_set / fib6_purge_rt
read to 0xffff8881087d5886 of 1 bytes by task 13953 on cpu 0:
 fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1007 [inline]
 fib6_purge_rt+0x4f/0x580 net/ipv6/ip6_fib.c:1033
 fib6_del_route net/ipv6/ip6_fib.c:1983 [inline]
 fib6_del+0x696/0x890 net/ipv6/ip6_fib.c:2028
 __ip6_del_rt net/ipv6/route.c:3876 [inline]
 ip6_del_rt+0x83/0x140 net/ipv6/route.c:3891
 __ipv6_dev_ac_dec+0x2b5/0x370 net/ipv6/anycast.c:374
 ipv6_dev_ac_dec net/ipv6/anycast.c:387 [inline]
 __ipv6_sock_ac_close+0x141/0x200 net/ipv6/anycast.c:207
 ipv6_sock_ac_close+0x79/0x90 net/ipv6/anycast.c:220
 inet6_release+0x32/0x50 net/ipv6/af_inet6.c:476
 __sock_release net/socket.c:650 [inline]
 sock_close+0x6c/0x150 net/socket.c:1318
 __fput+0x295/0x520 fs/file_table.c:280
 ____fput+0x11/0x20 fs/file_table.c:313
 task_work_run+0x8e/0x110 kernel/task_work.c:164
 tracehook_notify_resume include/linux/tracehook.h:189 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
 exit_to_user_mode_prepare+0x160/0x190 kernel/entry/common.c:207
 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
 syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
 do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x44/0xae
write to 0xffff8881087d5886 of 1 bytes by task 1912 on cpu 1:
 fib6_info_hw_flags_set+0x155/0x3b0 net/ipv6/route.c:6230
 nsim_fib6_rt_hw_flags_set drivers/net/netdevsim/fib.c:668 [inline]
 nsim_fib6_rt_add drivers/net/netdevsim/fib.c:691 [inline]
 nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:756 [inline]
 nsim_fib6_event drivers/net/netdevsim/fib.c:853 [inline]
 nsim_fib_event drivers/net/netdevsim/fib.c:886 [inline]
 nsim_fib_event_work+0x284f/0x2cf0 drivers/net/netdevsim/fib.c:1477
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 worker_thread+0x616/0xa70 kernel/workqueue.c:2454
 kthread+0x2c7/0x2e0 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30
value changed: 0x22 -> 0x2a
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 1912 Comm: kworker/1:3 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work
Fixes: 0c5fcf9e249e ("IPv6: Add "offload failed" indication to routes")
Fixes: bb3c4ab93e44 ("ipv6: Add "offload" and "trap" indications to routes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amit Cohen <amcohen@nvidia.com>
Cc: Ido Schimmel <idosch@nvidia.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220216173217.3792411-2-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 16 Feb 2022 17:32:16 +0000 (09:32 -0800)]
 
ipv4: fix data races in fib_alias_hw_flags_set
fib_alias_hw_flags_set() can be used by concurrent threads,
and is only RCU protected.
We need to annotate accesses to following fields of struct fib_alias:
    offload, trap, offload_failed
Because of READ_ONCE()/WRITE_ONCE() limitations, make these
fields u8.
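The limitation can be illustrated with a small user-space sketch. Both
structure layouts and the helper names are hypothetical and only stand in
for struct fib_alias; the point is that C does not allow taking the address
of a bit-field, so a pointer-based single-access helper needs a member with
its own address.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bit-field layout (hypothetical): the flags share one storage unit, so
 * updating one flag is a read-modify-write that also rewrites its
 * neighbours, and &bits.trap does not compile.
 */
struct toy_alias_bits {
	unsigned int offload:1,
		     trap:1,
		     offload_failed:1;
};

/* Byte layout (hypothetical): one byte per flag, each with its own address. */
struct toy_alias_bytes {
	uint8_t offload;
	uint8_t trap;
	uint8_t offload_failed;
};

/* Single volatile store to one byte; neighbouring flags are untouched. */
static void toy_store_flag(uint8_t *flag, uint8_t val)
{
	*(volatile uint8_t *)flag = val;
}

/* Single volatile load of one byte. */
static uint8_t toy_load_flag(const uint8_t *flag)
{
	return *(const volatile uint8_t *)flag;
}
```

The byte layout costs a little space per entry in exchange for flags that
can be read and written independently.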
BUG: KCSAN: data-race in fib_alias_hw_flags_set / fib_alias_hw_flags_set
read to 0xffff888134224a6a of 1 bytes by task 2013 on cpu 1:
 fib_alias_hw_flags_set+0x28a/0x470 net/ipv4/fib_trie.c:1050
 nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
 nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
 nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
 nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
 nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
 nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 process_scheduled_works kernel/workqueue.c:2370 [inline]
 worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
 kthread+0x1bf/0x1e0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30
write to 0xffff888134224a6a of 1 bytes by task 4872 on cpu 0:
 fib_alias_hw_flags_set+0x2d5/0x470 net/ipv4/fib_trie.c:1054
 nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
 nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
 nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
 nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
 nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
 nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 process_scheduled_works kernel/workqueue.c:2370 [inline]
 worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
 kthread+0x1bf/0x1e0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30
value changed: 0x00 -> 0x02
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 4872 Comm: kworker/0:0 Not tainted 5.17.0-rc3-syzkaller-00188-g1d41d2e82623-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work
Fixes: 90b93f1b31f8 ("ipv4: Add "offload" and "trap" indications to routes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220216173217.3792411-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mans Rullgard [Wed, 16 Feb 2022 20:48:18 +0000 (20:48 +0000)]
 
net: dsa: lan9303: add VLAN IDs to master device
If the master device does VLAN filtering, the IDs used by the switch
must be added for any frames to be received.  Do this in the
port_enable() function, and remove them in port_disable().
Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans@mansr.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220216204818.28746-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mans Rullgard [Wed, 16 Feb 2022 12:46:34 +0000 (12:46 +0000)]
 
net: dsa: lan9303: handle hwaccel VLAN tags
Check for a hwaccel VLAN tag on rx and use it if present.  Otherwise,
use __skb_vlan_pop() like the other tag parsers do.  This fixes the case
where the VLAN tag has already been consumed by the master.
Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans@mansr.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220216124634.23123-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 17 Feb 2022 16:57:47 +0000 (08:57 -0800)]
 
mm: don't try to NUMA-migrate COW pages that have other uses
Oded Gabbay reports that enabling NUMA balancing causes corruption with
his Gaudi accelerator test load:
 "All the details are in the bug, but the bottom line is that somehow,
  this patch causes corruption when the numa balancing feature is
  enabled AND we don't use process affinity AND we use GUP to pin pages
  so our accelerator can DMA to/from system memory.
  Either disabling numa balancing, using process affinity to bind to
  specific numa-node or reverting this patch causes the bug to
  disappear"
and Oded bisected the issue to commit 09854ba94c6a ("mm: do_wp_page()
simplification").
Now, the NUMA balancing shouldn't actually be changing the writability
of a page, and as such shouldn't matter for COW.  But it appears it
does.  Suspicious.
However, regardless of that, the condition for enabling NUMA faults in
change_pte_range() is nonsensical.  It uses "page_mapcount(page)" to
decide if a COW page should be NUMA-protected or not, and that makes
absolutely no sense.
The number of mappings a page has is irrelevant: not only does GUP get a
reference to a page as in Oded's case, but the other mappings might be
paged out and the only reference to them would be in the page count.
Since we should never try to NUMA-balance a page that we can't move
anyway due to other references, just fix the code to use 'page_count()'.
Oded confirms that that fixes his issue.
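The two tests can be compared on a toy model. This is a sketch under
assumptions: struct toy_page and both predicates are hypothetical, with
mapcount standing in for page_mapcount() and refcount for page_count(), and
the refcount is assumed to include the mapping itself plus any GUP pins.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a page: two counters only. */
struct toy_page {
	int mapcount;	/* number of page-table mappings */
	int refcount;	/* all references, including pins taken via GUP */
};

/* Old test: skip NUMA protection only if the page is mapped more than once. */
static bool toy_skip_numa_old(const struct toy_page *page, bool cow_mapping)
{
	return cow_mapping && page->mapcount != 1;
}

/* New test: skip whenever anything else holds a reference to the page. */
static bool toy_skip_numa_new(const struct toy_page *page, bool cow_mapping)
{
	return cow_mapping && page->refcount != 1;
}
```

For a page that is mapped once and pinned once, the old test still lets NUMA
balancing touch it, while the new test leaves it alone.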
Now, this does imply that something in NUMA balancing ends up changing
page protections (other than the obvious one of making the page
inaccessible to get the NUMA faulting information).  Otherwise the COW
simplification wouldn't matter - since doing the GUP on the page would
make sure it's writable.
The cause of that permission change would be good to figure out too,
since it clearly results in spurious COW events - but fixing the
nonsensical test that just happened to work before is obviously the
CorrectThing(tm) to do regardless.
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215616
Link: https://lore.kernel.org/all/CAFCwf10eNmwq2wD71xjUhqkvv5+_pJMR1nPug2RqNDcFT4H86Q@mail.gmail.com/
Reported-and-tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Seth Forshee [Thu, 17 Feb 2022 14:13:12 +0000 (08:13 -0600)]
 
vsock: remove vsock from connected table when connect is interrupted by a signal
vsock_connect() expects that the socket could already be in the
TCP_ESTABLISHED state when the connecting task wakes up with a signal
pending. If this happens the socket will be in the connected table, and
it is not removed when the socket state is reset. In this situation it's
common for the process to retry connect(), and if the connection is
successful the socket will be added to the connected table a second
time, corrupting the list.
Prevent this by calling vsock_remove_connected() if a signal is received
while waiting for a connection. This is harmless if the socket is not in
the connected table, and if it is in the table then removing it will
prevent list corruption from a double add.
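The property relied on here, that removal is safe whether or not the socket
is in the table, can be sketched with a minimal list. This is an
illustration only: struct toy_node and the toy_* helpers are hypothetical
and merely resemble the kernel's list_head handling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the spirit of list_head. */
struct toy_node {
	struct toy_node *prev, *next;
};

static void toy_list_init(struct toy_node *n)
{
	n->prev = n->next = n;
}

/* An entry that points at itself is not linked into any list. */
static bool toy_on_list(const struct toy_node *n)
{
	return n->next != n;
}

static void toy_list_add(struct toy_node *head, struct toy_node *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Safe to call when the entry is not linked: it then does nothing. */
static void toy_remove_connected(struct toy_node *n)
{
	if (!toy_on_list(n))
		return;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	toy_list_init(n);
}

static size_t toy_list_len(const struct toy_node *head)
{
	size_t len = 0;

	for (const struct toy_node *p = head->next; p != head; p = p->next)
		len++;
	return len;
}
```

Removing the entry on the signal path means a retried connect() adds it to
a table that no longer contains it.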
Note for backporting: this patch requires d5afa82c977e ("vsock: correct
removal of socket from the list"), which is in all current stable trees
except 4.9.y.
Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220217141312.2297547-1-sforshee@digitalocean.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonas Gorski [Wed, 16 Feb 2022 18:46:34 +0000 (10:46 -0800)]
 
Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
This reverts commit 3710e80952cf2dc48257ac9f145b117b5f74e0a5.
Since idm_base and nicpm_base are still optional resources not present
on all platforms, this breaks the driver for everything except Northstar
2 (which has both).
The same change was already reverted once with 755f5738ff98 ("net:
broadcom: fix a mistake about ioremap resource").
So let's do it again.
Fixes: 3710e80952cf ("net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
[florian: Added comments to explain the resources are optional]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220216184634.2032460-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric W. Biederman [Thu, 10 Feb 2022 00:09:41 +0000 (18:09 -0600)]
 
ucounts: Handle wrapping in is_ucounts_overlimit
While examining is_ucounts_overlimit and reading the various messages
I realized that is_ucounts_overlimit fails to deal with counts that
may have wrapped.
Being wrapped should be a transitory state for counts and they should
never be wrapped for long, but it can happen so handle it.
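One way to handle a wrapped count can be sketched as follows. This is a
minimal model under assumptions, not the kernel function: toy_is_overlimit()
is hypothetical, only a single counter is modelled, and a negative signed
value is taken to mean that the count has wrapped.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * The counter is a signed long; a negative value is treated as wrapped and
 * therefore over the limit. The limit is unsigned, so it is clamped to
 * LONG_MAX before the signed comparison.
 */
static bool toy_is_overlimit(long count, unsigned long rlimit)
{
	long max = rlimit > (unsigned long)LONG_MAX ? LONG_MAX : (long)rlimit;

	return count < 0 || count > max;
}
```

Without the negative check, a wrapped count would compare as far below any
limit and the limit would silently stop being enforced.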
Cc: stable@vger.kernel.org
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Link: https://lkml.kernel.org/r/20220216155832.680775-5-ebiederm@xmission.com
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Eric W. Biederman [Mon, 14 Feb 2022 15:40:25 +0000 (09:40 -0600)]
 
ucounts: Move RLIMIT_NPROC handling after set_user
During set*id(), which cred->ucounts to charge the current process
to is not known until after set_cred_ucounts.  So move the
RLIMIT_NPROC checking into a new helper flag_nproc_exceeded and call
flag_nproc_exceeded after set_cred_ucounts.
This is very much an arbitrary subset of the places where we currently
change the RLIMIT_NPROC accounting, designed to preserve the existing
logic.
Fixing the existing logic will be the subject of another series of
changes.
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20220216155832.680775-4-ebiederm@xmission.com
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Eric W. Biederman [Wed, 9 Feb 2022 22:22:20 +0000 (16:22 -0600)]
 
ucounts: Base set_cred_ucounts changes on the real user
Michal Koutný <mkoutny@suse.com> wrote:
> Tasks are associated to multiple users at once. Historically and as per
> setrlimit(2) RLIMIT_NPROC is enforced based on real user ID.
>
> The commit 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
> made the accounting structure "indexed" by euid and hence potentially
> account tasks differently.
>
> The effective user ID may be different e.g. for setuid programs but
> those are exec'd into already existing task (i.e. below limit), so
> different accounting is moot.
>
> Some special setresuid(2) users may notice the difference, justifying
> this fix.
I looked at cred->ucount and it is only used for rlimit operations
that were previously stored in cred->user.  That makes the fact that
cred->ucount can refer to a different user from cred->user a bug
affecting all uses of cred->ulimit, not just RLIMIT_NPROC.
Fix set_cred_ucounts to always use the real uid not the effective uid.
Further simplify set_cred_ucounts by noticing that set_cred_ucounts
somehow retained a draft version of the check to see if alloc_ucounts
was needed that checks the new->user and new->user_ns against the
current_real_cred().  Remove that draft version of the check.
All that matters for setting the cred->ucounts are the user_ns and uid
fields in the cred.
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20220207121800.5079-4-mkoutny@suse.com
Link: https://lkml.kernel.org/r/20220216155832.680775-3-ebiederm@xmission.com
Reported-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Eric W. Biederman [Thu, 10 Feb 2022 02:03:19 +0000 (20:03 -0600)]
 
ucounts: Enforce RLIMIT_NPROC not RLIMIT_NPROC+1
Michal Koutný <mkoutny@suse.com> wrote:
> It was reported that v5.14 behaves differently when enforcing
> RLIMIT_NPROC limit, namely, it allows one more task than previously.
> This is a consequence of the commit 21d1c5e386bc ("Reimplement
> RLIMIT_NPROC on top of ucounts") that missed the sharpness of
> equality in the forking path.
This can be fixed either by fixing the test or by moving the increment
to be before the test.  Fix it by moving copy_creds which contains
the increment before is_ucounts_overlimit.
In the case of CLONE_NEWUSER the ucounts in the task_cred changes.
The function is_ucounts_overlimit needs to use the final version of
the ucounts for the new process.  Which means moving the
is_ucounts_overlimit test after copy_creds is necessary.
Both the test in fork and the test in set_user were semantically
changed when the code moved to ucounts.  The change of the test in
fork was bad because it was before the increment.  The test in
set_user was wrong and the change to ucounts fixed it.  So this
fix only restores the old behavior in one location not two.
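The off-by-one can be reproduced with a toy counter. This is a sketch under
assumptions: the function names are hypothetical, a single long stands in
for the ucounts process count, and the limit test is assumed to be "count
greater than limit" in both orderings.

```c
#include <stdbool.h>

/* Old order: test first, then increment. Admits one task too many. */
static bool toy_fork_test_then_inc(long *count, long limit)
{
	if (*count > limit)
		return false;
	(*count)++;
	return true;
}

/* New order: increment first, test the final value, undo on failure. */
static bool toy_fork_inc_then_test(long *count, long limit)
{
	(*count)++;
	if (*count > limit) {
		(*count)--;
		return false;
	}
	return true;
}

/* Count how many forks succeed when starting from zero tasks. */
static long toy_count_forks(bool (*try_fork)(long *, long), long limit)
{
	long count = 0, ok = 0;

	for (int i = 0; i < 100; i++)
		if (try_fork(&count, limit))
			ok++;
	return ok;
}
```

With a limit of 3, the old ordering admits four tasks and the new ordering
admits three.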
Link: https://lkml.kernel.org/r/20220204181144.24462-1-mkoutny@suse.com
Link: https://lkml.kernel.org/r/20220216155832.680775-2-ebiederm@xmission.com
Cc: stable@vger.kernel.org
Reported-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Eric W. Biederman [Fri, 11 Feb 2022 19:57:44 +0000 (13:57 -0600)]
 
rlimit: Fix RLIMIT_NPROC enforcement failure caused by capability calls in set_user
Solar Designer <solar@openwall.com> wrote:
> I'm not aware of anyone actually running into this issue and reporting
> it.  The systems that I personally know use suexec along with rlimits
> still run older/distro kernels, so would not yet be affected.
>
> So my mention was based on my understanding of how suexec works, and
> code review.  Specifically, Apache httpd has the setting RLimitNPROC,
> which makes it set RLIMIT_NPROC:
>
> https://httpd.apache.org/docs/2.4/mod/core.html#rlimitnproc
>
> The above documentation for it includes:
>
> "This applies to processes forked from Apache httpd children servicing
> requests, not the Apache httpd children themselves. This includes CGI
> scripts and SSI exec commands, but not any processes forked from the
> Apache httpd parent, such as piped logs."
>
> In code, there are:
>
> ./modules/generators/mod_cgid.c:        ( (cgid_req.limits.limit_nproc_set) && ((rc = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC,
> ./modules/generators/mod_cgi.c:        ((rc = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC,
> ./modules/filters/mod_ext_filter.c:    rv = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC, conf->limit_nproc);
>
> For example, in mod_cgi.c this is in run_cgi_child().
>
> I think this means an httpd child sets RLIMIT_NPROC shortly before it
> execs suexec, which is a SUID root program.  suexec then switches to the
> target user and execs the CGI script.
>
> Before 2863643fb8b9, the setuid() in suexec would set the flag, and the
> target user's process count would be checked against RLIMIT_NPROC on
> execve().  After 2863643fb8b9, the setuid() in suexec wouldn't set the
> flag because setuid() is (naturally) called when the process is still
> running as root (thus, has those limits bypass capabilities), and
> accordingly execve() would not check the target user's process count
> against RLIMIT_NPROC.
In commit 2863643fb8b9 ("set_user: add capability check when
rlimit(RLIMIT_NPROC) exceeds") capable calls were added to set_user to
make it more consistent with fork.  Unfortunately because of call site
differences those capable calls were checking the credentials of the
user before set*id() instead of after set*id().
This breaks enforcement of RLIMIT_NPROC for applications that set the
rlimit and then call set*id() while holding a full set of
capabilities.  The capabilities are only changed in the new credential
in security_task_fix_setuid().
The code in apache suexec appears to follow this pattern.
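A toy model of the pattern may make the problem easier to see.  It is a
sketch under stated assumptions, not the kernel logic: the function name
flags_nproc_exceeded and its parameters are hypothetical, and the two
capability names are an assumption about which capabilities bypass the
limit.

```python
# Assumption: these are the capabilities that bypass the process limit.
LIMIT_BYPASS_CAPS = {"CAP_SYS_RESOURCE", "CAP_SYS_ADMIN"}


def flags_nproc_exceeded(over_limit, caller_caps, check_caller_caps):
    """Return True if set*id() should flag the task as over RLIMIT_NPROC.

    over_limit:        the target user already runs too many processes
    caller_caps:       capabilities held *before* set*id() (e.g. by root)
    check_caller_caps: True models the behavior after 2863643fb8b9
    """
    if not over_limit:
        return False
    if check_caller_caps and caller_caps & LIMIT_BYPASS_CAPS:
        # The caller is still privileged here, so the limit is never flagged.
        return False
    return True
```

In this model a root-owned wrapper that sets the rlimit and then calls
set*id() is never flagged once the caller's capabilities are consulted,
which is the enforcement failure described above.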
Commit 909cc4ae86f3 ("[PATCH] Fix two bugs with process limits
(RLIMIT_NPROC)"), where this check was added, describes the targets of this
capability check as:
  2/ When a root-owned process (e.g. cgiwrap) sets up process limits and then
      calls setuid, the setuid should fail if the user would then be running
      more than rlim_cur[RLIMIT_NPROC] processes, but it doesn't.  This patch
      adds an appropriate test.  With this patch, and per-user process limit
      imposed in cgiwrap really works.
So the original use case of this check also appears to match the broken
pattern.
Restore the enforcement of RLIMIT_NPROC by removing the bad capable
checks added in set_user.  This unfortunately restores the
inconsistent state the code has been in for the last 11 years, but
dealing with the inconsistencies looks like a larger problem.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20210907213042.GA22626@openwall.com/
Link: https://lkml.kernel.org/r/20220212221412.GA29214@openwall.com
Link: https://lkml.kernel.org/r/20220216155832.680775-1-ebiederm@xmission.com
Fixes: 2863643fb8b9 ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds")
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Reviewed-by: Solar Designer <solar@openwall.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Xin Long [Wed, 16 Feb 2022 05:20:52 +0000 (00:20 -0500)]
 
ping: fix the dif and sdif check in ping_lookup
When 'ping' is switched to use a PING socket instead of a RAW socket by:
   # sysctl -w net.ipv4.ping_group_range="0 100"
there is another regression in matching sk_bound_dev_if against dif:
the RAW socket lookup uses inet_iif() while the PING socket lookup
uses skb->dev->ifindex, so the commands below fail:
  # ip link add dummy0 type dummy
  # ip link set dummy0 up
  # ip addr add 192.168.111.1/24 dev dummy0
  # ping -I dummy0 192.168.111.1 -c1
The issue was also reported on:
  https://github.com/iputils/iputils/issues/104
But it was fixed in iputils in the wrong way, by not binding to the device
when the destination IP is on the device, which causes some kselftests
to fail, as Jianlin noticed.
This patch uses inet(6)_iif and inet(6)_sdif to get dif and sdif for
the PING socket, keeping it consistent with the RAW socket.
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Laibin Qiu [Sat, 22 Jan 2022 11:10:45 +0000 (19:10 +0800)]
 
block/wbt: fix negative inflight counter when remove scsi device
We now disable wbt by setting WBT_STATE_OFF_DEFAULT in
wbt_disable_default() when switching the elevator to bfq.  When a
scsi device is removed, wbt is enabled again by wbt_enable_default().
If the enable state changes between wbt_wait() and wbt_track() while
a write request is being submitted, the request is tracked without
having been counted as inflight.
The following is the scenario that triggered the problem.
T1                          T2                           T3
                            elevator_switch_mq
                            bfq_init_queue
                            wbt_disable_default <= Set
                            rwb->enable_state (OFF)
submit_bio
blk_mq_make_request
rq_qos_throttle
<= rwb->enable_state (OFF)
                                                         scsi_remove_device
                                                         sd_remove
                                                         del_gendisk
                                                         blk_unregister_queue
                                                         elv_unregister_queue
                                                         wbt_enable_default
                                                         <= Set rwb->enable_state (ON)
rq_qos_track
<= rwb->enable_state (ON)
^^^^^^ this request will be marked WBT_TRACKED without the inflight count
being incremented, which drops rqw->inflight to -1 in wbt_done() and
triggers an IO hang.
Fix this by moving wbt_enable_default() from elv_unregister_queue() to
bfq_exit_queue(), so that wbt is only re-enabled when bfq exits.
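The interleaving in the scenario can be replayed in a toy model.  This
is a sketch only: the class Wbt, its methods and the run() helper are
hypothetical stand-ins for the wait/track/done steps named in the
scenario, not the kernel implementation.

```python
class Wbt:
    """Toy model of the enable state and the inflight counter."""

    def __init__(self):
        self.enabled = True
        self.inflight = 0

    def wait(self):
        # Throttle step: only counts the request while wbt is enabled.
        if self.enabled:
            self.inflight += 1

    def track(self, rq):
        # Track step: only marks the request while wbt is enabled.
        if self.enabled:
            rq["tracked"] = True

    def done(self, rq):
        # Completion: every tracked request decrements the counter.
        if rq.get("tracked"):
            self.inflight -= 1


def run(reenable_mid_submission):
    wbt = Wbt()
    wbt.enabled = False          # T2: elevator switched to bfq, wbt disabled
    rq = {}
    wbt.wait()                   # T1: throttle step sees OFF, nothing counted
    if reenable_mid_submission:
        wbt.enabled = True       # T3: device removal re-enables wbt
    wbt.track(rq)                # T1: track step may now see ON
    wbt.done(rq)
    return wbt.inflight
```

In this model run(True) ends with the counter at -1, while run(False),
where the state does not flip during submission, ends at 0.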
Fixes: 76a8040817b4b ("blk-wbt: make sure throttle is enabled properly")
Remove a one-line stale comment, and kill a one-shot local variable.
Signed-off-by: Ming Lei <ming.lei@rehdat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/linux-block/20211214133103.551813-1-qiulaibin@huawei.com/
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 17 Feb 2022 07:52:31 +0000 (08:52 +0100)]
 
block: fix surprise removal for drivers calling blk_set_queue_dying
Various block drivers call blk_set_queue_dying to mark a disk as dead due
to surprise removal events, but since commit 8e141f9eb803 that doesn't
work given that the GD_DEAD flag needs to be set to stop I/O.
Replace the driver calls to blk_set_queue_dying with a new (and properly
documented) blk_mark_disk_dead API, and fold blk_set_queue_dying into the
only remaining caller.
Fixes: 8e141f9eb803 ("block: drain file system I/O on del_gendisk")
Reported-by: Markus Blöchl <markus.bloechl@ipetronik.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20220217075231.1140-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Haimin Zhang [Wed, 16 Feb 2022 08:40:38 +0000 (16:40 +0800)]
 
block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern
Add __GFP_ZERO flag for alloc_page in function bio_copy_kern to initialize
the buffer of a bio.
Signed-off-by: Haimin Zhang <tcs.kernel@gmail.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220216084038.15635-1-tcs.kernel@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Daniele Palmas [Tue, 15 Feb 2022 11:13:35 +0000 (12:13 +0100)]
 
net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
Add quirk CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE for Telit FN990
0x1071 composition in order to avoid bind error.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo Carvalho de Melo [Wed, 16 Feb 2022 19:01:00 +0000 (16:01 -0300)]
 
perf bpf: Defer freeing string after possible strlen() on it
This was detected by gcc in Fedora Rawhide:
  50    11.01 fedora:rawhide                : FAIL gcc version 12.0.1 20220205 (Red Hat 12.0.1-0) (GCC)
        inlined from 'bpf__config_obj' at util/bpf-loader.c:1242:9:
    util/bpf-loader.c:1225:34: error: pointer 'map_opt' may be used after 'free' [-Werror=use-after-free]
     1225 |                 *key_scan_pos += strlen(map_opt);
          |                                  ^~~~~~~~~~~~~~~
    util/bpf-loader.c:1223:9: note: call to 'free' here
     1223 |         free(map_name);
          |         ^~~~~~~~~~~~~~
    cc1: all warnings being treated as errors
So do the calculations on the pointer before freeing it.
Fixes: 04f9bf2bac72480c ("perf bpf-loader: Add missing '*' for key_scan_pos")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
Link: https://lore.kernel.org/lkml/Yg1VtQxKrPpS3uNA@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Dave Airlie [Thu, 17 Feb 2022 09:06:07 +0000 (19:06 +1000)]
 
Merge tag 'amd-drm-fixes-5.17-2022-02-16' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.17-2022-02-16:
amdgpu:
- Stable pstate clock fixes for Dimgrey Cavefish and Beige Goby
- S0ix SDMA fix
- Yellow Carp GPU reset fix
radeon:
- Backlight fix for iMac 12,1
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220217035242.8084-1-alexander.deucher@amd.com
Takashi Iwai [Tue, 15 Feb 2022 13:27:56 +0000 (14:27 +0100)]
 
ASoC: intel: skylake: Set max DMA segment size
The recent code refactoring to use the standard DMA helper requires
the max DMA segment size setup for SG list management.  Without it,
the kernel may spew warnings when a large buffer is allocated.
This patch sets up dma_set_max_seg_size() for avoiding spurious
warnings.
Fixes: 2c95b92ecd92 ("ALSA: memalloc: Unify x86 SG-buffer handling (take#3)")
Acked-by: Cezary Rojewski <cezary.rojewski@intel.com>
Acked-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org>
BugLink: https://github.com/thesofproject/linux/issues/3430
Link: https://lore.kernel.org/r/20220215132756.31236-4-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Takashi Iwai [Tue, 15 Feb 2022 13:27:55 +0000 (14:27 +0100)]
 
ASoC: SOF: hda: Set max DMA segment size
The recent code refactoring to use the standard DMA helper requires
the max DMA segment size setup for SG list management.  Without it,
the kernel may spew warnings when a large buffer is allocated.
This patch sets up dma_set_max_seg_size() for avoiding spurious
warnings.
Fixes: 2c95b92ecd92 ("ALSA: memalloc: Unify x86 SG-buffer handling (take#3)")
Acked-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org>
BugLink: https://github.com/thesofproject/linux/issues/3430
Link: https://lore.kernel.org/r/20220215132756.31236-3-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Takashi Iwai [Tue, 15 Feb 2022 13:27:54 +0000 (14:27 +0100)]
 
ALSA: hda: Set max DMA segment size
The recent code refactoring to use the standard DMA helper requires
the max DMA segment size setup for SG list management.  Without it,
the kernel may spew warnings when a large buffer is allocated.
This patch sets up dma_set_max_seg_size() for avoiding spurious
warnings.
Fixes: 2c95b92ecd92 ("ALSA: memalloc: Unify x86 SG-buffer handling (take#3)")
Cc: <stable@vger.kernel.org>
BugLink: https://github.com/thesofproject/linux/issues/3430
Link: https://lore.kernel.org/r/20220215132756.31236-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Dave Airlie [Thu, 17 Feb 2022 05:00:47 +0000 (15:00 +1000)]
 
Merge tag 'mediatek-drm-fixes-5.17' of https://git./linux/kernel/git/chunkuang.hu/linux into drm-fixes
Mediatek DRM Fixes for Linux 5.17
1. Avoid EPROBE_DEFER loop with external bridge
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/1645027727-19554-1-git-send-email-chunkuang.hu@kernel.org
Eric Dumazet [Tue, 15 Feb 2022 23:53:05 +0000 (15:53 -0800)]
 
net: sched: limit TC_ACT_REPEAT loops
We have been living dangerously, at the mercy of malicious users,
abusing TC_ACT_REPEAT, as shown by this syzbot report [1].
Add an arbitrary limit (32) to the number of times an action can
return TC_ACT_REPEAT.
v2: switch the limit to 32 instead of 10.
    Use net_warn_ratelimited() instead of pr_err_once().
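A bounded retry loop of this kind can be sketched as follows.  This is
an illustrative user-space model, not the kernel code: the verdict
strings, the exec_action() helper and the fallback verdict returned once
the limit is hit are all assumptions; only the limit of 32 comes from
the text above.

```python
REPEAT = "repeat"   # stand-in for TC_ACT_REPEAT
OK = "ok"           # stand-in for a final verdict
MAX_REPEAT = 32     # limit named in the commit message


def exec_action(action, packet, max_repeat=MAX_REPEAT, fallback=OK):
    """Run action until it stops asking for a repeat.

    At most max_repeat repeat requests are honored; after that the
    loop gives up and returns fallback (an assumption of this sketch).
    """
    repeats_left = max_repeat
    while True:
        verdict = action(packet)
        if verdict != REPEAT:
            return verdict
        if repeats_left == 0:
            # A real implementation would also emit a rate-limited warning.
            return fallback
        repeats_left -= 1
```

An action that always asks for a repeat now terminates instead of
spinning forever, which is the situation the stall report below shows.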
[1] (C repro available on demand)
rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:    1-...!: (10500 ticks this GP) idle=021/1/0x4000000000000000 softirq=5592/5592 fqs=0
        (t=10502 jiffies g=5305 q=190)
rcu: rcu_preempt kthread timer wakeup didn't happen for 10502 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
rcu:    Possible timer handling issue on cpu=0 timer-softirq=3527
rcu: rcu_preempt kthread starved for 10505 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
rcu:    Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:I stack:29344 pid:   14 ppid:     2 flags:0x00004000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:4986 [inline]
 __schedule+0xab2/0x4db0 kernel/sched/core.c:6295
 schedule+0xd2/0x260 kernel/sched/core.c:6368
 schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881
 rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1963
 rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2136
 kthread+0x2e9/0x3a0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 3646 Comm: syz-executor358 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd315f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:rep_nop arch/x86/include/asm/vdso/processor.h:13 [inline]
RIP: 0010:cpu_relax arch/x86/include/asm/vdso/processor.h:18 [inline]
RIP: 0010:pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:437 [inline]
RIP: 0010:__pv_queued_spin_lock_slowpath+0x3b8/0xb40 kernel/locking/qspinlock.c:508
Code: 48 89 eb c6 45 01 01 41 bc 00 80 00 00 48 c1 e9 03 83 e3 07 41 be 01 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 2c 01 eb 0c <f3> 90 41 83 ec 01 0f 84 72 04 00 00 41 0f b6 45 00 38 d8 7f 08 84
RSP: 0018:ffffc9000283f1b0 EFLAGS: 00000206
RAX: 0000000000000003 RBX: 0000000000000000 RCX: 1ffff1100fc0071e
RDX: 0000000000000001 RSI: 0000000000000201 RDI: 0000000000000000
RBP: ffff88807e0038f0 R08: 0000000000000001 R09: ffffffff8ffbf9ff
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000004c1e
R13: ffffed100fc0071e R14: 0000000000000001 R15: ffff8880b9c3aa80
FS:  00005555562bf300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffdbfef12b8 CR3: 00000000723c2000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:591 [inline]
 queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline]
 queued_spin_lock include/asm-generic/qspinlock.h:85 [inline]
 do_raw_spin_lock+0x200/0x2b0 kernel/locking/spinlock_debug.c:115
 spin_lock_bh include/linux/spinlock.h:354 [inline]
 sch_tree_lock include/net/sch_generic.h:610 [inline]
 sch_tree_lock include/net/sch_generic.h:605 [inline]
 prio_tune+0x3b9/0xb50 net/sched/sch_prio.c:211
 prio_init+0x5c/0x80 net/sched/sch_prio.c:244
 qdisc_create.constprop.0+0x44a/0x10f0 net/sched/sch_api.c:1253
 tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660
 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5594
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:725
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f7ee98aae99
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffdbfef12d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007ffdbfef1300 RCX: 00007f7ee98aae99
RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000003
RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d
R10: 000000000000000d R11: 0000000000000246 R12: 00007ffdbfef12f0
R13: 00000000000f4240 R14: 000000000004ca47 R15: 00007ffdbfef12e4
 </TASK>
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.293 msecs
NMI backtrace for cpu 1
CPU: 1 PID: 3260 Comm: kworker/1:3 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd315f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: mld mld_ifc_work
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111
 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
 rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343
 print_cpu_stall kernel/rcu/tree_stall.h:604 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:688 [inline]
 rcu_pending kernel/rcu/tree.c:3919 [inline]
 rcu_sched_clock_irq.cold+0x5c/0x759 kernel/rcu/tree.c:2617
 update_process_times+0x16d/0x200 kernel/time/timer.c:1785
 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
 tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428
 __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
 __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749
 hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
 __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103
 sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:__sanitizer_cov_trace_const_cmp4+0xc/0x70 kernel/kcov.c:286
Code: 00 00 00 48 89 7c 30 e8 48 89 4c 30 f0 4c 89 54 d8 20 48 89 10 5b c3 0f 1f 80 00 00 00 00 41 89 f8 bf 03 00 00 00 4c 8b 14 24 <89> f1 65 48 8b 34 25 00 70 02 00 e8 14 f9 ff ff 84 c0 74 4b 48 8b
RSP: 0018:ffffc90002c5eea8 EFLAGS: 00000246
RAX: 0000000000000007 RBX: ffff88801c625800 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: ffff8880137d3100 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff874fcd88 R11: 0000000000000000 R12: ffff88801d692dc0
R13: ffff8880137d3104 R14: 0000000000000000 R15: ffff88801d692de8
 tcf_police_act+0x358/0x11d0 net/sched/act_police.c:256
 tcf_action_exec net/sched/act_api.c:1049 [inline]
 tcf_action_exec+0x1a6/0x530 net/sched/act_api.c:1026
 tcf_exts_exec include/net/pkt_cls.h:326 [inline]
 route4_classify+0xef0/0x1400 net/sched/cls_route.c:179
 __tcf_classify net/sched/cls_api.c:1549 [inline]
 tcf_classify+0x3e8/0x9d0 net/sched/cls_api.c:1615
 prio_classify net/sched/sch_prio.c:42 [inline]
 prio_enqueue+0x3a7/0x790 net/sched/sch_prio.c:75
 dev_qdisc_enqueue+0x40/0x300 net/core/dev.c:3668
 __dev_xmit_skb net/core/dev.c:3756 [inline]
 __dev_queue_xmit+0x1f61/0x3660 net/core/dev.c:4081
 neigh_hh_output include/net/neighbour.h:533 [inline]
 neigh_output include/net/neighbour.h:547 [inline]
 ip_finish_output2+0x14dc/0x2170 net/ipv4/ip_output.c:228
 __ip_finish_output net/ipv4/ip_output.c:306 [inline]
 __ip_finish_output+0x396/0x650 net/ipv4/ip_output.c:288
 ip_finish_output+0x32/0x200 net/ipv4/ip_output.c:316
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip_output+0x196/0x310 net/ipv4/ip_output.c:430
 dst_output include/net/dst.h:451 [inline]
 ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:126
 iptunnel_xmit+0x628/0xa50 net/ipv4/ip_tunnel_core.c:82
 geneve_xmit_skb drivers/net/geneve.c:966 [inline]
 geneve_xmit+0x10c8/0x3530 drivers/net/geneve.c:1077
 __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
 netdev_start_xmit include/linux/netdevice.h:4697 [inline]
 xmit_one net/core/dev.c:3473 [inline]
 dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3489
 __dev_queue_xmit+0x2985/0x3660 net/core/dev.c:4116
 neigh_hh_output include/net/neighbour.h:533 [inline]
 neigh_output include/net/neighbour.h:547 [inline]
 ip6_finish_output2+0xf7a/0x14f0 net/ipv6/ip6_output.c:126
 __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
 __ip6_finish_output+0x61e/0xe90 net/ipv6/ip6_output.c:170
 ip6_finish_output+0x32/0x200 net/ipv6/ip6_output.c:201
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:224
 dst_output include/net/dst.h:451 [inline]
 NF_HOOK include/linux/netfilter.h:307 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 mld_sendpack+0x9a3/0xe40 net/ipv6/mcast.c:1826
 mld_send_cr net/ipv6/mcast.c:2127 [inline]
 mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2659
 process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
 worker_thread+0x657/0x1110 kernel/workqueue.c:2454
 kthread+0x2e9/0x3a0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
----------------
Code disassembly (best guess):
   0:   48 89 eb                mov    %rbp,%rbx
   3:   c6 45 01 01             movb   $0x1,0x1(%rbp)
   7:   41 bc 00 80 00 00       mov    $0x8000,%r12d
   d:   48 c1 e9 03             shr    $0x3,%rcx
  11:   83 e3 07                and    $0x7,%ebx
  14:   41 be 01 00 00 00       mov    $0x1,%r14d
  1a:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
  21:   fc ff df
  24:   4c 8d 2c 01             lea    (%rcx,%rax,1),%r13
  28:   eb 0c                   jmp    0x36
* 2a:   f3 90                   pause <-- trapping instruction
  2c:   41 83 ec 01             sub    $0x1,%r12d
  30:   0f 84 72 04 00 00       je     0x4a8
  36:   41 0f b6 45 00          movzbl 0x0(%r13),%eax
  3b:   38 d8                   cmp    %bl,%al
  3d:   7f 08                   jg     0x47
  3f:   84                      .byte 0x84
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220215235305.3272331-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jon Maloy [Wed, 16 Feb 2022 02:00:09 +0000 (21:00 -0500)]
 
tipc: fix wrong notification node addresses
The previous bug fix had an unfortunate side effect that broke
distribution of binding table entries between nodes. The updated
tipc_sock_addr struct is also used further down in the same
function, and there the old value is still the correct one.
Fixes: 032062f363b4 ("tipc: fix wrong publisher node address in link publications")
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Link: https://lore.kernel.org/r/20220216020009.3404578-1-jmaloy@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexey Khoroshilov [Tue, 15 Feb 2022 10:42:48 +0000 (13:42 +0300)]
 
net: dsa: lantiq_gswip: fix use after free in gswip_remove()
of_node_put(priv->ds->slave_mii_bus->dev.of_node) should be
done before mdiobus_free(priv->ds->slave_mii_bus).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Fixes: 0d120dfb5d67 ("net: dsa: lantiq_gswip: don't use devres for mdiobus")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/1644921768-26477-1-git-send-email-khoroshilov@ispras.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willem de Bruijn [Tue, 15 Feb 2022 16:00:37 +0000 (11:00 -0500)]
 
ipv6: per-netns exclusive flowlabel checks
IPv6 flowlabels historically require a reservation before use,
optionally in exclusive mode (e.g., user-private).
Commit 59c820b2317f ("ipv6: elide flowlabel check if no exclusive
leases exist") introduced a fastpath that avoids this check when no
exclusive leases exist in the system, and thus any flowlabel use
will be granted.
That allows skipping the control operation to reserve a flowlabel
entirely, though with a warning in case the fast path fails:
  This is an optimization. Robust applications still have to revert to
  requesting leases if the fast path fails due to an exclusive lease.
Still, this is subtle. It is better to isolate network namespaces from
each other. Flowlabels are per-netns, so also record per-netns whether
exclusive leases are in use. Then behavior does not change based on
activity in other netns.
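The difference between a system-wide and a per-netns condition can be
shown with a small model.  It is a sketch only: the Netns class, its
exclusive_leases counter and the two fastpath_allowed_* helpers are
hypothetical names, not kernel interfaces.

```python
class Netns:
    """Toy network namespace that counts its own exclusive leases."""

    def __init__(self, name):
        self.name = name
        self.exclusive_leases = 0


def fastpath_allowed_global(ns, all_netns):
    # System-wide condition: any exclusive lease anywhere disables the
    # fast path for every namespace, including ns.
    return not any(n.exclusive_leases for n in all_netns)


def fastpath_allowed_per_netns(ns):
    # Per-netns condition: only leases in ns itself matter.
    return ns.exclusive_leases == 0
```

If namespace "a" holds an exclusive lease and "b" holds none, the
system-wide condition denies the fast path to "b" as well, while the
per-netns condition leaves "b" unaffected.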
Changes
  v2
    - wrap in IS_ENABLED(CONFIG_IPV6) to avoid breakage if disabled
Fixes: 59c820b2317f ("ipv6: elide flowlabel check if no exclusive leases exist")
Link: https://lore.kernel.org/netdev/MWHPR2201MB1072BCCCFCE779E4094837ACD0329@MWHPR2201MB1072.namprd22.prod.outlook.com/
Reported-by: Congyu Liu <liu3101@purdue.edu>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Tested-by: Congyu Liu <liu3101@purdue.edu>
Link: https://lore.kernel.org/r/20220215160037.1976072-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oleksandr Mazur [Tue, 15 Feb 2022 16:53:03 +0000 (18:53 +0200)]
 
net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled
Whenever the bridge driver hits the max capacity of MDBs, it disables
MC processing (by setting the corresponding bridge option), but never
notifies switchdev about the change (the notifiers are called only upon
explicit setting of this option, through the registered netlink interface).
This could lead to a situation where software MDB processing gets disabled,
but this event never gets offloaded to the underlying hardware.
Fix this by adding a notify message in such a case.
Fixes: 147c1e9b902c ("switchdev: bridge: Offload multicast disabled")
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20220215165303.31908-1-oleksandr.mazur@plvision.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Steve French [Wed, 16 Feb 2022 19:23:53 +0000 (13:23 -0600)]
 
cifs: fix confusing unneeded warning message on smb2.1 and earlier
When mounting with SMB2.1 or earlier, even with nomultichannel, we
log the confusing warning message:
  "CIFS: VFS: multichannel is not supported on this protocol version, use 3.0 or above"
Fix this so that we don't log this unless they really are trying
to mount with multichannel.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215608
Reported-by: Kim Scarborough <kim@scarborough.kim>
Cc: stable@vger.kernel.org # 5.11+
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Dmitry Torokhov [Tue, 15 Feb 2022 21:11:42 +0000 (13:11 -0800)]
 
module: fix building with sysfs disabled
Sysfs support might be disabled so we need to guard the code that
instantiates "compression" attribute with an #ifdef.
Fixes: b1ae6dc41eaa ("module: add in-kernel support for decompressing")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Trond Myklebust [Tue, 15 Feb 2022 23:05:18 +0000 (18:05 -0500)]
 
NFS: Do not report writeback errors in nfs_getattr()
The result of the writeback, whether it is an ENOSPC or an EIO, or
anything else, does not inhibit the NFS client from reporting the
correct file timestamps.
Fixes: 79566ef018f5 ("NFS: Getattr doesn't require data sync semantics")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Linus Torvalds [Wed, 16 Feb 2022 20:09:22 +0000 (12:09 -0800)]
 
Merge tag 'mmc-v5.17-rc1-2' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fix from Ulf Hansson:
 "Fix recovery logic for multi block I/O reads (MMC_READ_MULTIPLE_BLOCK)"
* tag 'mmc-v5.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: block: fix read single on recovery logic
Linus Torvalds [Tue, 15 Feb 2022 23:28:00 +0000 (15:28 -0800)]
 
tty: n_tty: do not look ahead for EOL character past the end of the buffer
Daniel Gibson reports that the n_tty code gets line termination wrong in
very specific cases:
 "If you feed a line with exactly 64 chars + terminating newline, and
  directly afterwards (without reading) another line into a pseudo
  terminal, the first read() on the other side will return the 64
  char line *without* terminating newline, and the next read() will
  return the missing terminating newline AND the complete next line (if
  it fits in the buffer)"
and bisected the behavior to commit 3b830a9c34d5 ("tty: convert
tty_ldisc_ops 'read()' function to take a kernel pointer").
Now, digging deeper, it turns out that the behavior isn't exactly new:
what changed in commit 3b830a9c34d5 was that the tty line discipline
.read() function is now passed an intermediate kernel buffer rather than
the final user space buffer.
And that intermediate kernel buffer is 64 bytes in size - thus that
special case with exactly 64 bytes plus terminating newline.
The same problem did exist before, but historically the boundary was not
the 64-byte chunk, but the user-supplied buffer size, which is obviously
generally bigger (and potentially bigger than N_TTY_BUF_SIZE, which
would hide the issue entirely).
The reason is that the n_tty canon_copy_from_read_buf() code would look
ahead for the EOL character one byte further than it would actually
copy.  It would then decide that it had found the terminator, and unmark
it as an EOL character - which in turn explains why the next read
wouldn't then be terminated by it.
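The effect of the one-byte look-ahead can be reproduced in a simplified
model.  This is a sketch under stated assumptions, not the n_tty code:
canon_copy() and read_line() are hypothetical helpers, the buffer is a
plain string with a set of EOL positions, and the special EOF handling
is left out entirely.

```python
def canon_copy(buf, eol_flags, tail, nr, lookahead):
    """Copy at most nr bytes of one line starting at tail.

    Returns (data, new_tail, line_done).  With lookahead=True the EOL
    search covers one byte more than can actually be copied.
    """
    window = nr + 1 if lookahead else nr
    end = min(tail + window, len(buf))
    eol = next((i for i in range(tail, end) if i in eol_flags), None)
    found = eol is not None
    stop = eol + 1 if found else end      # one past the EOL if found
    n = min(nr, stop - tail)              # never copy more than nr bytes
    data = buf[tail:tail + n]
    if found:
        eol_flags.discard(eol)            # EOL is unmarked even if not copied
    return data, tail + n, found


def read_line(buf, eol_flags, tail, user_nr, lookahead, chunk=64):
    """Model one read(): copy in 64-byte chunks until a line is done."""
    out = ""
    while user_nr > 0 and tail < len(buf):
        nr = min(chunk, user_nr)
        data, tail, done = canon_copy(buf, eol_flags, tail, nr, lookahead)
        out += data
        user_nr -= len(data)
        if done:
            break
    return out, tail
```

Feeding a 64-character line plus newline followed by a second line, the
look-ahead variant returns the first line without its newline and then
the newline together with the second line, as in the report; without
the look-ahead each read returns one complete line.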
Now, the reason it did all this in the first place is related to some
historical and pretty obscure EOF behavior, see commit ac8f3bf8832a
("n_tty: Fix poll() after buffer-limited eof push read") and commit
40d5e0905a03 ("n_tty: Fix EOF push handling").
And the reason for the EOL confusion is that we treat EOF as a special
EOL condition, with the EOL character being NUL (aka "__DISABLED_CHAR"
in the kernel sources).
So that EOF look-ahead also affects the normal EOL handling.
This patch just removes the look-ahead that causes problems, because EOL
is much more critical than the historical "EOF in the middle of a line
that coincides with the end of the buffer" handling ever was.
Now, it is possible that we should indeed re-introduce the "look at next
character to see if it's an EOF" behavior, but if so, that should be done
not at the kernel buffer chunk boundary in canon_copy_from_read_buf(),
but at a higher level, when we run out of the user buffer.
In particular, the place to do that would be at the top of
'n_tty_read()', where we check if it's a continuation of a previously
started read, and there is no more buffer space left, we could decide to
just eat the __DISABLED_CHAR at that point.
But that would be a separate patch, because I suspect nobody actually
cares, and I'd like to get a report about it before bothering.
Fixes: 3b830a9c34d5 ("tty: convert tty_ldisc_ops 'read()' function to take a kernel pointer")
Fixes: ac8f3bf8832a ("n_tty: Fix poll() after buffer-limited eof push read")
Fixes: 40d5e0905a03 ("n_tty: Fix EOF push handling")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215611
Reported-and-tested-by: Daniel Gibson <metalcaedes@gmail.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Woody Suwalski [Wed, 9 Feb 2022 21:05:09 +0000 (16:05 -0500)]
 
ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40
Add an ACPI idle power level limit for 32-bit ThinkPad T40.
There is a regression on T40 introduced by commit d6b88ce2, starting
with kernel 5.16:
commit d6b88ce2eb9d2698eb24451eb92c0a1649b17bb1
Author: Richard Gong <richard.gong@amd.com>
Date:   Wed Sep 22 08:31:16 2021 -0500
  ACPI: processor idle: Allow playing dead in C3 state
The above patch tries to enter the C3 state during init, which causes
a T40 system freeze. I have not found a similar issue on any other of my
32-bit machines.
The fix is to add another exception to the processor_power_dmi_table[] list.
As a result the dmesg shows as expected:
[2.155398] ACPI: IBM ThinkPad T40 detected - limiting to C2 max_cstate. Override with "processor.max_cstate=9"
[2.155404] ACPI: processor limited to max C-state 2
The fix is trivial and affects only vintage T40 systems.
Fixes: d6b88ce2eb9d ("ACPI: processor idle: Allow playing dead in C3 state")
Signed-off-by: Woody Suwalski <wsuwalski@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
[ rjw: New subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
German Gomez [Tue, 25 Jan 2022 10:44:34 +0000 (10:44 +0000)]
 
perf test: Fix arm64 perf_event_attr tests wrt --call-graph initialization
The struct perf_event_attr is initialised differently in Arm64 when
recording in call-graph fp mode, so update the relevant tests, and add
two extra arm64-only tests.
Before:
  $ perf test 17 -v
  17: Setup struct perf_event_attr
  [...]
  running './tests/attr/test-record-graph-default'
  expected sample_type=295, got 4391
  expected sample_regs_user=0, got 1073741824
  FAILED './tests/attr/test-record-graph-default' - match failure
  test child finished with -1
  ---- end ----
After:
[...]
  running './tests/attr/test-record-graph-default-aarch64'
  test limitation 'aarch64'
  running './tests/attr/test-record-graph-fp-aarch64'
  test limitation 'aarch64'
  running './tests/attr/test-record-graph-default'
  test limitation '!aarch64'
  excluded architecture list ['aarch64']
  skipped [aarch64] './tests/attr/test-record-graph-default'
  running './tests/attr/test-record-graph-fp'
  test limitation '!aarch64'
  excluded architecture list ['aarch64']
  skipped [aarch64] './tests/attr/test-record-graph-fp'
[...]
Fixes: 7248e308a5758761 ("perf tools: Record ARM64 LR register automatically")
Signed-off-by: German Gomez <german.gomez@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lore.kernel.org/lkml/20220125104435.2737-1-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kees Cook [Sun, 13 Feb 2022 18:24:43 +0000 (10:24 -0800)]
 
libsubcmd: Fix use-after-free for realloc(..., 0)
GCC 12 correctly reports a potential use-after-free condition in the
xrealloc helper. Fix the warning by avoiding an implicit "free(ptr)"
when size == 0:
In file included from help.c:12:
In function 'xrealloc',
    inlined from 'add_cmdname' at help.c:24:2:
subcmd-util.h:56:23: error: pointer may be used after 'realloc' [-Werror=use-after-free]
   56 |                 ret = realloc(ptr, size);
      |                       ^~~~~~~~~~~~~~~~~~
subcmd-util.h:52:21: note: call to 'realloc' here
   52 |         void *ret = realloc(ptr, size);
      |                     ^~~~~~~~~~~~~~~~~~
subcmd-util.h:58:31: error: pointer may be used after 'realloc' [-Werror=use-after-free]
   58 |                         ret = realloc(ptr, 1);
      |                               ^~~~~~~~~~~~~~~
subcmd-util.h:52:21: note: call to 'realloc' here
   52 |         void *ret = realloc(ptr, size);
      |                     ^~~~~~~~~~~~~~~~~~
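The idea of the fix can be sketched in user space as follows. This is a
minimal illustration, not the code from subcmd-util.h: the name
xrealloc_sketch() and the abort() on failure are assumptions made for the
example. The point is that the helper never passes a size of 0 to realloc(),
so there is no implicit free(ptr) followed by a retry on the same pointer.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the xrealloc helper (name and error handling
 * are assumptions for this sketch, not the code in subcmd-util.h).
 */
static void *xrealloc_sketch(void *ptr, size_t size)
{
	void *ret;

	/* realloc(ptr, 0) may free ptr, so never request zero bytes. */
	if (size == 0)
		size = 1;

	/* Single realloc() call: ptr is never reused after it may be freed. */
	ret = realloc(ptr, size);
	if (!ret) {
		fprintf(stderr, "Out of memory, realloc failed\n");
		abort();
	}
	return ret;
}
```

With this shape a caller always gets back a valid pointer that it owns, even
when it asks for zero bytes.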
Fixes: 2f4ce5ec1d447beb ("perf tools: Finalize subcmd independence")
Reported-by: Valdis Klētnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Valdis Klētnieks <valdis.kletnieks@vt.edu>
Tested-by: Justin M. Forbes <jforbes@fedoraproject.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: linux-hardening@vger.kernel.org
Cc: Valdis Klētnieks <valdis.kletnieks@vt.edu>
Link: http://lore.kernel.org/lkml/20220213182443.4037039-1-keescook@chromium.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 15 Feb 2022 15:37:13 +0000 (16:37 +0100)]
 
libperf: Fix perf_cpu_map__for_each_cpu macro
Tzvetomir Stoyanov reported an issue with using the
perf_cpu_map__for_each_cpu macro with the private perf_cpu object.
The issue is caused by a recent change that wrapped cpu in struct perf_cpu
to distinguish it from cpu indexes. We need to make struct perf_cpu
public.
Add a simple test for using the perf_cpu_map__for_each_cpu macro.
Fixes: 6d18804b963b78dc ("perf cpumap: Give CPUs their own type")
Reported-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220215153713.31395-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
James Clark [Thu, 10 Feb 2022 20:06:20 +0000 (20:06 +0000)]
 
perf cs-etm: Fix corrupt inject files when only last branch option is enabled
'perf inject' with Coresight data generates files that cannot be opened
when only the last branch option is specified:
  perf inject -i perf.data --itrace=l -o inject.data
  perf script -i inject.data
  0x33faa8 [0x8]: failed to process type: 9 [Bad address]
This is because cs_etm__synth_instruction_sample() is called even when
the sample type for instructions hasn't been set up. Last branch records
are attached to instruction samples so it doesn't make sense to generate
them when --itrace=i isn't specified anyway.
This change disables all calls of cs_etm__synth_instruction_sample()
unless --itrace=i is specified, resulting in a file with no samples if
only --itrace=l is provided, rather than a bad file.
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220210200620.1227232-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
James Clark [Thu, 10 Feb 2022 20:06:19 +0000 (20:06 +0000)]
 
perf cs-etm: No-op refactor of synth opt usage
sample_branches and sample_instructions are already saved in the
synth_opts struct. Other usages like synth_opts.last_branch don't save a
value, so make this more consistent by always going through synth_opts
and not saving duplicate values.
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220210200620.1227232-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rob Herring [Tue, 1 Feb 2022 21:39:03 +0000 (15:39 -0600)]
 
libperf: Fix 32-bit build for tests uint64_t printf
Commit a7f3713f6bf207e6 ("libperf tests: Add test_stat_multiplexing test")
added printf's of 64-bit ints using %lu which doesn't work on 32-bit
builds:
  tests/test-evlist.c:529:29: error: format ‘%lu’ expects argument of type \
    ‘long unsigned int’, but argument 4 has type ‘uint64_t’ {aka ‘long long unsigned int’} [-Werror=format=]
Use PRIu64 instead which works on both 32-bit and 64-bit systems.
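As a minimal sketch of the portable form (the helper name format_u64() is
made up for this example and is not part of the libperf tests), the PRIu64
macro from <inttypes.h> expands to the correct conversion specifier for
uint64_t on both 32-bit and 64-bit builds:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a 64-bit counter without assuming %lu. */
static int format_u64(char *buf, size_t len, uint64_t value)
{
	/* PRIu64 is "lu", "llu", etc. depending on the target ABI. */
	return snprintf(buf, len, "%" PRIu64, value);
}
```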
Fixes: a7f3713f6bf207e6 ("libperf tests: Add test_stat_multiplexing test")
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Link: https://lore.kernel.org/r/20220201213903.699656-1-robh@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 21 May 2021 19:00:31 +0000 (16:00 -0300)]
 
tools headers UAPI: Sync linux/perf_event.h with the kernel sources
To pick the trivial change in:
  ddecd22878601a60 ("perf: uapi: Document perf_event_attr::sig_data truncation on 32 bit architectures")
Just adds a comment.
This silences this perf build warning:
  Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
  diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h
Cc: Marco Elver <elver@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Changbin Du [Tue, 8 Feb 2022 14:07:25 +0000 (22:07 +0800)]
 
perf trace: Avoid early exit due SIGCHLD from non-workload processes
The function trace__symbols_init() runs "perf-read-vdso32" and that ends up
with a SIGCHLD delivered to 'perf'. This SIGCHLD makes perf exit early.
'perf trace' should exit only if the SIGCHLD is from our workload process.
So let's use sigaction() instead of signal() to match that condition.
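A minimal user-space sketch of that approach is shown below. All names
(sketch_done, sketch_workload_pid, sketch_sigchld_handler(),
sketch_install_sigchld()) are hypothetical and chosen for this example. With
SA_SIGINFO the handler receives a siginfo_t whose si_pid identifies the
sender, which a plain signal() handler cannot see.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical state for this sketch. */
static volatile sig_atomic_t sketch_done;
static volatile pid_t sketch_workload_pid = -1;

/* Only flag completion when the signal comes from the workload process. */
static void sketch_sigchld_handler(int sig, siginfo_t *info, void *context)
{
	(void)sig;
	(void)context;

	if (info->si_pid == sketch_workload_pid)
		sketch_done = 1;
}

static int sketch_install_sigchld(void)
{
	struct sigaction act;

	/* Zero with memset() rather than '= { 0 }' to keep older compilers quiet. */
	memset(&act, 0, sizeof(act));
	sigemptyset(&act.sa_mask);
	act.sa_flags = SA_SIGINFO;	/* deliver siginfo_t to the handler */
	act.sa_sigaction = sketch_sigchld_handler;

	return sigaction(SIGCHLD, &act, NULL);
}
```

A SIGCHLD from any other child, such as a helper spawned during symbol
initialization, leaves sketch_done untouched.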
Committer notes:
Use memset to zero the 'struct sigaction' variable as the '= { 0 }'
method isn't accepted in many compiler versions, e.g.:
   4    34.02 alpine:3.6                    : FAIL clang version 4.0.0 (tags/RELEASE_400/final)
    builtin-trace.c:4897:35: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
            struct sigaction sigchld_act = { 0 };
                                             ^
                                             {}
    builtin-trace.c:4897:37: error: missing field 'sa_mask' initializer [-Werror,-Wmissing-field-initializers]
            struct sigaction sigchld_act = { 0 };
                                               ^
    2 errors generated.
   6    32.60 alpine:3.8                    : FAIL gcc version 6.4.0 (Alpine 6.4.0)
    builtin-trace.c:4897:35: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
            struct sigaction sigchld_act = { 0 };
                                             ^
                                             {}
    builtin-trace.c:4897:37: error: missing field 'sa_mask' initializer [-Werror,-Wmissing-field-initializers]
            struct sigaction sigchld_act = { 0 };
                                               ^
    2 errors generated.
   7    34.82 alpine:3.9                    : FAIL gcc version 8.3.0 (Alpine 8.3.0)
    builtin-trace.c:4897:35: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
            struct sigaction sigchld_act = { 0 };
                                             ^
                                             {}
    builtin-trace.c:4897:37: error: missing field 'sa_mask' initializer [-Werror,-Wmissing-field-initializers]
            struct sigaction sigchld_act = { 0 };
                                               ^
    2 errors generated.
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220208140725.3947-1-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Frederic Weisbecker [Mon, 7 Feb 2022 15:59:10 +0000 (16:59 +0100)]
 
sched/isolation: Split housekeeping cpumask per isolation features
To prepare for supporting each housekeeping feature toward cpuset, split
the global housekeeping cpumask per HK_TYPE_* entry.
This will later make it possible, for example, to modify at runtime the
cpulist passed through the "isolcpus=", "nohz_full=" and "rcu_nocbs="
kernel boot parameters.
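The split can be pictured with the following user-space sketch. It is an
illustration under stated assumptions, not the kernel code: the struct and
function names are made up, a uint64_t stands in for a cpumask, and the
enumerators HK_TYPE_DOMAIN and HK_TYPE_WQ are assumed to match the
HK_TYPE_* naming. Each type owns its own mask, and a type that was never
configured falls back to "all CPUs".

```c
#include <stdint.h>

/* Assumed enumerator names following the HK_TYPE_* pattern. */
enum hk_type { HK_TYPE_DOMAIN, HK_TYPE_WQ, HK_TYPE_MAX };

/* Hypothetical layout: one mask per housekeeping type. */
struct housekeeping_sketch {
	uint64_t cpumasks[HK_TYPE_MAX];	/* stand-in for per-type cpumasks */
	unsigned long flags;		/* bit N set: type N was configured */
};

#define SKETCH_ALL_CPUS UINT64_MAX

static void hk_sketch_set(struct housekeeping_sketch *hk,
			  enum hk_type type, uint64_t mask)
{
	hk->cpumasks[type] = mask;
	hk->flags |= 1UL << type;
}

/* Look up the mask of exactly one type; unconfigured types see all CPUs. */
static uint64_t hk_sketch_cpumask(const struct housekeeping_sketch *hk,
				  enum hk_type type)
{
	if (hk->flags & (1UL << type))
		return hk->cpumasks[type];
	return SKETCH_ALL_CPUS;
}
```

Because the lookup takes a single type, changing one feature's mask later
cannot silently affect the others.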
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20220207155910.527133-9-frederic@kernel.org
Frederic Weisbecker [Mon, 7 Feb 2022 15:59:09 +0000 (16:59 +0100)]
 
sched/isolation: Fix housekeeping_mask memory leak
If "nohz_full=" or "isolcpus=nohz" are passed with CONFIG_NO_HZ_FULL=n,
housekeeping_mask doesn't get freed despite it being unused if
housekeeping_setup() is called for the first time.
Check this scenario first to fix this, so that no useless allocation
is performed.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20220207155910.527133-8-frederic@kernel.org
Frederic Weisbecker [Mon, 7 Feb 2022 15:59:08 +0000 (16:59 +0100)]
 
sched/isolation: Consolidate error handling
Centralize the mask freeing and return value for the error path. This
makes potential leaks more visible.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20220207155910.527133-7-frederic@kernel.org
Frederic Weisbecker [Mon, 7 Feb 2022 15:59:07 +0000 (16:59 +0100)]
 
sched/isolation: Consolidate check for housekeeping minimum service
There can be two subsequent calls to housekeeping_setup() due to
"nohz_full=" and "isolcpus=" that can mix up.  The two passes each have
their own way to deal with an empty housekeeping set of CPUs.
Consolidate this part and remove the awful "tmp" based naming.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20220207155910.527133-6-frederic@kernel.org
Frederic Weisbecker [Mon, 7 Feb 2022 15:59:06 +0000 (16:59 +0100)]
 
sched/isolation: Use single feature type while referring to housekeeping cpumask
Refer to housekeeping APIs using single feature types instead of flags.
This prevents passing multiple isolation features at once to
housekeeping interfaces, which soon won't be possible anymore as each
isolation feature will have its own cpumask.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org
Frederic Weisbecker [Mon, 7 Feb 2022 15:59:05 +0000 (16:59 +0100)]
 
net: Decouple HK_FLAG_WQ and HK_FLAG_DOMAIN cpumask fetch
To prepare for supporting each feature of the housekeeping cpumask
toward cpuset, prepare each of the HK_FLAG_* entries to move to their
own cpumask by enforcing that they are fetched individually. The new
constraint is that multiple HK_FLAG_* entries can't be mixed together
anymore in a single call to housekeeping_cpumask().
This will later make it possible, for example, to modify at runtime the
cpulist passed through the "isolcpus=", "nohz_full=" and "rcu_nocbs="
kernel boot parameters.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20220207155910.527133-4-frederic@kernel.org
Frederic Weisbecker [Mon, 7 Feb 2022 15:59:04 +0000 (16:59 +0100)]
 
workqueue: Decouple HK_FLAG_WQ and HK_FLAG_DOMAIN cpumask fetch
To prepare for supporting each feature of the housekeeping cpumask
toward cpuset, prepare each of the HK_FLAG_* entries to move to their
own cpumask by enforcing that they are fetched individually. The new
constraint is that multiple HK_FLAG_* entries can't be mixed together
anymore in a single call to housekeeping_cpumask().
This will later make it possible, for example, to modify at runtime the
cpulist passed through the "isolcpus=", "nohz_full=" and "rcu_nocbs="
kernel boot parameters.
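One way to picture the new calling convention is the sketch below. It is a
user-space illustration only: the sk_* names are hypothetical, a uint64_t
stands in for a cpumask, and the lookup function is a stand-in rather than
the kernel's housekeeping_cpumask(). Instead of asking for two entries in a
single call, the caller fetches each mask on its own and intersects them.

```c
#include <stdint.h>

/* Hypothetical per-entry identifiers for this sketch. */
enum sk_hk_entry { SK_HK_DOMAIN, SK_HK_WQ, SK_HK_MAX };

/* Stand-in storage: one mask per entry. */
static uint64_t sk_hk_masks[SK_HK_MAX];

/* Stand-in lookup that accepts exactly one entry per call. */
static uint64_t sk_housekeeping_cpumask(enum sk_hk_entry entry)
{
	return sk_hk_masks[entry];
}

/* Fetch the two masks individually, then combine them in the caller. */
static uint64_t sk_unbound_cpumask(void)
{
	uint64_t mask = sk_housekeeping_cpumask(SK_HK_WQ);

	mask &= sk_housekeeping_cpumask(SK_HK_DOMAIN);
	return mask;
}
```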
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220207155910.527133-3-frederic@kernel.org
Frederic Weisbecker [Mon, 7 Feb 2022 15:59:03 +0000 (16:59 +0100)]
 
pci: Decouple HK_FLAG_WQ and HK_FLAG_DOMAIN cpumask fetch
To prepare for supporting each feature of the housekeeping cpumask
toward cpuset, prepare each of the HK_FLAG_* entries to move to their
own cpumask by enforcing that they are fetched individually. The new
constraint is that multiple HK_FLAG_* entries can't be mixed together
anymore in a single call to housekeeping_cpumask().
This will later make it possible, for example, to modify at runtime the
cpulist passed through the "isolcpus=", "nohz_full=" and "rcu_nocbs="
kernel boot parameters.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20220207155910.527133-2-frederic@kernel.org
Zhaoyang Huang [Tue, 25 Jan 2022 06:56:58 +0000 (14:56 +0800)]
 
psi: fix possible trigger missing in the window
When a new threshold breaching stall happens after a psi event was
generated and within the window duration, the new event is not
generated because the events are rate-limited to one per window. If
after that no new stall is recorded, the event will not be
generated even after the rate-limiting duration has passed. This
happens because, with no new stall, window_update will not be called
even though the threshold was previously breached. To fix this, record
the threshold breach and generate the event once the window
duration has passed.
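The behavior described above can be sketched in user space as follows. This
is a simplified model, not the code in kernel/sched/psi.c: the struct, its
field names and sketch_trigger_poll() are assumptions for illustration, and
the "breached" argument stands in for the result of the window growth check.
A breach seen while rate-limited is remembered and the event fires on the
first poll after the window has elapsed, even if no new stall occurs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified trigger state. */
struct sketch_trigger {
	uint64_t window;		/* rate-limit window length */
	uint64_t last_event_time;	/* time of the last generated event */
	bool pending_event;		/* breach seen but not yet reported */
};

/* Returns true when an event is generated at time 'now'. */
static bool sketch_trigger_poll(struct sketch_trigger *t, uint64_t now,
				bool breached)
{
	/* Remember the breach even if we cannot report it yet. */
	if (breached)
		t->pending_event = true;

	if (!t->pending_event)
		return false;

	/* Limit event signaling to once per window. */
	if (now < t->last_event_time + t->window)
		return false;

	t->last_event_time = now;
	t->pending_event = false;
	return true;
}
```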
Suggested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Link: https://lore.kernel.org/r/1643093818-19835-1-git-send-email-huangzhaoyang@gmail.com