Will Deacon [Tue, 6 Dec 2022 11:25:43 +0000 (11:25 +0000)]
Merge branch 'for-next/selftests' into for-next/core
* for-next/selftests:
kselftest/arm64: Allow epoll_wait() to return more than one result
kselftest/arm64: Don't drain output while spawning children
kselftest/arm64: Hold fp-stress children until they're all spawned
kselftest/arm64: Set test names prior to starting children
kselftest/arm64: Use preferred form for predicate load/stores
kselftest/arm64: fix array_size.cocci warning
kselftest/arm64: fix array_size.cocci warning
kselftest/arm64: Print ASCII version of unknown signal frame magic values
kselftest/arm64: Remove validation of extra_context from TODO
kselftest/arm64: Provide progress messages when signalling children
kselftest/arm64: Check that all children are producing output in fp-stress
Will Deacon [Tue, 6 Dec 2022 11:22:48 +0000 (11:22 +0000)]
Merge branch 'for-next/perf' into for-next/core
* for-next/perf: (21 commits)
arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init()
drivers/perf: hisi: Add TLP filter support
Documentation: perf: Indent filter options list of hisi-pcie-pmu
docs: perf: Fix PMU instance name of hisi-pcie-pmu
drivers/perf: hisi: Fix some event id for hisi-pcie-pmu
arm64/perf: Replace PMU version number '0' with ID_AA64DFR0_EL1_PMUVer_NI
perf/amlogic: Remove unused header inclusions of <linux/version.h>
perf/amlogic: Fix build error for x86_64 allmodconfig
dt-binding: perf: Add Amlogic DDR PMU
docs/perf: Add documentation for the Amlogic G12 DDR PMU
perf/amlogic: Add support for Amlogic meson G12 SoC DDR PMU driver
MAINTAINERS: Update HiSilicon PMU maintainers
perf: arm_cspmu: Fix module cyclic dependency
perf: arm_cspmu: Fix build failure on x86_64
perf: arm_cspmu: Fix modular builds due to missing MODULE_LICENSE()s
perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute
perf: arm_cspmu: Add support for ARM CoreSight PMU driver
perf/smmuv3: Fix hotplug callback leak in arm_smmu_pmu_init()
perf/arm_dmc620: Fix hotplug callback leak in dmc620_pmu_init()
drivers: perf: marvell_cn10k: Fix hotplug callback leak in tad_pmu_init()
...
Will Deacon [Tue, 6 Dec 2022 11:21:21 +0000 (11:21 +0000)]
Merge branch 'for-next/mm' into for-next/core
* for-next/mm:
arm64: booting: Require placement within 48-bit addressable memory
arm64: mm: kfence: only handle translation faults
arm64/mm: Simplify and document pte_to_phys() for 52 bit addresses
Will Deacon [Tue, 6 Dec 2022 11:20:21 +0000 (11:20 +0000)]
Merge branch 'for-next/kprobes' into for-next/core
* for-next/kprobes:
arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK
arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler
arm64: Prohibit instrumentation on arch_stack_walk()
Will Deacon [Tue, 6 Dec 2022 11:16:20 +0000 (11:16 +0000)]
Merge branch 'for-next/kdump' into for-next/core
* for-next/kdump:
arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones
arm64: kdump: Provide default size when crashkernel=Y,low is not specified
Will Deacon [Tue, 6 Dec 2022 11:15:50 +0000 (11:15 +0000)]
Merge branch 'for-next/kbuild' into for-next/core
* for-next/kbuild:
arm64: remove special treatment for the link order of head.o
Will Deacon [Tue, 6 Dec 2022 11:14:25 +0000 (11:14 +0000)]
Merge branch 'for-next/insn' into for-next/core
* for-next/insn:
arm64:uprobe fix the uprobe SWBP_INSN in big-endian
arm64: insn: always inline hint generation
arm64: insn: simplify insn group identification
arm64: insn: always inline predicates
arm64: insn: remove aarch64_insn_gen_prefetch()
Will Deacon [Tue, 6 Dec 2022 11:07:39 +0000 (11:07 +0000)]
Merge branch 'for-next/ftrace' into for-next/core
* for-next/ftrace:
ftrace: arm64: remove static ftrace
ftrace: arm64: move from REGS to ARGS
ftrace: abstract DYNAMIC_FTRACE_WITH_ARGS accesses
ftrace: rename ftrace_instruction_pointer_set() -> ftrace_regs_set_instruction_pointer()
ftrace: pass fregs to arch_ftrace_set_direct_caller()
Will Deacon [Tue, 6 Dec 2022 11:06:47 +0000 (11:06 +0000)]
Merge branch 'for-next/fpsimd' into for-next/core
* for-next/fpsimd:
arm64/fpsimd: Make kernel_neon_ API _GPL
Will Deacon [Tue, 6 Dec 2022 11:06:04 +0000 (11:06 +0000)]
Merge branch 'for-next/ffa' into for-next/core
* for-next/ffa:
firmware: arm_ffa: Move comment before the field it is documenting
firmware: arm_ffa: Move constants to header file
Will Deacon [Tue, 6 Dec 2022 11:04:47 +0000 (11:04 +0000)]
Merge branch 'for-next/errata' into for-next/core
* for-next/errata:
arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption
arm64: Add Cortex-715 CPU part definition
Will Deacon [Tue, 6 Dec 2022 11:01:49 +0000 (11:01 +0000)]
Merge branch 'for-next/dynamic-scs' into for-next/core
* for-next/dynamic-scs:
arm64: implement dynamic shadow call stack for Clang
scs: add support for dynamic shadow call stacks
arm64: unwind: add asynchronous unwind tables to kernel and modules
Will Deacon [Tue, 6 Dec 2022 10:44:07 +0000 (10:44 +0000)]
Merge branch 'for-next/cpufeature' into for-next/core
* for-next/cpufeature:
kselftest/arm64: Add SVE 2.1 to hwcap test
arm64/hwcap: Add support for SVE 2.1
kselftest/arm64: Add FEAT_RPRFM to the hwcap test
arm64/hwcap: Add support for FEAT_RPRFM
kselftest/arm64: Add FEAT_CSSC to the hwcap selftest
arm64/hwcap: Add support for FEAT_CSSC
arm64: Enable data independent timing (DIT) in the kernel
Will Deacon [Tue, 6 Dec 2022 10:42:33 +0000 (10:42 +0000)]
Merge branch 'for-next/asm-const' into for-next/core
* for-next/asm-const:
arm64: alternative: constify alternative_has_feature_* argument
arm64: jump_label: mark arguments as const to satisfy asm constraints
Will Deacon [Tue, 6 Dec 2022 10:36:52 +0000 (10:36 +0000)]
Merge branch 'for-next/acpi' into for-next/core
* for-next/acpi:
ACPI: APMT: Fix kerneldoc and indentation
ACPI: Enable FPDT on arm64
arm_pmu: acpi: handle allocation failure
arm_pmu: rework ACPI probing
arm_pmu: factor out PMU matching
arm_pmu: acpi: factor out PMU<->CPU association
ACPI/IORT: Update SMMUv3 DeviceID support
ACPI: ARM Performance Monitoring Unit Table (APMT) initial support
Masami Hiramatsu (Google) [Fri, 2 Dec 2022 02:18:52 +0000 (11:18 +0900)]
arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK
Return DBG_HOOK_ERROR if kprobes cannot handle a BRK because it fails to find a kprobe corresponding to the address.
Since arm64 kprobes uses stop_machine()-based text patching to remove the BRK, all running kprobe_break_handler() invocations are guaranteed to have completed by that point, and the kprobe is only removed from its hash list after the BRK has been removed. Thus, if kprobe_break_handler() fails to find a kprobe in the hash list, that indicates a bug.
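As a rough sketch of the resulting control flow (simplified, not the literal patch; get_kprobe() and the DBG_HOOK_* return codes are existing kernel symbols, the normal handler body is elided):

    /* Sketch: bail out to the debug-exception core if no kprobe matches. */
    static int kprobe_breakpoint_handler(struct pt_regs *regs, unsigned long esr)
    {
            struct kprobe *p = get_kprobe((kprobe_opcode_t *)instruction_pointer(regs));

            if (!p)
                    return DBG_HOOK_ERROR;  /* not ours: let other BRK hooks run */

            /* ... normal kprobe pre-/post-handler processing ... */
            return DBG_HOOK_HANDLED;
    }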
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/166994753273.439920.6629626290560350760.stgit@devnote3
Signed-off-by: Will Deacon <will@kernel.org>
Masami Hiramatsu (Google) [Fri, 2 Dec 2022 02:18:42 +0000 (11:18 +0900)]
arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler
Since arm64's do_page_fault() can handle the page fault more appropriately than kprobe_fault_handler() based on the context, let it handle the page fault instead of simply calling fixup_exception() from kprobe_fault_handler().
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/166994752269.439920.4801339965959400456.stgit@devnote3
Signed-off-by: Will Deacon <will@kernel.org>
Masami Hiramatsu (Google) [Fri, 2 Dec 2022 02:18:33 +0000 (11:18 +0900)]
arm64: Prohibit instrumentation on arch_stack_walk()
Mark arch_stack_walk() as noinstr instead of notrace, and mark the inline functions called from arch_stack_walk() as __always_inline, so that users cannot place any instrumentation on them; this function can be called from return_address(), which is used by lockdep.
Without this, if the kernel is built with CONFIG_LOCKDEP=y, simply probing arch_stack_walk() via <tracefs>/kprobe_events will crash the kernel on arm64.
# echo p arch_stack_walk >> ${TRACEFS}/kprobe_events
# echo 1 > ${TRACEFS}/events/kprobes/enable
kprobes: Failed to recover from reentered kprobes.
kprobes: Dump kprobe:
.symbol_name = arch_stack_walk, .offset = 0, .addr = arch_stack_walk+0x0/0x1c0
------------[ cut here ]------------
kernel BUG at arch/arm64/kernel/probes/kprobes.c:241!
kprobes: Failed to recover from reentered kprobes.
kprobes: Dump kprobe:
.symbol_name = arch_stack_walk, .offset = 0, .addr = arch_stack_walk+0x0/0x1c0
------------[ cut here ]------------
kernel BUG at arch/arm64/kernel/probes/kprobes.c:241!
PREEMPT SMP
Modules linked in:
CPU: 0 PID: 17 Comm: migration/0 Tainted: G N 6.1.0-rc5+ #6
Hardware name: linux,dummy-virt (DT)
Stopper: 0x0 <- 0x0
pstate: 600003c5 (nZCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : kprobe_breakpoint_handler+0x178/0x17c
lr : kprobe_breakpoint_handler+0x178/0x17c
sp : ffff8000080d3090
x29: ffff8000080d3090 x28: ffff0df5845798c0 x27: ffffc4f59057a774
x26: ffff0df5ffbba770 x25: ffff0df58f420f18 x24: ffff49006f641000
x23: ffffc4f590579768 x22: ffff0df58f420f18 x21: ffff8000080d31c0
x20: ffffc4f590579768 x19: ffffc4f590579770 x18: 0000000000000006
x17: 5f6b636174735f68 x16: 637261203d207264 x15: 64612e202c30203d
x14: 2074657366666f2e x13: 30633178302f3078 x12: 302b6b6c61775f6b
x11: 636174735f686372 x10: ffffc4f590dc5bd8 x9 : ffffc4f58eb31958
x8 : 00000000ffffefff x7 : ffffc4f590dc5bd8 x6 : 80000000fffff000
x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : ffff0df5845798c0 x0 : 0000000000000064
Call trace:
kprobes: Failed to recover from reentered kprobes.
kprobes: Dump kprobe:
.symbol_name = arch_stack_walk, .offset = 0, .addr = arch_stack_walk+0x0/0x1c0
------------[ cut here ]------------
kernel BUG at arch/arm64/kernel/probes/kprobes.c:241!
Fixes: 39ef362d2d45 ("arm64: Make return_address() use arch_stack_walk()")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/166994751368.439920.3236636557520824664.stgit@devnote3
Signed-off-by: Will Deacon <will@kernel.org>
junhua huang [Fri, 2 Dec 2022 07:11:10 +0000 (15:11 +0800)]
arm64:uprobe fix the uprobe SWBP_INSN in big-endian
We use uprobes on aarch64_be, where we found that the tracee task would exit due to SIGILL when the uprobe trace is enabled. The replacement instruction written by uprobes is not correct on big-endian arm64: since instruction fetches in Armv8-A are always treated as little-endian, UPROBE_SWBP_INSN should be treated as little-endian too.
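A minimal sketch of the idea (not the literal diff; BRK64_OPCODE_UPROBES is the arm64 BRK #0x5 encoding reserved for uprobes, 0xd42000a0, matching the word seen in the dump below):

    #include <asm/byteorder.h>

    #define BRK64_OPCODE_UPROBES    0xd42000a0      /* BRK #0x5 */

    /*
     * Store the software breakpoint as little-endian bytes: Armv8-A
     * instruction fetch is always little-endian, even when the kernel
     * and userspace run big-endian, so the bytes in memory must not be
     * swapped by the CPU's data endianness.
     */
    #define UPROBE_SWBP_INSN        cpu_to_le32(BRK64_OPCODE_UPROBES)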
The test case is as follows.
bash-4.4# ./mqueue_test_aarchbe 1 1 2 1 10 > /dev/null &
bash-4.4# cd /sys/kernel/debug/tracing/
bash-4.4# echo 'p:test /mqueue_test_aarchbe:0xc30 %x0 %x1' > uprobe_events
bash-4.4# echo 1 > events/uprobes/enable
bash-4.4#
bash-4.4# ps
PID TTY TIME CMD
140 ? 00:00:01 bash
237 ? 00:00:00 ps
[1]+ Illegal instruction ./mqueue_test_aarchbe 1 1 2 1 100 > /dev/null
We debugged this using gdb as follows:
bash-4.4# gdb attach 155
(gdb) disassemble send
Dump of assembler code for function send:
0x0000000000400c30 <+0>: .inst 0xa00020d4 ; undefined
0x0000000000400c34 <+4>: mov x29, sp
0x0000000000400c38 <+8>: str w0, [sp, #28]
0x0000000000400c3c <+12>: strb w1, [sp, #27]
0x0000000000400c40 <+16>: str xzr, [sp, #40]
0x0000000000400c44 <+20>: str xzr, [sp, #48]
0x0000000000400c48 <+24>: add x0, sp, #0x1b
0x0000000000400c4c <+28>: mov w3, #0x0 // #0
0x0000000000400c50 <+32>: mov x2, #0x1 // #1
0x0000000000400c54 <+36>: mov x1, x0
0x0000000000400c58 <+40>: ldr w0, [sp, #28]
0x0000000000400c5c <+44>: bl 0x405e10 <mq_send>
0x0000000000400c60 <+48>: str w0, [sp, #60]
0x0000000000400c64 <+52>: ldr w0, [sp, #60]
0x0000000000400c68 <+56>: ldp x29, x30, [sp], #64
0x0000000000400c6c <+60>: ret
End of assembler dump.
(gdb) info b
No breakpoints or watchpoints.
(gdb) c
Continuing.
Program received signal SIGILL, Illegal instruction.
0x0000000000400c30 in send ()
(gdb) x/10x 0x400c30
0x400c30 <send>: 0xd42000a0 0xfd030091 0xe01f00b9 0xe16f0039
0x400c40 <send+16>: 0xff1700f9 0xff1b00f9 0xe06f0091 0x03008052
0x400c50 <send+32>: 0x220080d2 0xe10300aa
(gdb) disassemble 0x400c30
Dump of assembler code for function send:
=> 0x0000000000400c30 <+0>: .inst 0xa00020d4 ; undefined
0x0000000000400c34 <+4>: mov x29, sp
0x0000000000400c38 <+8>: str w0, [sp, #28]
0x0000000000400c3c <+12>: strb w1, [sp, #27]
0x0000000000400c40 <+16>: str xzr, [sp, #40]
Signed-off-by: junhua huang <huang.junhua@zte.com.cn>
Link: https://lore.kernel.org/r/202212021511106844809@zte.com.cn
Signed-off-by: Will Deacon <will@kernel.org>
Anshuman Khandual [Fri, 2 Dec 2022 01:56:11 +0000 (07:26 +0530)]
arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init()
__hw_perf_event_init() already calls the armpmu->map_event() callback and returns its error code (including -ENOENT), along with a debug message. Hence an additional armpmu->map_event() check for -ENOENT in armpmu_event_init() is redundant.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20221202015611.338499-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Tue, 29 Nov 2022 21:59:25 +0000 (21:59 +0000)]
kselftest/arm64: Allow epoll_wait() to return more than one result
When everything is starting up we are likely to have a lot of child
processes producing output at once. We can reduce overhead a bit by
allowing epoll_wait() to return more than one descriptor at a time,
which cuts down on the number of system calls we need to make. On
virtual platforms, where syscall overhead is more noticeable and we are
likely to have far more children active, this can make a small but
noticeable difference.
On physical platforms the relatively small number of processes being run
and vastly improved speeds push the effects of this change into the
noise.
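The pattern looks roughly like the sketch below (illustrative only, not the test's exact code; MAX_EVENTS and the per-child handling are assumptions):

    #include <stdio.h>
    #include <sys/epoll.h>

    #define MAX_EVENTS 16   /* assumed batch size */

    /* Sketch: handle every child whose output is ready with one syscall. */
    static void drain_ready(int epfd)
    {
            struct epoll_event evs[MAX_EVENTS];
            int i, n;

            n = epoll_wait(epfd, evs, MAX_EVENTS, -1);
            if (n < 0) {
                    perror("epoll_wait");
                    return;
            }

            for (i = 0; i < n; i++) {
                    /* evs[i].data.ptr identifies the child to read from */
            }
    }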
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221129215926.442895-4-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Tue, 29 Nov 2022 21:59:24 +0000 (21:59 +0000)]
kselftest/arm64: Don't drain output while spawning children
Now that we hold execution of the stress test programs until all children have been started, there is no need to drain output while that is happening.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221129215926.442895-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Tue, 29 Nov 2022 21:59:23 +0000 (21:59 +0000)]
kselftest/arm64: Hold fp-stress children until they're all spawned
At present fp-stress has a bit of a thundering herd problem since the
children it spawns start running immediately, meaning that they can start
starving the parent process of CPU before it has even started all the
children. This is much more severe on virtual platforms, since they tend to
support far more SVE and SME vector lengths, are slower in general, and in
some cases have performance issues when simulating multiple CPUs.
We can mitigate this problem by having all the child processes block before
starting the test program, meaning that we at least have all the child
processes started before we start heavily using CPU. We still have the same
load issues while waiting for the actual stress test programs to start up
and produce output but they're at least all ready to go before that kicks
in, resulting in substantial reductions in overall runtime on some of the
severely affected systems. One test was showing about 20% improvement.
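One way to implement such a gate (a sketch under assumed details, not necessarily what fp-stress does verbatim) is for each child to block on a pipe before exec()ing the stress binary, with the parent releasing all children at once by writing to, or closing, the write end:

    #include <stdio.h>
    #include <unistd.h>

    /* Sketch: child side of the start gate. */
    static void wait_for_start(int startup_pipe_rd)
    {
            char token;

            /* Blocks until the parent writes a byte or closes its end. */
            if (read(startup_pipe_rd, &token, 1) < 0)
                    perror("read");
            close(startup_pipe_rd);

            /* ... now execl() the actual stress test program ... */
    }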
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221129215926.442895-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Wed, 16 Nov 2022 17:03:25 +0000 (17:03 +0000)]
firmware: arm_ffa: Move comment before the field it is documenting
This is consistent with the other comments in the struct.
Co-developed-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20221116170335.2341003-3-qperret@google.com
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Wed, 16 Nov 2022 17:03:24 +0000 (17:03 +0000)]
firmware: arm_ffa: Move constants to header file
FF-A function IDs and error codes will be needed in the hypervisor too,
so move them to the header file where they can be shared. Rename the
version constants with an "FFA_" prefix so that they are less likely
to clash with other code in the tree.
Co-developed-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20221116170335.2341003-2-qperret@google.com
Signed-off-by: Will Deacon <will@kernel.org>
Yicong Yang [Thu, 17 Nov 2022 08:41:36 +0000 (16:41 +0800)]
drivers/perf: hisi: Add TLP filter support
The PMU supports filtering of TLPs when counting bandwidth, with the
following options:
- only count the TLP headers
- only count the TLP payloads
- count both TLP headers and payloads
The current driver defaults to counting only TLP payloads, which has the
implicit side effect that we get no data for traffic consisting solely of
header-only TLPs.
Make this user-configurable through the "len_mode" parameter, and default
to counting both TLP headers and payloads when the user does not specify a
mode. Also update the documentation accordingly.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20221117084136.53572-5-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Bagas Sanjaya [Thu, 17 Nov 2022 08:41:35 +0000 (16:41 +0800)]
Documentation: perf: Indent filter options list of hisi-pcie-pmu
The "Filter options" list have a rather ugly indentation. Also, the first
paragraph after list name is rendered without separator (as continuation
from the name).
Align the list by indenting the list items and add a blank line
separator for each list name.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20221117084136.53572-4-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Yicong Yang [Thu, 17 Nov 2022 08:41:34 +0000 (16:41 +0800)]
docs: perf: Fix PMU instance name of hisi-pcie-pmu
The PMU instance will be called hisi_pcie<sicl>_core<core> rather than
hisi_pcie<sicl>_<core>. Fix this in the documentation.
Fixes: c8602008e247 ("docs: perf: Add description for HiSilicon PCIe PMU driver")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20221117084136.53572-3-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Yicong Yang [Thu, 17 Nov 2022 08:41:33 +0000 (16:41 +0800)]
drivers/perf: hisi: Fix some event id for hisi-pcie-pmu
Some event IDs of the hisi-pcie-pmu are incorrect; fix them.
Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20221117084136.53572-2-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Anshuman Khandual [Mon, 28 Nov 2022 02:54:49 +0000 (08:24 +0530)]
arm64/perf: Replace PMU version number '0' with ID_AA64DFR0_EL1_PMUVer_NI
__armv8pmu_probe_pmu() returns early if the detected PMU is either not
implemented or implementation defined. The ID_AA64DFR0_EL1_PMUVer value
extracted when the PMU is not implemented is '0', which can be replaced
with ID_AA64DFR0_EL1_PMUVer_NI, defined as '0b0000'.
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20221128025449.39085-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Jiapeng Chong [Tue, 29 Nov 2022 03:21:08 +0000 (11:21 +0800)]
perf/amlogic: Remove unused header inclusions of <linux/version.h>
According to the "Abaci Robot":
| ./drivers/perf/amlogic/meson_g12_ddr_pmu.c:15 linux/version.h not needed.
| ./drivers/perf/amlogic/meson_ddr_pmu_core.c: 19 linux/version.h not needed.
So drop the unnecessary #include directives.
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3280
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3282
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221129032108.119661-1-jiapeng.chong@linux.alibaba.com
Link: https://lore.kernel.org/r/20221129032108.119661-2-jiapeng.chong@linux.alibaba.com
[will: Squashed patches together, filled out commit message a bit more]
Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Thu, 24 Nov 2022 12:07:22 +0000 (12:07 +0000)]
kselftest/arm64: Set test names prior to starting children
Since we now flush output immediately on starting children, ensure that the child name is set beforehand so that any output that does get flushed from the newly created child has that child's name attached.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221124120722.150988-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Tue, 22 Nov 2022 17:02:49 +0000 (18:02 +0100)]
arm64: booting: Require placement within 48-bit addressable memory
Some configurations (i.e., 64k + LVA/LPA) can tolerate a physical
placement of the kernel image outside of the 48-bit addressable region,
but given that the loader has no way of knowing whether or not the image
in question supports LVA/LPA, it currently has no choice but to place it
below the 48-bit mark.
Once we add support for LPA2, which allows 52-bit physical and virtual
addressing when using 4k or 16k pages, but in a way that relies on
increasing the number of paging levels, there will be more variety in
the configurations that may or may not support this.
So redefine bit #3 in the Image header as 'must be placed within 48-bit
addressable memory', as this is the current de facto meaning.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20221122170249.2453853-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Tue, 22 Nov 2022 16:36:24 +0000 (16:36 +0000)]
ftrace: arm64: remove static ftrace
The build test robot pointed out that there's a build failure when:
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_DYNAMIC_FTRACE_WITH_ARGS=n
... due to some mismatched ifdeffery, some of which checks
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS, and some of which checks
CONFIG_DYNAMIC_FTRACE_WITH_ARGS, leading to some missing definitions expected
by the core code when CONFIG_DYNAMIC_FTRACE=n and consequently
CONFIG_DYNAMIC_FTRACE_WITH_ARGS=n.
There's really not much point in supporting CONFIG_DYNAMIC_FTRACE=n (AKA
static ftrace). All supported toolchains allow us to implement
DYNAMIC_FTRACE, distributions all prefer DYNAMIC_FTRACE, and both
powerpc and s390 removed support for static ftrace in commits:
0c0c52306f4792a4 ("powerpc: Only support DYNAMIC_FTRACE not static")
5d6a0163494c78ad ("s390/ftrace: enforce DYNAMIC_FTRACE if FUNCTION_TRACER is selected")
... and according to Steven, static ftrace is only supported on x86 to
allow testing that the core code still functions in this configuration.
Given that, let's simplify matters by removing arm64's support for
static ftrace. This avoids the problem originally reported, and leaves
us with less code to maintain.
Fixes: 26299b3f6ba2 ("ftrace: arm64: move from REGS to ARGS")
Link: https://lore.kernel.org/r/202211212249.livTPi3Y-lkp@intel.com
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221122163624.1225912-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Jiucheng Xu [Tue, 22 Nov 2022 08:40:28 +0000 (16:40 +0800)]
perf/amlogic: Fix build error for x86_64 allmodconfig
The driver misses including <linux/io.h>, which causes a compilation
error with x86_64 'allmodconfig':
drivers/perf/amlogic/meson_g12_ddr_pmu.c: In function 'dmc_g12_get_freq_quick':
drivers/perf/amlogic/meson_g12_ddr_pmu.c:135:15: error: implicit declaration of function 'readl' [-Werror=implicit-function-declaration]
135 | val = readl(info->pll_reg);
| ^~~~~
drivers/perf/amlogic/meson_g12_ddr_pmu.c: In function 'dmc_g12_counter_enable':
drivers/perf/amlogic/meson_g12_ddr_pmu.c:204:9: error: implicit declaration of function 'writel' [-Werror=implicit-function-declaration]
204 | writel(clock_count, info->ddr_reg[0] + DMC_MON_G12_TIMER);
| ^~~~~~
Add the missing header to fix the build.
Fixes: 2016e2113d35 ("perf/amlogic: Add support for Amlogic meson G12 SoC DDR PMU driver")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jiucheng Xu <jiucheng.xu@amlogic.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20221122084028.572494-1-jiucheng.xu@amlogic.com
Signed-off-by: Will Deacon <will@kernel.org>
Jiucheng Xu [Mon, 21 Nov 2022 02:16:00 +0000 (10:16 +0800)]
dt-binding: perf: Add Amlogic DDR PMU
Add binding documentation for the Amlogic G12 series DDR
performance monitor unit.
Signed-off-by: Jiucheng Xu <jiucheng.xu@amlogic.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221121021602.3306998-3-jiucheng.xu@amlogic.com
Signed-off-by: Will Deacon <will@kernel.org>
Jiucheng Xu [Mon, 21 Nov 2022 02:15:59 +0000 (10:15 +0800)]
docs/perf: Add documentation for the Amlogic G12 DDR PMU
Add a user guide showing how to use the DDR PMU to monitor DDR bandwidth
on the Amlogic G12 SoC.
Signed-off-by: Jiucheng Xu <jiucheng.xu@amlogic.com>
Reviewed-by: Chris Healy <healych@amazon.com>
Link: https://lore.kernel.org/r/20221121021602.3306998-2-jiucheng.xu@amlogic.com
Signed-off-by: Will Deacon <will@kernel.org>
Jiucheng Xu [Mon, 21 Nov 2022 02:15:58 +0000 (10:15 +0800)]
perf/amlogic: Add support for Amlogic meson G12 SoC DDR PMU driver
Add the DDR bandwidth PMU driver framework and interfaces for the Amlogic
Meson G12 series SoC. The PMU can monitor not only the total DDR bandwidth,
but also the bandwidth of individual IP modules.
Signed-off-by: Jiucheng Xu <jiucheng.xu@amlogic.com>
Tested-by: Chris Healy <healych@amazon.com>
Link: https://lore.kernel.org/r/20221121021602.3306998-1-jiucheng.xu@amlogic.com
Signed-off-by: Will Deacon <will@kernel.org>
Shaokun Zhang [Fri, 18 Nov 2022 06:54:00 +0000 (14:54 +0800)]
MAINTAINERS: Update HiSilicon PMU maintainers
Qi Liu has left HiSilicon and will no longer have access to the necessary
hardware and documentation, so remove her address, with thanks for her work.
Add Jonathan Cameron as a new maintainer; he is experienced with the kernel
and has sufficient knowledge of the driver.
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Qi Liu <liuqi6124@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Qi Liu <liuqi6124@gmail.com>
Link: https://lore.kernel.org/r/20221118065400.48836-1-zhangshaokun@hisilicon.com
Signed-off-by: Will Deacon <will@kernel.org>
Anshuman Khandual [Wed, 16 Nov 2022 14:09:15 +0000 (19:39 +0530)]
arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption
If a Cortex-A715 CPU sees a page mapping's permissions change from executable
to non-executable, it may corrupt the ESR_ELx and FAR_ELx registers on the
next instruction abort caused by a permission fault.
Only user space performs the executable to non-executable permission
transition, via the mprotect() system call, which calls the
ptep_modify_prot_start() and ptep_modify_prot_commit() helpers while changing
the page mapping. The platform code can override these helpers via
__HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION.
Work around the problem by doing a break-before-make TLB invalidation for all
executable user space mappings that go through the mprotect() system call.
This overrides ptep_modify_prot_start() and ptep_modify_prot_commit() by
defining __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION on the platform, giving an
opportunity to intercept user space exec mappings and do the necessary TLB
invalidation. Similar interception is also implemented for HugeTLB.
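The shape of such an override is roughly the sketch below (the affected-CPU predicate name is an assumption; ptep_clear_flush(), ptep_get_and_clear() and pte_user_exec() are existing kernel helpers):

    /* Sketch: break-before-make when an affected CPU changes an exec mapping. */
    pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
                                 unsigned long addr, pte_t *ptep)
    {
            /*
             * Clear the PTE and invalidate the TLB entry before the caller
             * installs the new (non-executable) permissions.
             */
            if (cpu_has_a715_errata() &&                    /* assumed check */
                pte_user_exec(READ_ONCE(*ptep)))
                    return ptep_clear_flush(vma, addr, ptep);

            return ptep_get_and_clear(vma->vm_mm, addr, ptep);
    }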
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221116140915.356601-3-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Anshuman Khandual [Wed, 16 Nov 2022 14:09:14 +0000 (19:39 +0530)]
arm64: Add Cortex-715 CPU part definition
Add the CPU part numbers for the new Arm designs.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20221116140915.356601-2-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Thu, 17 Nov 2022 11:41:30 +0000 (11:41 +0000)]
kselftest/arm64: Use preferred form for predicate load/stores
The preferred form of str/ldr for predicate registers with an immediate of
zero is to omit the zero, and clang's built-in assembler rejects the zero
immediate. Drop the immediate.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221117114130.687261-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Zhen Lei [Wed, 16 Nov 2022 12:10:44 +0000 (20:10 +0800)]
arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones
For crashkernel=X without '@offset', select a region within the DMA zones
first, and fall back to reserving a region above the DMA zones. This allows
users to use the same configuration on multiple platforms.
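In outline, the reservation logic amounts to something like this sketch (CRASH_ALIGN and the CRASH_ADDR_*_MAX limits are assumed names for the alignment and the DMA-zone/full physical limits):

    phys_addr_t crash_base;

    /* Sketch: prefer a region the DMA zones can reach, then fall back. */
    crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
                                           0, CRASH_ADDR_LOW_MAX);
    if (!crash_base)
            /* No '@offset' was given, so retrying above the DMA zones is fine. */
            crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
                                                   CRASH_ADDR_LOW_MAX,
                                                   CRASH_ADDR_HIGH_MAX);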
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221116121044.1690-3-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Zhen Lei [Wed, 16 Nov 2022 12:10:43 +0000 (20:10 +0800)]
arm64: kdump: Provide default size when crashkernel=Y,low is not specified
Try to automatically allocate at least 128 MiB of low memory for the case
where crashkernel=,high is explicitly specified while crashkernel=,low is
omitted. This allows users to focus more on the high memory requirements
of their workloads rather than on the low memory requirements of crash
kernel booting.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221116121044.1690-2-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Thu, 3 Nov 2022 17:05:20 +0000 (17:05 +0000)]
ftrace: arm64: move from REGS to ARGS
This commit replaces arm64's support for FTRACE_WITH_REGS with support
for FTRACE_WITH_ARGS. This removes some overhead and complexity, and
removes some latent issues with inconsistent presentation of struct
pt_regs (which can only be reliably saved/restored at exception
boundaries).
FTRACE_WITH_REGS has been supported on arm64 since commit:
3b23e4991fb66f6d ("arm64: implement ftrace with regs")
As noted in the commit message, the major reasons for implementing
FTRACE_WITH_REGS were:
(1) To make it possible to use the ftrace graph tracer with pointer
authentication, where it's necessary to snapshot/manipulate the LR
before it is signed by the instrumented function.
(2) To make it possible to implement LIVEPATCH in future, where we need
to hook function entry before an instrumented function manipulates
the stack or argument registers. Practically speaking, we need to
preserve the argument/return registers, PC, LR, and SP.
Neither of these need a struct pt_regs, and only require the set of
registers which are live at function call/return boundaries. Our calling
convention is defined by "Procedure Call Standard for the Arm® 64-bit
Architecture (AArch64)" (AKA "AAPCS64"), which can currently be found
at:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
Per AAPCS64, all function call argument and return values are held in
the following GPRs:
* X0 - X7 : parameter / result registers
* X8 : indirect result location register
* SP : stack pointer (AKA SP)
Additionally, at function call boundaries, the following GPRs hold
context/return information:
* X29 : frame pointer (AKA FP)
* X30 : link register (AKA LR)
... and for ftrace we need to capture the instrumented address:
* PC : program counter
No other GPRs are relevant, as none of the other arguments hold
parameters or return values:
* X9 - X17 : temporaries, may be clobbered
* X18 : shadow call stack pointer (or temporary)
* X19 - X28 : callee saved
This patch implements FTRACE_WITH_ARGS for arm64, only saving/restoring
the minimal set of registers necessary. This is always sufficient to
manipulate control flow (e.g. for live-patching) or to manipulate
function arguments and return values.
This reduces the necessary stack usage from 336 bytes for pt_regs down
to 112 bytes for ftrace_regs + 32 bytes for two frame records, freeing
up 188 bytes. This could be reduced further with changes to the
unwinder.
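Concretely, the minimal saved state amounts to a structure along these lines (shown as a sketch; the exact layout in the patch may differ):

    /* Sketch: minimal state for FTRACE_WITH_ARGS on arm64 (112 bytes). */
    struct ftrace_regs {
            /* x0 - x8: argument/result registers + indirect result register */
            unsigned long regs[9];
            unsigned long __unused;         /* keep 16-byte alignment */
            unsigned long fp;               /* x29: frame pointer */
            unsigned long lr;               /* x30: link register */
            unsigned long sp;               /* stack pointer */
            unsigned long pc;               /* instrumented address */
    };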
As there is no longer a need to save different sets of registers for
different features, we no longer need distinct `ftrace_caller` and
`ftrace_regs_caller` trampolines. This allows the trampoline assembly to
be simpler, and simplifies code which previously had to handle the two
trampolines.
I've tested this with the ftrace selftests, where there are no
unexpected failures.
Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Thu, 3 Nov 2022 17:05:19 +0000 (17:05 +0000)]
ftrace: abstract DYNAMIC_FTRACE_WITH_ARGS accesses
In subsequent patches we'll arrange for architectures to have an
ftrace_regs which is entirely distinct from pt_regs. In preparation for
this, we need to minimize the use of pt_regs to where strictly necessary
in the core ftrace code.
This patch adds new ftrace_regs_{get,set}_*() helpers which can be used
to manipulate ftrace_regs. When CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y,
these can always be used on any ftrace_regs, and when
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n these can be used when regs are
available. A new ftrace_regs_has_args(fregs) helper is added which code
can use to check when these are usable.
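Callers would use the new helpers roughly as in this sketch (the callback and the my_handler target are hypothetical; the helper names come from the description above and the following patch):

    /* Sketch: only touch register state when the fregs actually carry it. */
    static void my_callback(unsigned long ip, unsigned long parent_ip,
                            struct ftrace_ops *op, struct ftrace_regs *fregs)
    {
            if (!ftrace_regs_has_args(fregs))
                    return;

            /* my_handler is a hypothetical function to divert execution to. */
            ftrace_regs_set_instruction_pointer(fregs, (unsigned long)my_handler);
    }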
Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Thu, 3 Nov 2022 17:05:18 +0000 (17:05 +0000)]
ftrace: rename ftrace_instruction_pointer_set() -> ftrace_regs_set_instruction_pointer()
In subsequent patches we'll add a set of ftrace_regs_{get,set}_*()
helpers. In preparation, this patch renames
ftrace_instruction_pointer_set() to
ftrace_regs_set_instruction_pointer().
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Thu, 3 Nov 2022 17:05:17 +0000 (17:05 +0000)]
ftrace: pass fregs to arch_ftrace_set_direct_caller()
In subsequent patches we'll arrange for architectures to have an
ftrace_regs which is entirely distinct from pt_regs. In preparation for
this, we need to minimize the use of pt_regs to where strictly
necessary in the core ftrace code.
This patch changes the prototype of arch_ftrace_set_direct_caller() to
take ftrace_regs rather than pt_regs, and moves the extraction of the
pt_regs into arch_ftrace_set_direct_caller().
On x86, arch_ftrace_set_direct_caller() can be used even when
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n, and <linux/ftrace.h> defines
struct ftrace_regs. Due to this, it's necessary to define
arch_ftrace_set_direct_caller() as a macro to avoid using an incomplete
type. I've also moved the body of arch_ftrace_set_direct_caller() after
the CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y defineidion of struct
ftrace_regs.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Besar Wicaksono [Wed, 16 Nov 2022 20:39:52 +0000 (14:39 -0600)]
perf: arm_cspmu: Fix module cyclic dependency
Building arm64 allmodconfig failed with:
| depmod: ERROR: Cycle detected: arm_cspmu -> nvidia_cspmu -> arm_cspmu
| depmod: ERROR: Found 2 modules in dependency cycles!
The arm_cspmu.c provides standard functions to operate the PMU and the
vendor code provides vendor specific attributes. Both need to be built as
single kernel module.
Update the makefile to compile sources under arm_cspmu into one module.
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221116203952.34168-1-bwicaksono@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
Besar Wicaksono [Wed, 16 Nov 2022 19:04:55 +0000 (13:04 -0600)]
perf: arm_cspmu: Fix build failure on x86_64
Building on x86_64 allmodconfig failed:
| drivers/perf/arm_cspmu/arm_cspmu.c:1114:29: error: implicit
| declaration of function 'get_acpi_id_for_cpu'
get_acpi_id_for_cpu() is an arm64-specific helper function, so fix the
build by adding a dependency on ARM64.
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221116190455.55651-1-bwicaksono@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Tue, 15 Nov 2022 18:24:03 +0000 (18:24 +0000)]
perf: arm_cspmu: Fix modular builds due to missing MODULE_LICENSE()s
Building an arm64 allmodconfig target results in the following failure
from modpost:
| ERROR: modpost: missing MODULE_LICENSE() in drivers/perf/arm_cspmu/arm_cspmu.o
| ERROR: modpost: missing MODULE_LICENSE() in drivers/perf/arm_cspmu/nvidia_cspmu.o
| make[1]: *** [scripts/Makefile.modpost:126: Module.symvers] Error 1
| make: *** [Makefile:1944: modpost] Error 2
Add the missing MODULE_LICENSE() macros, following the license of the
source files and symbol exports.
Signed-off-by: Will Deacon <will@kernel.org>
Besar Wicaksono [Fri, 11 Nov 2022 22:23:29 +0000 (16:23 -0600)]
perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute
Add support for NVIDIA System Cache Fabric (SCF) and Memory Control
Fabric (MCF) PMU attributes for CoreSight PMU implementation in
NVIDIA devices.
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Link: https://lore.kernel.org/r/20221111222330.48602-3-bwicaksono@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
Besar Wicaksono [Fri, 11 Nov 2022 22:23:28 +0000 (16:23 -0600)]
perf: arm_cspmu: Add support for ARM CoreSight PMU driver
Add support for ARM CoreSight PMU driver framework and interfaces.
The driver provides a generic implementation for operating uncore PMUs based
on the ARM CoreSight PMU architecture. The driver also provides an interface
to retrieve vendor/implementation-specific information, for example event
attributes and formatting.
The specification used in this implementation can be found below:
* ACPI Arm Performance Monitoring Unit table:
https://developer.arm.com/documentation/den0117/latest
* ARM Coresight PMU architecture:
https://developer.arm.com/documentation/ihi0091/latest
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Link: https://lore.kernel.org/r/20221111222330.48602-2-bwicaksono@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
Shang XiaoJing [Tue, 15 Nov 2022 11:55:40 +0000 (19:55 +0800)]
perf/smmuv3: Fix hotplug callback leak in arm_smmu_pmu_init()
arm_smmu_pmu_init() won't remove the callback added by
cpuhp_setup_state_multi() when platform_driver_register() fails. Remove
the callback with cpuhp_remove_multi_state() in the failure path.
This is similar to the handling of arm_ccn_init() in commit
26242b330093 ("bus: arm-ccn: Prevent hotplug callback leak").
Fixes: 7d839b4b9e00 ("perf/smmuv3: Add arm64 smmuv3 pmu driver")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Reviewed-by: Punit Agrawal <punit.agrawal@bytedance.com>
Link: https://lore.kernel.org/r/20221115115540.6245-3-shangxiaojing@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Shang XiaoJing [Tue, 15 Nov 2022 11:55:39 +0000 (19:55 +0800)]
perf/arm_dmc620: Fix hotplug callback leak in dmc620_pmu_init()
dmc620_pmu_init() won't remove the callback added by
cpuhp_setup_state_multi() when platform_driver_register() fails. Remove
the callback with cpuhp_remove_multi_state() in the failure path.
This is similar to the handling of arm_ccn_init() in commit
26242b330093 ("bus: arm-ccn: Prevent hotplug callback leak").
Fixes: 53c218da220c ("driver/perf: Add PMU driver for the ARM DMC-620 memory controller")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Reviewed-by: Punit Agrawal <punit.agrawal@bytedance.com>
Link: https://lore.kernel.org/r/20221115115540.6245-2-shangxiaojing@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Yuan Can [Tue, 15 Nov 2022 07:02:07 +0000 (07:02 +0000)]
drivers: perf: marvell_cn10k: Fix hotplug callback leak in tad_pmu_init()
tad_pmu_init() won't remove the callback added by cpuhp_setup_state_multi()
when platform_driver_register() fails. Remove the callback with
cpuhp_remove_multi_state() in the failure path.
This is similar to the handling of arm_ccn_init() in commit
26242b330093 ("bus: arm-ccn: Prevent hotplug callback leak").
Fixes: 036a7584bede ("drivers: perf: Add LLC-TAD perf counter support")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Link: https://lore.kernel.org/r/20221115070207.32634-3-yuancan@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Yuan Can [Tue, 15 Nov 2022 07:02:06 +0000 (07:02 +0000)]
perf: arm_dsu: Fix hotplug callback leak in dsu_pmu_init()
dsu_pmu_init() won't remove the callback added by cpuhp_setup_state_multi()
when platform_driver_register() fails. Remove the callback with
cpuhp_remove_multi_state() in the failure path.
This is similar to the handling of arm_ccn_init() in commit
26242b330093 ("bus: arm-ccn: Prevent hotplug callback leak").
Fixes: 7520fa99246d ("perf: ARM DynamIQ Shared Unit PMU support")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221115070207.32634-2-yuancan@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Mon, 14 Nov 2022 10:44:11 +0000 (10:44 +0000)]
arm64: mm: kfence: only handle translation faults
Alexander noted that KFENCE only expects to handle faults from invalid page
table entries (i.e. translation faults), but arm64's fault handling logic will
call kfence_handle_page_fault() for other types of faults, including alignment
faults caused by unaligned atomics. This has the unfortunate property of
causing those other faults to be reported as "KFENCE: use-after-free",
which is misleading and hinders debugging.
Fix this by only forwarding unhandled translation faults to the KFENCE
code, similar to what x86 does already.
Alexander has verified that this passes all the tests in the KFENCE test
suite and avoids bogus reports on misaligned atomics.
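The resulting check in the kernel fault path looks roughly like this sketch (is_translation_fault() stands in for whatever predicate the patch uses; kfence_handle_page_fault() and ESR_ELx_WNR are existing symbols):

    /* Sketch: only forward unhandled translation faults to KFENCE. */
    if (is_translation_fault(esr) &&
        kfence_handle_page_fault(addr, esr & ESR_ELx_WNR, regs))
            return; /* KFENCE owned the address and has reported the bug */

    /* otherwise fall through to die_kernel_fault() */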
Link: https://lore.kernel.org/all/20221102081620.1465154-1-zhongbaisong@huawei.com/
Fixes: 840b23986344 ("arm64, kfence: enable KFENCE for ARM64")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221114104411.2853040-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Besar Wicaksono [Fri, 11 Nov 2022 23:43:23 +0000 (17:43 -0600)]
ACPI: APMT: Fix kerneldoc and indentation
Add missing kerneldoc and fix the alignment of one of the arguments of
the apmt_add_platform_device() function.
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Link: https://lore.kernel.org/r/20221111234323.16182-1-bwicaksono@nvidia.com
[will: Fixed up additional indentation issue]
Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Mon, 14 Nov 2022 13:59:28 +0000 (13:59 +0000)]
arm64: insn: always inline hint generation
All users of aarch64_insn_gen_hint() (e.g. aarch64_insn_gen_nop()) pass
a constant argument and generate a constant value. Some of those users
are noinstr code (e.g. for alternatives patching).
For noinstr code it is necessary to either inline these functions or to
ensure the out-of-line versions are noinstr.
Since in all cases these are generating a constant, make them
__always_inline.
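The moved definitions end up looking roughly like this (a sketch of the pattern rather than a quote of the patch):

    /* Sketch: with constant inputs these fold to a compile-time constant. */
    static __always_inline u32 aarch64_insn_gen_hint(enum aarch64_insn_hint_cr_op op)
    {
            return aarch64_insn_get_hint_value() | op;
    }

    static __always_inline u32 aarch64_insn_gen_nop(void)
    {
            return aarch64_insn_gen_hint(AARCH64_INSN_HINT_NOP);
    }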
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20221114135928.3000571-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Mon, 14 Nov 2022 13:59:27 +0000 (13:59 +0000)]
arm64: insn: simplify insn group identification
The only code which needs to check for an entire instruction group is
the aarch64_insn_is_steppable() helper function used by kprobes, which
must not be instrumented, and only needs to check for the "Branch,
exception generation and system instructions" class.
Currently we have an out-of-line helper in insn.c which must be marked
as __kprobes, which indexes a table with some bits extracted from the
instruction. In aarch64_insn_is_steppable() we then need to compare the
result with an expected enum value.
It would be simpler to have a predicate for this, as with the other
aarch64_insn_is_*() helpers, which would be always inlined to prevent
inadvertent instrumentation, and would permit better code generation.
This patch adds a predicate function for this instruction group using
the existing __AARCH64_INSN_FUNCS() helpers, and removes the existing
out-of-line helper. As the only class we currently care about is the
branch+exception+sys class, I have only added helpers for this, and left
the other classes unimplemented for now.
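In other words, the new predicate is just another __AARCH64_INSN_FUNCS() entry, along these lines (a sketch; the mask/value select op0 == 101x, i.e. bits [28:26], for the branch/exception/system class):

    /* Sketch: predicate for the branch, exception generation and system class. */
    __AARCH64_INSN_FUNCS(class_branch_sys, 0x1C000000, 0x14000000)

    /* ... letting aarch64_insn_is_steppable() simply do: */
    if (aarch64_insn_is_class_branch_sys(insn))
            return false;   /* not single-steppable out of line */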
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20221114135928.3000571-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Mon, 14 Nov 2022 13:59:26 +0000 (13:59 +0000)]
arm64: insn: always inline predicates
We have a number of aarch64_insn_*() predicates which are used in code
which is not instrumentation safe (e.g. alternatives patching, kprobes).
Some of those are marked with __kprobes, but most are not, and are
implemented out-of-line in insn.c.
This patch moves the predicates to insn.h and marks them with
__always_inline. This ensures that they will respect the instrumentation
requirements of the callers into which they are inlined.
At the same time, I've formatted each of the functions consistently as a
list, to make them easier to read and update in future.
Other than preventing unwanted instrumentation, there should be no
functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20221114135928.3000571-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Mon, 14 Nov 2022 13:59:25 +0000 (13:59 +0000)]
arm64: insn: remove aarch64_insn_gen_prefetch()
There are no users of aarch64_insn_gen_prefetch(), which encodes a
PRFM (immediate) with a hard-coded offset of 0.
Remove it for now; we can always restore it, with tests, if we need it in
future.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20221114135928.3000571-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Jeremy Linton [Wed, 9 Nov 2022 17:47:20 +0000 (11:47 -0600)]
ACPI: Enable FPDT on arm64
FPDT provides some boot timing records that are useful for analyzing
parts of the UEFI boot stack. Given that the existing code works on
arm64 and allows reading the values without using /dev/mem, it seems
like a good idea to turn it on.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20221109174720.203723-1-jeremy.linton@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
wangkailong@jari.cn [Sun, 13 Nov 2022 09:41:10 +0000 (17:41 +0800)]
kselftest/arm64: fix array_size.cocci warning
Fix the following coccicheck warnings:
tools/testing/selftests/arm64/mte/check_mmap_options.c:64:24-25:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_mmap_options.c:66:20-21:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_mmap_options.c:135:25-26:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_mmap_options.c:96:25-26:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_mmap_options.c:190:24-25:
WARNING: Use ARRAY_SIZE
Signed-off-by: KaiLong Wang <wangkailong@jari.cn>
Link: https://lore.kernel.org/r/777ce8ba.12e.184705d4211.Coremail.wangkailong@jari.cn
Signed-off-by: Will Deacon <will@kernel.org>
Anshuman Khandual [Mon, 7 Nov 2022 14:17:53 +0000 (19:47 +0530)]
arm64/mm: Simplify and document pte_to_phys() for 52 bit addresses
The pte_to_phys() assembly definition performs multiple bit-field
transformations to derive the physical address embedded inside a page table
entry. Unlike its C counterpart, __pte_to_phys(), pte_to_phys() is not very
easy to follow. Simplify these operations via a new macro,
PTE_ADDR_HIGH_SHIFT, indicating how far the PTE-encoded higher address bits
need to be shifted left. While here, also update __pte_to_phys() and
__phys_to_pte_val().
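For reference, the C side ends up along these lines when 52-bit physical addresses are configured (a sketch; PTE_ADDR_LOW, PTE_ADDR_HIGH and PTE_ADDR_MASK are the existing field masks):

    /* Sketch: reassemble a 52-bit PA from the low and high PTE address fields. */
    #define __pte_to_phys(pte)                                      \
            ((pte_val(pte) & PTE_ADDR_LOW) |                        \
             ((pte_val(pte) & PTE_ADDR_HIGH) << PTE_ADDR_HIGH_SHIFT))

    #define __phys_to_pte_val(phys)                                 \
            (((phys) | ((phys) >> PTE_ADDR_HIGH_SHIFT)) & PTE_ADDR_MASK)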
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20221107141753.2938621-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Thu, 27 Oct 2022 15:59:08 +0000 (17:59 +0200)]
arm64: implement dynamic shadow call stack for Clang
Implement dynamic shadow call stack support on Clang, by parsing the
unwind tables at init time to locate all occurrences of PACIASP/AUTIASP
instructions, and replacing them with the shadow call stack push and pop
instructions, respectively.
This is useful because the overhead of the shadow call stack is
difficult to justify on hardware that implements pointer authentication
(PAC), and given that the PAC instructions are executed as NOPs on
hardware that doesn't, we can just replace them without breaking
anything. As PACIASP/AUTIASP are guaranteed to be paired with respect to
manipulations of the return address, replacing them 1:1 with shadow call
stack pushes and pops is guaranteed to result in the desired behavior.
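For illustration, the substitution amounts to rewriting one fixed 32-bit encoding with another (encodings shown for the conventional x18 shadow-call-stack register; treat the exact patching mechanics as an assumption):

    /* Sketch: the 1:1 instruction substitution performed at boot. */
    #define INSN_PACIASP            0xd503233f      /* paciasp                */
    #define INSN_AUTIASP            0xd50323bf      /* autiasp                */
    #define INSN_SCS_PUSH           0xf800865e      /* str x30, [x18], #8     */
    #define INSN_SCS_POP            0xf85f8e5e      /* ldr x30, [x18, #-8]!   */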
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20221027155908.1940624-4-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Thu, 27 Oct 2022 15:59:07 +0000 (17:59 +0200)]
scs: add support for dynamic shadow call stacks
In order to allow arches to use code patching to conditionally emit the
shadow stack pushes and pops, rather than always taking the performance
hit even on CPUs that implement alternatives such as stack pointer
authentication on arm64, add a Kconfig symbol that can be set by the
arch to omit the SCS codegen itself, without otherwise affecting how
support code for SCS and compiler options (for register reservation, for
instance) are emitted.
Also, add a static key and some plumbing to omit the allocation of
shadow call stack for dynamic SCS configurations if SCS is disabled at
runtime.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20221027155908.1940624-3-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Thu, 27 Oct 2022 15:59:06 +0000 (17:59 +0200)]
arm64: unwind: add asynchronous unwind tables to kernel and modules
Enable asynchronous unwind table generation for both the core kernel as
well as modules, and emit the resulting .eh_frame sections as init code
so we can use the unwind directives for code patching at boot or module
load time.
This will be used by dynamic shadow call stack support, which will rely
on code patching rather than compiler codegen to emit the shadow call
stack push and pop instructions.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20221027155908.1940624-2-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Mon, 17 Oct 2022 15:25:20 +0000 (16:25 +0100)]
kselftest/arm64: Add SVE 2.1 to hwcap test
Add coverage for FEAT_SVE2p1.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-7-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Mon, 17 Oct 2022 15:25:19 +0000 (16:25 +0100)]
arm64/hwcap: Add support for SVE 2.1
FEAT_SVE2p1 introduces a number of new SVE instructions. Since no new
architectural state is added, kernel support is simply a new hwcap which
lets userspace know that the feature is supported.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-6-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Mon, 17 Oct 2022 15:25:18 +0000 (16:25 +0100)]
kselftest/arm64: Add FEAT_RPRFM to the hwcap test
Since the newly added instruction is in the HINT space, we can't reasonably
test for it actually being present.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-5-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Mon, 17 Oct 2022 15:25:17 +0000 (16:25 +0100)]
arm64/hwcap: Add support for FEAT_RPRFM
FEAT_RPRFM adds a new range prefetch hint within the existing PRFM space.
Add a new hwcap to allow userspace to discover support for the new
instruction.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-4-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Mon, 17 Oct 2022 15:25:16 +0000 (16:25 +0100)]
kselftest/arm64: Add FEAT_CSSC to the hwcap selftest
Add FEAT_CSSC to the set of features checked by the hwcap selftest.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Mon, 17 Oct 2022 15:25:15 +0000 (16:25 +0100)]
arm64/hwcap: Add support for FEAT_CSSC
FEAT_CSSC adds a number of new instructions that can be used to optimise
common short sequences of instructions. Add a hwcap indicating that the
feature is available and can be used by userspace.
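As a hedged illustration of how userspace might consume this hwcap (together with the RPRFM and SVE 2.1 bits added in this series), assuming the HWCAP2_CSSC, HWCAP2_RPRFM and HWCAP2_SVE2P1 definitions are picked up from <asm/hwcap.h> in the usual way:

  #include <stdio.h>
  #include <sys/auxv.h>
  #include <asm/hwcap.h>      /* arm64 HWCAP2_* bit definitions */

  int main(void)
  {
      unsigned long hwcap2 = getauxval(AT_HWCAP2);

      printf("CSSC:   %s\n", (hwcap2 & HWCAP2_CSSC)   ? "yes" : "no");
      printf("RPRFM:  %s\n", (hwcap2 & HWCAP2_RPRFM)  ? "yes" : "no");
      printf("SVE2.1: %s\n", (hwcap2 & HWCAP2_SVE2P1) ? "yes" : "no");
      return 0;
  }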
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Mon, 7 Nov 2022 17:07:47 +0000 (17:07 +0000)]
arm64/fpsimd: Make kernel_neon_ API _GPL
Currently, for reasons lost in the mists of time, the kernel_neon_ APIs are
EXPORT_SYMBOL(), but the general policy for floating point usage is that it
should be GPL only, given the non-standard runtime environment that holds
while it is in use and the PCS impact when code is compiled for FP usage.
Given the limited existing deployment of non-GPL modules for arm64, and the
fact that other architectures like x86 already make their equivalent
functions GPL only, this is not expected to be disruptive to existing users.
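For reference, a minimal sketch of the call pattern affected by this change; any module using it now needs a GPL-compatible MODULE_LICENSE():

  #include <linux/module.h>
  #include <asm/neon.h>
  #include <asm/simd.h>

  static int __init neon_demo_init(void)
  {
      if (may_use_simd()) {
          kernel_neon_begin();    /* now EXPORT_SYMBOL_GPL() */
          /* ... NEON-accelerated work goes here ... */
          kernel_neon_end();      /* likewise GPL-only */
      }
      return 0;
  }
  module_init(neon_demo_init);

  MODULE_DESCRIPTION("kernel_neon_ usage sketch");
  MODULE_LICENSE("GPL");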
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221107170747.276910-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Kang Minchul [Sat, 5 Nov 2022 07:31:43 +0000 (16:31 +0900)]
kselftest/arm64: fix array_size.cocci warning
Use ARRAY_SIZE to fix the following coccicheck warnings:
tools/testing/selftests/arm64/mte/check_buffer_fill.c:341:20-21: WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_buffer_fill.c:35:20-21: WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_buffer_fill.c:168:20-21: WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_buffer_fill.c:72:20-21: WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_buffer_fill.c:369:25-26: WARNING: Use ARRAY_SIZE
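The change is mechanical; a standalone illustration of the pattern (the array here is made up, and the kselftests pick up ARRAY_SIZE from kselftest.h rather than defining it locally):

  #include <stdio.h>

  #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

  int main(void)
  {
      static const int sizes[] = { 16, 64, 1024, 4096 };

      /* before: for (i = 0; i < sizeof(sizes) / sizeof(int); i++) */
      for (unsigned int i = 0; i < ARRAY_SIZE(sizes); i++)
          printf("buffer size: %d\n", sizes[i]);
      return 0;
  }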
Signed-off-by: Kang Minchul <tegongkang@gmail.com>
Link: https://lore.kernel.org/r/20221105073143.78521-1-tegongkang@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Wed, 2 Nov 2022 14:05:43 +0000 (14:05 +0000)]
kselftest/arm64: Print ASCII version of unknown signal frame magic values
The signal magic values are supposed to be allocated as somewhat meaningful
ASCII, so if we encounter a bad magic value, print any alphanumeric
characters we find in it as well as the hex value to aid debuggability.
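A hedged sketch of the reporting idea (standalone, with made-up names rather than the actual testcases helpers): for each byte of the magic, print it if it is alphanumeric and a placeholder otherwise, alongside the hex value.

  #include <ctype.h>
  #include <stdio.h>

  static void print_unknown_magic(unsigned int magic)
  {
      char text[sizeof(magic) + 1];
      unsigned int i;

      for (i = 0; i < sizeof(magic); i++) {
          unsigned char c = (magic >> (i * 8)) & 0xff;

          text[i] = isalnum(c) ? c : '?';
      }
      text[i] = '\0';

      printf("Unknown magic 0x%x (%s)\n", magic, text);
  }

  int main(void)
  {
      print_unknown_magic(0x53564543);    /* prints "CEVS" */
      return 0;
  }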
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221102140543.98193-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Thu, 27 Oct 2022 11:03:24 +0000 (12:03 +0100)]
kselftest/arm64: Remove validation of extra_context from TODO
When fixing up support for extra_context in the signal handling tests, I
didn't notice that there is a TODO file in the directory which lists this
as a thing to be done. Since it's been done, remove it from the list.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221027110324.33802-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Mon, 17 Oct 2022 14:45:53 +0000 (15:45 +0100)]
kselftest/arm64: Provide progress messages when signalling children
Especially when the test is configured to run for a longer time, it can be
reassuring for users to see that the supervising program is running OK, so
provide a message every second when the output timer expires.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221017144553.773176-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Mark Brown [Mon, 17 Oct 2022 14:45:52 +0000 (15:45 +0100)]
kselftest/arm64: Check that all children are producing output in fp-stress
Currently we do not explicitly check that all the test programs have
started; instead we assume that once a second has passed since we saw
output from the test programs starting up, all of them are running and we
should start both sending signals and timing out. This is not reliable,
especially on very heavily loaded systems where the test programs might
take longer than a second to start producing output.
We do skip sending signals to children that have not produced output yet,
so we won't cause them to exit unexpectedly by sending a signal, but this
can create confusion when interpreting output, for example appearing to
show the tests running for less time than expected or appearing to show
missed signal deliveries. Avoid these issues by explicitly checking that we
have seen output from all the child processes before we start sending
signals or counting test run time.
This is especially likely on virtual platforms with large numbers of vector
lengths supported since the platforms are slow and there will be a lot of
tasks per CPU.
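A hedged sketch of the startup gate this adds (illustrative names, not the actual fp-stress data structures):

  #include <stdbool.h>

  struct child {
      bool output_seen;    /* set once the child has produced output */
  };

  /* Only start the test timer and signal delivery once this returns true. */
  static bool all_children_started(const struct child *children, int num)
  {
      for (int i = 0; i < num; i++)
          if (!children[i].output_seen)
              return false;
      return true;
  }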
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221017144553.773176-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Ard Biesheuvel [Mon, 7 Nov 2022 17:24:00 +0000 (18:24 +0100)]
arm64: Enable data independent timing (DIT) in the kernel
The ARM architecture revision v8.4 introduces a data independent timing
control (DIT) which can be set at any exception level, and instructs the
CPU to avoid optimizations that may result in a correlation between the
execution time of certain instructions and the value of the data they
operate on.
The DIT bit is part of PSTATE, and is therefore context switched as
usual, given that it becomes part of the saved program state (SPSR) when
taking an exception. We have also defined a hwcap for DIT, so user space
can already discover whether or not DIT is available. This means that, as
far as user space is concerned, DIT is wired up and fully functional.
In the kernel, however, we never bothered with DIT: we disable it at boot
(i.e., INIT_PSTATE_EL1 has DIT cleared) and ignore the fact that we might
run with DIT enabled if user space happened to set it.
Currently, we have no idea whether or not running privileged code with
DIT disabled on a CPU that implements support for it may result in a
side channel that exposes privileged data to unprivileged user space
processes, so let's be cautious and just enable DIT while running in the
kernel if supported by all CPUs.
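For illustration only (the kernel plumbs this through its cpufeature machinery rather than open-coded asm like this), setting PSTATE.DIT boils down to an MSR-immediate write; the raw encoding below is an assumption for the example, spelled as .inst so the snippet assembles even with toolchains that predate FEAT_DIT:

  /* 0xd503415f is taken here to be the encoding of "msr dit, #1". */
  static inline void set_dit(void)
  {
      asm volatile(".inst 0xd503415f" ::: "memory");
  }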
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Adam Langley <agl@google.com>
Link: https://lore.kernel.org/all/YwgCrqutxmX0W72r@gmail.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20221107172400.1851434-1-ardb@kernel.org
[will: Removed cpu_has_dit() as per Mark's suggestion on the list]
Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Tue, 8 Nov 2022 09:37:25 +0000 (09:37 +0000)]
arm_pmu: acpi: handle allocation failure
One of the failure paths in the arm_pmu ACPI code is missing an early
return, permitting a NULL pointer dereference upon a memory allocation
failure.
Add the missing return.
Fixes: fe40ffdb7656 ("arm_pmu: rework ACPI probing")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221108093725.1239563-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Fri, 30 Sep 2022 11:18:44 +0000 (12:18 +0100)]
arm_pmu: rework ACPI probing
The current ACPI PMU probing logic tries to associate PMUs with CPUs
when the CPU is first brought online, in order to handle late hotplug,
though PMUs are only registered during early boot, and so for late
hotplugged CPUs this can only associate the CPU with an existing PMU.
We tried to be clever and have the arm_pmu_acpi_cpu_starting()
callback allocate a struct arm_pmu when no matching instance is found,
in order to avoid duplication of logic. However, as above this doesn't
do anything useful for late hotplugged CPUs, and this requires us to
allocate memory in an atomic context, which is especially problematic
for PREEMPT_RT, as reported by Valentin and Pierre.
This patch reworks the probing to detect PMUs for all online CPUs in the
arm_pmu_acpi_probe() function, which is more aligned with how DT probing
works. The arm_pmu_acpi_cpu_starting() callback only tries to associate
CPUs with an existing arm_pmu instance, avoiding the problem of
allocating in atomic context.
Note that as we didn't previously register PMUs for late-hotplugged
CPUs, this change doesn't result in a loss of existing functionality,
though we will now warn when we cannot associate a CPU with a PMU.
This change allows us to pull the hotplug callback registration into the
arm_pmu_acpi_probe() function, as we no longer need the callbacks to be
invoked shortly after probing the boot CPUs, and can register it without
invoking the calls.
For the moment the arm_pmu_acpi_init() initcall remains to register the
SPE PMU, though in future this should probably be moved elsewhere (e.g.
the arm64 ACPI init code), since this doesn't need to be tied to the
regular CPU PMU code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lore.kernel.org/r/20210810134127.1394269-2-valentin.schneider@arm.com/
Reported-by: Pierre Gondois <pierre.gondois@arm.com>
Link: https://lore.kernel.org/linux-arm-kernel/20220912155105.1443303-1-pierre.gondois@arm.com/
Cc: Pierre Gondois <pierre.gondois@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-and-tested-by: Pierre Gondois <pierre.gondois@arm.com>
Link: https://lore.kernel.org/r/20220930111844.1522365-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Fri, 30 Sep 2022 11:18:43 +0000 (12:18 +0100)]
arm_pmu: factor out PMU matching
A subsequent patch will rework the ACPI probing of PMUs, and we'll need
to match a CPU with a known cpuid in two separate paths.
Factor out the matching logic into a helper function so that it can be
reused.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Pierre Gondois <pierre.gondois@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-and-tested-by: Pierre Gondois <pierre.gondois@arm.com>
Link: https://lore.kernel.org/r/20220930111844.1522365-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Fri, 30 Sep 2022 11:18:42 +0000 (12:18 +0100)]
arm_pmu: acpi: factor out PMU<->CPU association
A subsequent patch will rework the ACPI probing of PMUs, and we'll need
to associate a CPU with a PMU in two separate paths.
Factor out the association logic into a helper function so that it can
be reused.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Pierre Gondois <pierre.gondois@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-and-tested-by: Pierre Gondois <pierre.gondois@arm.com>
Link: https://lore.kernel.org/r/20220930111844.1522365-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Masahiro Yamada [Wed, 12 Oct 2022 23:35:00 +0000 (08:35 +0900)]
arm64: remove special treatment for the link order of head.o
In the previous discussion (see the Link tag), Ard pointed out that
arm/arm64/kernel/head.o does not need any special treatment - the only
piece that must appear right at the start of the binary image is the
image header which is emitted into .head.text.
The linker script already does the right thing to place it there. The build
system does not need to manipulate the link order of head.o.
Link: https://lore.kernel.org/lkml/CAMj1kXH77Ja8bSsq2Qj8Ck9iSZKw=1F8Uy-uAWGVDm4-CG=EuA@mail.gmail.com/
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Link: https://lore.kernel.org/r/20221012233500.156764-1-masahiroy@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Jisheng Zhang [Thu, 6 Oct 2022 07:55:42 +0000 (15:55 +0800)]
arm64: alternative: constify alternative_has_feature_* argument
Inspired by x86 commit 864b435514b2 ("x86/jump_label: Mark arguments as
const to satisfy asm constraints"), constify the alternative_has_feature_*
argument to satisfy asm constraints. Steven also pointed out in [1] that
"The "i" constraint needs to be a constant."
Tested by building a simple external kernel module with "-O0".
Before the patch, got similar gcc warnings and errors as below:
In file included from <command-line>:
In function ‘alternative_has_feature_likely’,
    inlined from ‘system_capabilities_finalized’ at arch/arm64/include/asm/cpufeature.h:440:9,
    inlined from ‘arm64_preempt_schedule_irq’ at arch/arm64/kernel/entry-common.c:264:6:
include/linux/compiler_types.h:285:33: warning: ‘asm’ operand 0 probably does not match constraints
  285 | #define asm_volatile_goto(x...) asm goto(x)
      |                                 ^~~
arch/arm64/include/asm/alternative-macros.h:232:9: note: in expansion of macro ‘asm_volatile_goto’
  232 |         asm_volatile_goto(
      |         ^~~~~~~~~~~~~~~~~
include/linux/compiler_types.h:285:33: error: impossible constraint in ‘asm’
  285 | #define asm_volatile_goto(x...) asm goto(x)
      |                                 ^~~
arch/arm64/include/asm/alternative-macros.h:232:9: note: in expansion of macro ‘asm_volatile_goto’
  232 |         asm_volatile_goto(
      |         ^~~~~~~~~~~~~~~~~
After the patch, the simple external test kernel module is built fine
with "-O0".
[1] https://lore.kernel.org/all/20210212094059.5f8d05e8@gandalf.local.home/
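A standalone sketch of the pattern involved (illustrative, not the kernel's alternative_has_feature_*() code): the "i" constraint demands a compile-time constant, and const-qualifying the always-inlined parameter is what lets the compiler satisfy it even at -O0.

  #include <stdbool.h>

  #define __always_inline inline __attribute__((__always_inline__))

  static __always_inline bool has_feature(const unsigned int feature)
  {
      /* the operand is only constrained, not used, in this sketch */
      asm goto("nop" : : "i"(feature) : : l_yes);
      return false;
  l_yes:
      return true;
  }

  int main(void)
  {
      return has_feature(42) ? 0 : 1;
  }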
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20221006075542.2658-3-jszhang@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Jisheng Zhang [Thu, 6 Oct 2022 07:55:41 +0000 (15:55 +0800)]
arm64: jump_label: mark arguments as const to satisfy asm constraints
Inspired by x86 commit 864b435514b2 ("x86/jump_label: Mark arguments as
const to satisfy asm constraints"), mark arch_static_branch()'s and
arch_static_branch_jump()'s arguments as const to satisfy asm constraints.
Steven also pointed out in [1] that "The "i" constraint needs to be a
constant."
Tested by building a simple external kernel module with "-O0".
[1] https://lore.kernel.org/all/20210212094059.5f8d05e8@gandalf.local.home/
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20221006075542.2658-2-jszhang@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy [Wed, 28 Sep 2022 19:21:26 +0000 (20:21 +0100)]
ACPI/IORT: Update SMMUv3 DeviceID support
IORT E.e now allows SMMUv3 nodes to describe the DeviceID for MSIs
independently of wired GSIVs, where the previous oddly-restrictive
definition meant that an SMMU without PRI support had to provide a
DeviceID even if it didn't support MSIs either. Support this, with
the usual temporary flag definition while the real one is making
its way through ACPICA.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/4b3e2ead4f392d1a47a7528da119d57918e5d806.1664392886.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Besar Wicaksono [Thu, 29 Sep 2022 00:28:34 +0000 (19:28 -0500)]
ACPI: ARM Performance Monitoring Unit Table (APMT) initial support
The ARM Performance Monitoring Unit Table (APMT) describes the properties
of PMU support in ARM-based systems. The APMT table contains a list of
nodes, each of which represents a PMU in the system that conforms to the
ARM CoreSight PMU architecture. The properties of each node include the
information required to access the PMU (e.g. MMIO base address, interrupt
number) as well as identification. For more detailed information, please
refer to the specifications below:
* APMT: https://developer.arm.com/documentation/den0117/latest
* ARM Coresight PMU:
https://developer.arm.com/documentation/ihi0091/latest
The initial support adds the detection of APMT table and generic
infrastructure to create platform devices for ARM CoreSight PMUs.
Similar to IORT, the root pointer of the APMT is preserved during runtime
and each PMU platform device is given a pointer to the corresponding
APMT node.
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20220929002834.32664-1-bwicaksono@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
Linus Torvalds [Sun, 6 Nov 2022 23:07:11 +0000 (15:07 -0800)]
Linux 6.1-rc4
Linus Torvalds [Sun, 6 Nov 2022 21:09:52 +0000 (13:09 -0800)]
Merge tag 'cxl-fixes-for-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Dan Williams:
"Several fixes for CXL region creation crashes, leaks and failures.
This is mainly fallout from the original implementation of dynamic CXL
region creation (instantiate new physical memory pools) that arrived
in v6.0-rc1.
Given the theme of "failures in the presence of pass-through decoders"
this also includes new regression test infrastructure for that case.
Summary:
- Fix region creation crash with pass-through decoders
- Fix region creation crash when decoder allocation fails
- Fix region creation crash when scanning regions to enforce the
increasing physical address order constraint that CXL mandates
- Fix a memory leak for cxl_pmem_region objects, track 1:N instead of
1:1 memory-device-to-region associations.
- Fix a memory leak for cxl_region objects when regions with active
targets are deleted
- Fix assignment of NUMA nodes to CXL regions by CFMWS (CXL Window)
emulated proximity domains.
- Fix region creation failure for switch attached devices downstream
of a single-port host-bridge
- Fix false positive memory leak of cxl_region objects by recycling
recently used region ids rather than freeing them
- Add regression test infrastructure for a pass-through decoder
configuration
- Fix some mailbox payload handling corner cases"
* tag 'cxl-fixes-for-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/region: Recycle region ids
cxl/region: Fix 'distance' calculation with passthrough ports
tools/testing/cxl: Add a single-port host-bridge regression config
tools/testing/cxl: Fix some error exits
cxl/pmem: Fix cxl_pmem_region and cxl_memdev leak
cxl/region: Fix cxl_region leak, cleanup targets at region delete
cxl/region: Fix region HPA ordering validation
cxl/pmem: Use size_add() against integer overflow
cxl/region: Fix decoder allocation crash
ACPI: NUMA: Add CXL CFMWS 'nodes' to the possible nodes set
cxl/pmem: Fix failure to account for 8 byte header for writes to the device LSA.
cxl/region: Fix null pointer dereference due to pass through decoder commit
cxl/mbox: Add a check on input payload size
Linus Torvalds [Sun, 6 Nov 2022 20:59:12 +0000 (12:59 -0800)]
Merge tag 'hwmon-for-v6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fix two regressions:
- Commit 54cc3dbfc10d ("hwmon: (pmbus) Add regulator supply into macro")
resulted in regulator undercount when disabling regulators. Revert it.
- The thermal subsystem rework caused the scmi driver to no longer
register with the thermal subsystem because index values no longer
match. To fix the problem, the scmi driver now directly registers
with the thermal subsystem, no longer through the hwmon core"
* tag 'hwmon-for-v6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
Revert "hwmon: (pmbus) Add regulator supply into macro"
hwmon: (scmi) Register explicitly with Thermal Framework
Linus Torvalds [Sun, 6 Nov 2022 20:41:32 +0000 (12:41 -0800)]
Merge tag 'perf_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Add Cooper Lake's stepping to the PEBS guest/host events isolation
fixed microcode revisions checking quirk
- Update Icelake and Sapphire Rapids events constraints
- Use the standard energy unit for Sapphire Rapids in RAPL
- Fix the hw_breakpoint test to fail more gracefully on !SMP configs
* tag 'perf_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Add Cooper Lake stepping to isolation_ucodes[]
perf/x86/intel: Fix pebs event constraints for SPR
perf/x86/intel: Fix pebs event constraints for ICL
perf/x86/rapl: Use standard Energy Unit for SPR Dram RAPL domain
perf/hw_breakpoint: test: Skip the test if dependencies unmet
Linus Torvalds [Sun, 6 Nov 2022 20:36:47 +0000 (12:36 -0800)]
Merge tag 'x86_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Add new Intel CPU models
- Enforce that TDX guests are successfully loaded only on TDX hardware
where virtualization exception (#VE) delivery on kernel memory is
disabled because handling those in all possible cases is "essentially
impossible"
- Add the proper include to the syscall wrappers so that BTF can see
the real pt_regs definition and not only the forward declaration
* tag 'x86_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add several Intel server CPU model numbers
x86/tdx: Panic on bad configs that #VE on "private" memory access
x86/tdx: Prepare for using "INFO" call for a second purpose
x86/syscall: Include asm/ptrace.h in syscall_wrapper header
Linus Torvalds [Sun, 6 Nov 2022 20:23:10 +0000 (12:23 -0800)]
Merge tag 'kbuild-fixes-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Use POSIX-compatible grep options
- Document git-related tips for reproducible builds
- Fix a typo in the modpost rule
- Suppress SIGPIPE error message from gcc-ar and llvm-ar
- Fix segmentation fault in the menuconfig search
* tag 'kbuild-fixes-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: fix segmentation fault in menuconfig search
kbuild: fix SIGPIPE error message for AR=gcc-ar and AR=llvm-ar
kbuild: fix typo in modpost
Documentation: kbuild: Add description of git for reproducible builds
kbuild: use POSIX-compatible grep option
Linus Torvalds [Sun, 6 Nov 2022 18:46:59 +0000 (10:46 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix the pKVM stage-1 walker erroneously using the stage-2 accessor
- Correctly convert vcpu->kvm to a hyp pointer when generating an
exception in a nVHE+MTE configuration
- Check that KVM_CAP_DIRTY_LOG_* are valid before enabling them
- Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE
- Document the boot requirements for FGT when entering the kernel at
EL1
x86:
- Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()
- Make argument order consistent for kvcalloc()
- Userspace API fixes for DEBUGCTL and LBRs"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Fix a typo about the usage of kvcalloc()
KVM: x86: Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()
KVM: VMX: Ignore guest CPUID for host userspace writes to DEBUGCTL
KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl()
KVM: VMX: Advertise PMU LBRs if and only if perf supports LBRs
arm64: booting: Document our requirements for fine grained traps with SME
KVM: arm64: Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE
KVM: Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them
KVM: arm64: Fix bad dereference on MTE-enabled systems
KVM: arm64: Use correct accessor to parse stage-1 PTEs
Linus Torvalds [Sun, 6 Nov 2022 18:42:29 +0000 (10:42 -0800)]
Merge tag 'for-linus-6.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"One fix for silencing a smatch warning, and a small cleanup patch"
* tag 'for-linus-6.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: simplify sysenter and syscall setup
x86/xen: silence smatch warning in pmu_msr_chk_emulated()
Linus Torvalds [Sun, 6 Nov 2022 18:30:29 +0000 (10:30 -0800)]
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix a number of bugs, including some regressions, the most serious of
which was one which would cause online resizes to fail with file
systems with metadata checksums enabled.
Also fix a warning caused by the newly added fortify string checker,
plus some bugs that were found using fuzzed file systems"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix fortify warning in fs/ext4/fast_commit.c:1551
ext4: fix wrong return err in ext4_load_and_init_journal()
ext4: fix warning in 'ext4_da_release_space'
ext4: fix BUG_ON() when directory entry has invalid rec_len
ext4: update the backup superblock's at the end of the online resize