Veronika Molnarova [Thu, 15 Feb 2024 11:02:29 +0000 (12:02 +0100)]
perf testsuite: Add common output checking helpers
As a form of validation, it is a common practice to check the outputs
of commands whether they contain expected patterns or match a certain
regex.
Add helpers for verifying that all regexes are found in the output, that
all lines match any pattern from a set and that a certain expression is
not present in the output.
In verbose mode these helpers log mismatches for easier failure
investigation.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: kjain@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240215110231.15385-6-mpetlan@redhat.com
Veronika Molnarova [Thu, 15 Feb 2024 11:02:28 +0000 (12:02 +0100)]
perf testsuite: Add test case for perf probe
Add new perf probe test case that acts as an entry element in perf test
list. Runs multiple subtests from directory "base_probe", which will be
added in incomming patches and can be expanded without further editing.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: kjain@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240215110231.15385-5-mpetlan@redhat.com
Veronika Molnarova [Thu, 15 Feb 2024 11:02:27 +0000 (12:02 +0100)]
perf testsuite: Add initialization script for shell tests
Initialize reporting and logging functions that unifies formatting
of the test output used for shell tests.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: kjain@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240215110231.15385-4-mpetlan@redhat.com
Veronika Molnarova [Thu, 15 Feb 2024 11:02:26 +0000 (12:02 +0100)]
perf testsuite: Add common setting for shell tests
Add settings defining sample commands later shared by shell tests. This
adds the possibility to globally adjust the default values for the whole
testsuite.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: kjain@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240215110231.15385-3-mpetlan@redhat.com
Veronika Molnarova [Thu, 15 Feb 2024 11:02:25 +0000 (12:02 +0100)]
perf testsuite: Add common regex patters
Unify perf regexes for checking testing output into a single file
to reduce duplicates and prevent errors when editing.
This will be used in upcomming patches in shell tests.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: kjain@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240215110231.15385-2-mpetlan@redhat.com
Adrian Hunter [Wed, 31 Jan 2024 19:24:16 +0000 (21:24 +0200)]
perf test: Enable Symbols test to work with a current module dso
The test needs a struct machine and creates one for the current host,
but a side-effect is that struct machine has set up kernel maps
including module maps.
If the 'Symbols' test --dso option specifies a current kernel module,
it will already be present as a kernel dso, and a map with kmaps needs
to be used otherwise there will be a segfault - see below.
For that case, find the existing map and use that. In that case also,
the dso is split by section into multiple dsos, so test those dsos
also. That in turn, shows up that those dsos have not had overlapping
symbols removed, so the test fails.
Example:
Before:
$ perf test -F -v Symbols --dso /lib/modules/$(uname -r)/kernel/arch/x86/kvm/kvm-intel.ko
70: Symbols :
--- start ---
Testing /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko
Segmentation fault (core dumped)
After:
$ perf test -F -v Symbols --dso /lib/modules/$(uname -r)/kernel/arch/x86/kvm/kvm-intel.ko
70: Symbols :
--- start ---
Testing /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko
Overlapping symbols:
41d30-41fbb l vmx_init
41d30-41fbb g init_module
---- end ----
Symbols: FAILED!
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240131192416.16387-1-adrian.hunter@intel.com
Leo Yan [Wed, 14 Feb 2024 11:39:47 +0000 (19:39 +0800)]
perf build: Cleanup perf register configuration
The target is to allow the tool to always enable the perf register
feature for native parsing and cross parsing, and current code doesn't
depend on the macro 'HAVE_PERF_REGS_SUPPORT'.
This patch remove the variable 'NO_PERF_REGS' and the defined macro
'HAVE_PERF_REGS_SUPPORT' from the Makefile.
Signed-off-by: Leo Yan <leo.yan@linux.dev>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: linux-csky@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214113947.240957-5-leo.yan@linux.dev
Leo Yan [Wed, 14 Feb 2024 11:39:46 +0000 (19:39 +0800)]
perf parse-regs: Introduce a weak function arch__sample_reg_masks()
Every architecture can provide a register list for sampling. If an
architecture doesn't support register sampling, it won't define the data
structure 'sample_reg_masks'. Consequently, any code using this
structure must be protected by the macro 'HAVE_PERF_REGS_SUPPORT'.
This patch defines a weak function, arch__sample_reg_masks(), which will
be replaced by an architecture-defined function for returning the
architecture's register list. With this refactoring, the function always
exists, the condition checking for 'HAVE_PERF_REGS_SUPPORT' is not
needed anymore, so remove it.
Signed-off-by: Leo Yan <leo.yan@linux.dev>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: linux-csky@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214113947.240957-4-leo.yan@linux.dev
Leo Yan [Wed, 14 Feb 2024 11:39:45 +0000 (19:39 +0800)]
perf parse-regs: Always build perf register functions
Currently, the macro HAVE_PERF_REGS_SUPPORT is used as a switch to turn
on or turn off the code of perf registers. If any architecture cannot
support perf register, it disables the perf register parsing, for both
the native parsing and cross parsing for other architectures.
To support both the native parsing and cross parsing, the tool should
always build the perf regs functions. Thus, this patch removes
HAVE_PERF_REGS_SUPPORT from the perf regs files.
Signed-off-by: Leo Yan <leo.yan@linux.dev>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: linux-csky@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214113947.240957-3-leo.yan@linux.dev
Leo Yan [Wed, 14 Feb 2024 11:39:44 +0000 (19:39 +0800)]
perf build: Remove unused CONFIG_PERF_REGS
CONFIG_PERF_REGS is not used, remove it.
Signed-off-by: Leo Yan <leo.yan@linux.dev>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: linux-csky@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214113947.240957-2-leo.yan@linux.dev
Ian Rogers [Fri, 9 Feb 2024 20:49:47 +0000 (12:49 -0800)]
perf metric: Don't remove scale from counts
Counts were switched from the scaled saved value form to the
aggregated count to avoid double accounting. When this happened the
removing of scaling for a count should have been removed, however, it
wasn't and this wasn't observed as it normally doesn't matter because
a counter's scale is 1. A problem was observed with RAPL events that
are scaled.
Fixes: 37cc8ad77cf8 ("perf metric: Directly use counts rather than saved_value")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kaige Ye <ye@kaige.org>
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240209204947.3873294-5-irogers@google.com
Ian Rogers [Fri, 9 Feb 2024 20:49:46 +0000 (12:49 -0800)]
perf stat: Avoid metric-only segv
Cycles is recognized as part of a hard coded metric in stat-shadow.c,
it may call print_metric_only with a NULL fmt string leading to a
segfault. Handle the NULL fmt explicitly.
Fixes: 088519f318be ("perf stat: Move the display functions to stat-display.c")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kaige Ye <ye@kaige.org>
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240209204947.3873294-4-irogers@google.com
Ian Rogers [Fri, 9 Feb 2024 20:49:45 +0000 (12:49 -0800)]
perf expr: Fix "has_event" function for metric style events
Events in metrics cannot use '/' as a separator, it would be
recognized as a divide, so they use '@'. The '@' is recognized in the
metricgroups code and changed to '/', do the same in the has_event
function so that the parsing is only tried without the @s.
Fixes: 4a4a9bf9075f ("perf expr: Add has_event function")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kaige Ye <ye@kaige.org>
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240209204947.3873294-3-irogers@google.com
Ian Rogers [Fri, 9 Feb 2024 20:49:44 +0000 (12:49 -0800)]
perf expr: Allow NaN to be a valid number
Currently only floating point numbers can be parsed, add a special
case for NaN.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kaige Ye <ye@kaige.org>
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240209204947.3873294-2-irogers@google.com
Ian Rogers [Sat, 10 Feb 2024 03:17:46 +0000 (19:17 -0800)]
perf maps: Locking tidy up of nr_maps
After this change maps__nr_maps is only used by tests, existing users
are migrated to maps__empty. Compute maps__empty under the read lock.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Artem Savkov <asavkov@redhat.com>
Cc: bpf@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240210031746.4057262-7-irogers@google.com
Ian Rogers [Sat, 10 Feb 2024 03:17:45 +0000 (19:17 -0800)]
perf maps: Hide maps internals
Move the struct into the C file. Add maps__equal to work around
exposing the struct for reference count checking. Add accessors for
the unwind_libunwind_ops. Move maps_list_node to its only use in
symbol.c.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Artem Savkov <asavkov@redhat.com>
Cc: bpf@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240210031746.4057262-6-irogers@google.com
Ian Rogers [Sat, 10 Feb 2024 03:17:44 +0000 (19:17 -0800)]
perf maps: Get map before returning in maps__find_next_entry
Finding a map is done under a lock, returning the map without a
reference count means it can be removed without notice and causing
uses after free. Grab a reference count to the map within the lock
region and return this. Fix up locations that need a map__put
following this.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Artem Savkov <asavkov@redhat.com>
Cc: bpf@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240210031746.4057262-5-irogers@google.com
Ian Rogers [Sat, 10 Feb 2024 03:17:43 +0000 (19:17 -0800)]
perf maps: Get map before returning in maps__find_by_name
Finding a map is done under a lock, returning the map without a
reference count means it can be removed without notice and causing
uses after free. Grab a reference count to the map within the lock
region and return this. Fix up locations that need a map__put
following this. Also fix some reference counted pointer comparisons.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Artem Savkov <asavkov@redhat.com>
Cc: bpf@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240210031746.4057262-4-irogers@google.com
Ian Rogers [Sat, 10 Feb 2024 03:17:42 +0000 (19:17 -0800)]
perf maps: Get map before returning in maps__find
Finding a map is done under a lock, returning the map without a
reference count means it can be removed without notice and causing
uses after free. Grab a reference count to the map within the lock
region and return this. Fix up locations that need a map__put
following this.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Artem Savkov <asavkov@redhat.com>
Cc: bpf@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240210031746.4057262-3-irogers@google.com
Ian Rogers [Sat, 10 Feb 2024 03:17:41 +0000 (19:17 -0800)]
perf maps: Switch from rbtree to lazily sorted array for addresses
Maps is a collection of maps primarily sorted by the starting address
of the map. Prior to this change the maps were held in an rbtree
requiring 4 pointers per node. Prior to reference count checking, the
rbnode was embedded in the map so 3 pointers per node were
necessary. This change switches the rbtree to an array lazily sorted
by address, much as the array sorting nodes by name. 1 pointer is
needed per node, but to avoid excessive resizing the backing array may
be twice the number of used elements. Meaning the memory overhead is
roughly half that of the rbtree. For a perf record with
"--no-bpf-event -g -a" of true, the memory overhead of perf inject is
reduce fom 3.3MB to 3MB, so 10% or 300KB is saved.
Map inserts always happen at the end of the array. The code tracks
whether the insertion violates the sorting property. O(log n) rb-tree
complexity is switched to O(1).
Remove slides the array, so O(log n) rb-tree complexity is degraded to
O(n).
A find may need to sort the array using qsort which is O(n*log n), but
in general the maps should be sorted and so average performance should
be O(log n) as with the rbtree.
An rbtree node consumes a cache line, but with the array 4 nodes fit
on a cache line. Iteration is simplified to scanning an array rather
than pointer chasing.
Overall it is expected the performance after the change should be
comparable to before, but with half of the memory consumed.
To avoid a list and repeated logic around splitting maps,
maps__merge_in is rewritten in terms of
maps__fixup_overlap_and_insert. maps_merge_in splits the given mapping
inserting remaining gaps. maps__fixup_overlap_and_insert splits the
existing mappings, then adds the incoming mapping. By adding the new
mapping first, then re-inserting the existing mappings the splitting
behavior matches.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Artem Savkov <asavkov@redhat.com>
Cc: bpf@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240210031746.4057262-2-irogers@google.com
Namhyung Kim [Mon, 12 Feb 2024 20:19:21 +0000 (12:19 -0800)]
Merge branch 'perf-tools' into perf-tools-next
To get some fixes in the perf test and JSON metrics into the development
branch.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Ian Rogers [Thu, 1 Feb 2024 00:15:02 +0000 (16:15 -0800)]
perf srcline: Add missed addr2line closes
The child_process for addr2line sets in and out to -1 so that pipes
get created. It is the caller's responsibility to close the pipes,
finish_command doesn't do it. Add the missed closes.
Fixes: b3801e791231 ("perf srcline: Simplify addr2line subprocess")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240201001504.1348511-8-irogers@google.com
Yicong Yang [Thu, 8 Feb 2024 02:40:26 +0000 (10:40 +0800)]
perf stat: Support per-cluster aggregation
Some platforms have 'cluster' topology and CPUs in the cluster will
share resources like L3 Cache Tag (for HiSilicon Kunpeng SoC) or L2
cache (for Intel Jacobsville). Currently parsing and building cluster
topology have been supported since [1].
perf stat has already supported aggregation for other topologies like
die or socket, etc. It'll be useful to aggregate per-cluster to find
problems like L3T bandwidth contention.
This patch add support for "--per-cluster" option for per-cluster
aggregation. Also update the docs and related test. The output will
be like:
[root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5
Performance counter stats for 'system wide':
S56-D0-CLS158 4 1,321,521,570 LLC-load
S56-D0-CLS594 4 794,211,453 LLC-load
S56-D0-CLS1030 4 41,623 LLC-load
S56-D0-CLS1466 4 41,646 LLC-load
S56-D0-CLS1902 4 16,863 LLC-load
S56-D0-CLS2338 4 15,721 LLC-load
S56-D0-CLS2774 4 22,671 LLC-load
[...]
On a legacy system without cluster or cluster support, the output will
be look like:
[root@localhost perf]# perf stat -a -e cycles --per-cluster -- sleep 1
Performance counter stats for 'system wide':
S56-D0-CLS0 64 18,011,485 cycles
S7182-D0-CLS0 64 16,548,835 cycles
Note that this patch doesn't mix the cluster information in the outputs
of --per-core to avoid breaking any tools/scripts using it.
Note that perf recently supports "--per-cache" aggregation, but it's not
the same with the cluster although cluster CPUs may share some cache
resources. For example on my machine all clusters within a die share the
same L3 cache:
$ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list
0-31
$ cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list
0-3
[1] commit
c5e22feffdd7 ("topology: Represent clusters of CPUs within a die")
Tested-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Cc: james.clark@arm.com
Cc: 21cnbao@gmail.com
Cc: prime.zeng@hisilicon.com
Cc: Jonathan.Cameron@huawei.com
Cc: fanghao11@huawei.com
Cc: linuxarm@huawei.com
Cc: tim.c.chen@intel.com
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240208024026.2691-1-yangyicong@huawei.com
Namhyung Kim [Thu, 8 Feb 2024 18:10:25 +0000 (10:10 -0800)]
perf tools: Remove misleading comments on map functions
When it converts sample IP to or from objdump-capable one, there's a
comment saying that kernel modules have DSO_SPACE__USER. But commit
02213cec64bb ("perf maps: Mark module DSOs with kernel type") changed
it and makes the comment confusing. Let's get rid of it.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240208181025.1329645-1-namhyung@kernel.org
Yang Jihong [Tue, 6 Feb 2024 08:32:28 +0000 (08:32 +0000)]
perf thread_map: Free strlist on normal path in thread_map__new_by_tid_str()
slist needs to be freed in both error path and normal path in
thread_map__new_by_tid_str().
Fixes: b52956c961be3a04 ("perf tools: Allow multiple threads or processes in record, stat, top")
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240206083228.172607-6-yangjihong1@huawei.com
Yang Jihong [Tue, 6 Feb 2024 08:32:27 +0000 (08:32 +0000)]
perf sched: Move curr_pid and cpu_last_switched initialization to perf_sched__{lat|map|replay}()
The curr_pid and cpu_last_switched are used only for the
'perf sched replay/latency/map'. Put their initialization in
perf_sched__{lat|map|replay () to reduce unnecessary actions in other
commands.
Simple functional testing:
# perf sched record perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.209 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 16.456 MB perf.data (147907 samples) ]
# perf sched lat
-------------------------------------------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Max delay start | Max delay end |
-------------------------------------------------------------------------------------------------------------------------------------------
sched-messaging:(401) | 2990.699 ms | 38705 | avg: 0.661 ms | max: 67.046 ms | max start: 456532.624830 s | max end: 456532.691876 s
qemu-system-x86:(7) | 179.764 ms | 2191 | avg: 0.152 ms | max: 21.857 ms | max start: 456532.576434 s | max end: 456532.598291 s
sshd:48125 | 0.522 ms | 2 | avg: 0.037 ms | max: 0.046 ms | max start: 456532.514610 s | max end: 456532.514656 s
<SNIP>
ksoftirqd/11:82 | 0.063 ms | 1 | avg: 0.005 ms | max: 0.005 ms | max start: 456532.769366 s | max end: 456532.769371 s
kworker/9:0-mm_:34624 | 0.233 ms | 20 | avg: 0.004 ms | max: 0.007 ms | max start: 456532.690804 s | max end: 456532.690812 s
migration/13:93 | 0.000 ms | 1 | avg: 0.004 ms | max: 0.004 ms | max start: 456532.512669 s | max end: 456532.512674 s
-----------------------------------------------------------------------------------------------------------------
TOTAL: | 3180.750 ms | 41368 |
---------------------------------------------------
# echo $?
0
# perf sched map
*A0 456532.510141 secs A0 => migration/0:15
*. 456532.510171 secs . => swapper:0
. *B0 456532.510261 secs B0 => migration/1:21
. *. 456532.510279 secs
<SNIP>
L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . . . . 456532.785979 secs
L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . . . 456532.786054 secs
L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . . 456532.786127 secs
L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . 456532.786197 secs
L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 456532.786270 secs
# echo $?
0
# perf sched replay
run measurement overhead: 108 nsecs
sleep measurement overhead: 66473 nsecs
the run test took
1000002 nsecs
the sleep test took
1082686 nsecs
nr_run_events: 49334
nr_sleep_events: 50054
nr_wakeup_events: 34701
target-less wakeups: 165
multi-target wakeups: 766
task 0 ( swapper: 0), nr_events: 15419
task 1 ( swapper: 1), nr_events: 1
task 2 ( swapper: 2), nr_events: 1
<SNIP>
task 715 ( sched-messaging: 110248), nr_events: 1438
task 716 ( sched-messaging: 110249), nr_events: 512
task 717 ( sched-messaging: 110250), nr_events: 500
task 718 ( sched-messaging: 110251), nr_events: 537
task 719 ( sched-messaging: 110252), nr_events: 823
------------------------------------------------------------
#1 : 1325.288, ravg: 1325.29, cpu: 7823.35 / 7823.35
#2 : 1363.606, ravg: 1329.12, cpu: 7655.53 / 7806.56
#3 : 1349.494, ravg: 1331.16, cpu: 7544.80 / 7780.39
#4 : 1311.488, ravg: 1329.19, cpu: 7495.13 / 7751.86
#5 : 1309.902, ravg: 1327.26, cpu: 7266.65 / 7703.34
#6 : 1309.535, ravg: 1325.49, cpu: 7843.86 / 7717.39
#7 : 1316.482, ravg: 1324.59, cpu: 7854.41 / 7731.09
#8 : 1366.604, ravg: 1328.79, cpu: 7955.81 / 7753.57
#9 : 1326.286, ravg: 1328.54, cpu: 7466.86 / 7724.90
#10 : 1356.653, ravg: 1331.35, cpu: 7566.60 / 7709.07
# echo $?
0
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240206083228.172607-5-yangjihong1@huawei.com
Yang Jihong [Tue, 6 Feb 2024 08:32:26 +0000 (08:32 +0000)]
perf sched: Move curr_thread initialization to perf_sched__map()
The curr_thread is used only for the 'perf sched map'. Put initialization
in perf_sched__map() to reduce unnecessary actions in other commands.
Simple functional testing:
# perf sched record perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.197 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 15.526 MB perf.data (140095 samples) ]
# perf sched map
*A0 451264.532445 secs A0 => migration/0:15
*. 451264.532468 secs . => swapper:0
. *B0 451264.532537 secs B0 => migration/1:21
. *. 451264.532560 secs
. . *C0 451264.532644 secs C0 => migration/2:27
. . *. 451264.532668 secs
. . . *D0 451264.532753 secs D0 => migration/3:33
. . . *. 451264.532778 secs
. . . . *E0 451264.532861 secs E0 => migration/4:39
. . . . *. 451264.532886 secs
. . . . . *F0 451264.532973 secs F0 => migration/5:45
<SNIP>
A7 A7 A7 A7 A7 *A7 . . . . . . . . . . 451264.790785 secs
A7 A7 A7 A7 A7 A7 *A7 . . . . . . . . . 451264.790858 secs
A7 A7 A7 A7 A7 A7 A7 *A7 . . . . . . . . 451264.790934 secs
A7 A7 A7 A7 A7 A7 A7 A7 *A7 . . . . . . . 451264.791004 secs
A7 A7 A7 A7 A7 A7 A7 A7 A7 *A7 . . . . . . 451264.791075 secs
A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 *A7 . . . . . 451264.791143 secs
A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 *A7 . . . . 451264.791232 secs
A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 *A7 . . . 451264.791336 secs
A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 *A7 . . 451264.791407 secs
A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 *A7 . 451264.791484 secs
A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 A7 *A7 451264.791553 secs
# echo $?
0
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240206083228.172607-4-yangjihong1@huawei.com
Yang Jihong [Tue, 6 Feb 2024 08:32:25 +0000 (08:32 +0000)]
perf sched: Fix memory leak in perf_sched__map()
perf_sched__map() needs to free memory of map_cpus, color_pids and
color_cpus in normal path and rollback allocated memory in error path.
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240206083228.172607-3-yangjihong1@huawei.com
Yang Jihong [Tue, 6 Feb 2024 08:32:24 +0000 (08:32 +0000)]
perf sched: Move start_work_mutex and work_done_wait_mutex initialization to perf_sched__replay()
The start_work_mutex and work_done_wait_mutex are used only for the
'perf sched replay'. Put their initialization in perf_sched__replay () to
reduce unnecessary actions in other commands.
Simple functional testing:
# perf sched record perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.197 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 14.952 MB perf.data (134165 samples) ]
# perf sched replay
run measurement overhead: 108 nsecs
sleep measurement overhead: 65658 nsecs
the run test took 999991 nsecs
the sleep test took
1079324 nsecs
nr_run_events: 42378
nr_sleep_events: 43102
nr_wakeup_events: 31852
target-less wakeups: 17
multi-target wakeups: 712
task 0 ( swapper: 0), nr_events: 10451
task 1 ( swapper: 1), nr_events: 3
task 2 ( swapper: 2), nr_events: 1
<SNIP>
task 717 ( sched-messaging: 74483), nr_events: 152
task 718 ( sched-messaging: 74484), nr_events: 1944
task 719 ( sched-messaging: 74485), nr_events: 73
task 720 ( sched-messaging: 74486), nr_events: 163
task 721 ( sched-messaging: 74487), nr_events: 942
task 722 ( sched-messaging: 74488), nr_events: 78
task 723 ( sched-messaging: 74489), nr_events: 1090
------------------------------------------------------------
#1 : 1366.507, ravg: 1366.51, cpu: 7682.70 / 7682.70
#2 : 1410.072, ravg: 1370.86, cpu: 7723.88 / 7686.82
#3 : 1396.296, ravg: 1373.41, cpu: 7568.20 / 7674.96
#4 : 1381.019, ravg: 1374.17, cpu: 7531.81 / 7660.64
#5 : 1393.826, ravg: 1376.13, cpu: 7725.25 / 7667.11
#6 : 1401.581, ravg: 1378.68, cpu: 7594.82 / 7659.88
#7 : 1381.337, ravg: 1378.94, cpu: 7371.22 / 7631.01
#8 : 1373.842, ravg: 1378.43, cpu: 7894.92 / 7657.40
#9 : 1364.697, ravg: 1377.06, cpu: 7324.91 / 7624.15
#10 : 1363.613, ravg: 1375.72, cpu: 7209.55 / 7582.69
# echo $?
0
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240206083228.172607-2-yangjihong1@huawei.com
Yicong Yang [Wed, 7 Feb 2024 09:12:22 +0000 (17:12 +0800)]
perf test: Skip metric w/o event name on arm64 in stat STD output linter
stat+std_output.sh test fails on my arm64 machine:
[root@localhost shell]# ./stat+std_output.sh
Checking STD output: no args Unknown event name in TopDownL1 # 0.18 retiring
[root@localhost shell]# ./stat+std_output.sh
Checking STD output: no args [Success]
Checking STD output: system wide [Success]
Checking STD output: interval [Success]
Checking STD output: per thread Unknown event name in tmux: server-
1114960 # 0.41 frontend_bound
When no args specified `perf stat` will add TopdownL1 metric group
and the output will be like:
[root@localhost shell]# perf stat -- stress-ng --vm 1 --timeout 1
stress-ng: info: [
3351733] setting to a 1 second run per stressor
stress-ng: info: [
3351733] dispatching hogs: 1 vm
stress-ng: info: [
3351733] successful run completed in 1.02s
Performance counter stats for 'stress-ng --vm 1 --timeout 1':
1,037.71 msec task-clock # 1.000 CPUs utilized
13 context-switches # 12.528 /sec
1 cpu-migrations # 0.964 /sec
67,544 page-faults # 65.090 K/sec
2,691,932,561 cycles # 2.594 GHz (74.56%)
6,571,333,653 instructions # 2.44 insn per cycle (74.92%)
521,863,142 branches # 502.901 M/sec (75.21%)
425,879 branch-misses # 0.08% of all branches (87.57%)
TopDownL1 # 0.61 retiring (87.67%)
# 0.03 frontend_bound (87.67%)
# 0.02 bad_speculation (87.67%)
# 0.34 backend_bound (74.61%)
1.
038138390 seconds time elapsed
0.
844849000 seconds user
0.
189053000 seconds sys
Metrics in group TopDownL1 don't have event name on arm64 but are not
listed in the $skip_metric list which they should be listed. Add them
to the skip list as what does for x86 platforms in [1].
[1] commit
4d60e83dfcee ("perf test: Skip metrics w/o event name in stat STD output linter")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: linuxarm@huawei.com
Cc: kan.liang@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240207091222.54096-1-yangyicong@huawei.com
Adrian Hunter [Thu, 8 Feb 2024 08:53:26 +0000 (10:53 +0200)]
perf symbols: Slightly improve module file executable section mappings
Currently perf does not record module section addresses except for
the .text section. In general that means perf cannot get module section
mappings correct (except for .text) when loading symbols from a kernel
module file. (Note using --kcore does not have this issue)
Improve that situation slightly by identifying executable sections that
use the same mapping as the .text section. That happens when an
executable section comes directly after the .text section, both in memory
and on file, something that can be determined by following the same layout
rules used by the kernel, refer kernel layout_sections(). Note whether
that happens is somewhat arbitrary, so this is not a final solution.
Example from tracing a virtual machine process:
Before:
$ perf script | grep unknown
CPU 0/KVM 1718 203.511270: 318341 cpu-cycles:P:
ffffffffc13e8a70 [unknown] (/lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko)
$ perf script -vvv 2>&1 >/dev/null | grep kvm.intel | grep 'noinstr.text\|ffff'
Map: 0-7e0 41430 [kvm_intel].noinstr.text
Map:
ffffffffc13a7000-
ffffffffc1421000 a0 /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko
After:
$ perf script | grep 203.511270
CPU 0/KVM 1718 203.511270: 318341 cpu-cycles:P:
ffffffffc13e8a70 vmx_vmexit+0x0 (/lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko)
$ perf script -vvv 2>&1 >/dev/null | grep kvm.intel | grep 'noinstr.text\|ffff'
Map:
ffffffffc13a7000-
ffffffffc1421000 a0 /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko
Reported-by: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240208085326.13432-3-adrian.hunter@intel.com
Adrian Hunter [Thu, 8 Feb 2024 08:53:25 +0000 (10:53 +0200)]
perf tools: Make it possible to see perf's kernel and module memory mappings
Dump kmaps if using 'perf --debug kmaps' or verbose > 2 (e.g. -vvv) for
tools 'perf script' and 'perf report' if there is no browser.
Example:
$ perf --debug kmaps script 2>&1 >/dev/null | grep kvm.intel
build id event received for /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko:
0691d75e10e72ebbbd45a44c59f6d00a5604badf [20]
Map: 0-3a3 4f5d8 [kvm_intel].modinfo
Map: 0-5240 5f280 [kvm_intel]__versions
Map: 0-30 64 [kvm_intel].note.Linux
Map: 0-14 644c0 [kvm_intel].orc_header
Map: 0-5297 43680 [kvm_intel].rodata
Map: 0-5bee 3b837 [kvm_intel].text.unlikely
Map: 0-7e0 41430 [kvm_intel].noinstr.text
Map: 0-2080 713c0 [kvm_intel].bss
Map: 0-26 705c8 [kvm_intel].data..read_mostly
Map: 0-5888 6a4c0 [kvm_intel].data
Map: 0-22 70220 [kvm_intel].data.once
Map: 0-40 705f0 [kvm_intel].data..percpu
Map: 0-1685 41d20 [kvm_intel].init.text
Map: 0-4b8 6fd60 [kvm_intel].init.data
Map: 0-380 70248 [kvm_intel]__dyndbg
Map: 0-8 70218 [kvm_intel].exit.data
Map: 0-438 4f980 [kvm_intel]__param
Map: 0-5f5 4ca0f [kvm_intel].rodata.str1.1
Map: 0-3657 493b8 [kvm_intel].rodata.str1.8
Map: 0-e0 70640 [kvm_intel].data..ro_after_init
Map: 0-500 70ec0 [kvm_intel].gnu.linkonce.this_module
Map:
ffffffffc13a7000-
ffffffffc1421000 a0 /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko
The example above shows how the module section mappings are all wrong
except for the main .text mapping at 0xffffffffc13a7000.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240208085326.13432-2-adrian.hunter@intel.com
Namhyung Kim [Fri, 12 Jan 2024 23:13:40 +0000 (15:13 -0800)]
perf record: Display data size on pipe mode
Currently pipe mode doesn't set the file size and it results in a
misleading message of 0 data size at the end. Although it might miss
some accounting for pipe header or more, just displaying the data size
would reduce the possible confusion.
Before:
$ perf record -o- perf test -w noploop | perf report -i- -q --percent-limit=1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ] <====== (here)
99.58% perf perf [.] noploop
After:
$ perf record -o- perf test -w noploop | perf report -i- -q --percent-limit=1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.229 MB - ]
99.46% perf perf [.] noploop
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240112231340.779469-1-namhyung@kernel.org
Kan Liang [Mon, 5 Feb 2024 14:58:19 +0000 (06:58 -0800)]
perf script: Print source line for each jump in brstackinsn
With the srcline option, the perf script only prints a source line at
the beginning of a sample with call/ret from functions, but not for
each jump in brstackinsn. It's useful to print a source line for each
jump in brstackinsn when the end user analyze the full assembler
sequences of branch sequences for the sample.
The srccode option can also be used to locate the source code line.
However, it's printed almost for every line and makes the output less
readable.
$perf script -F +brstackinsn,+srcline --xed
Before the patch,
tchain_edit_deb
1463275 15228549.107820: 282495 instructions:u: 401133 f3+0xd (/home/kan/os.li>
tchain_edit.c:22
f3+40: tchain_edit.c:20
000000000040114e jle 0x401133 # PRED 6 cycles [6]
0000000000401133 movl -0x4(%rbp), %eax
0000000000401136 and $0x1, %eax
0000000000401139 test %eax, %eax
000000000040113b jz 0x401143
000000000040113d addl $0x1, -0x4(%rbp)
0000000000401141 jmp 0x401147 # PRED 3 cycles [9] 2.00 IPC
0000000000401147 cmpl $0x3e7, -0x4(%rbp)
000000000040114e jle 0x401133 # PRED 6 cycles [15] 0.33 IPC
After the patch,
tchain_edit_deb
1463275 15228549.107820: 282495 instructions:u: 401133 f3+0xd (/home/kan/os.li>
tchain_edit.c:22
f3+40: tchain_edit.c:20
000000000040114e jle 0x401133 srcline: tchain_edit.c:20 # PRED 6 cycles [6]
0000000000401133 movl -0x4(%rbp), %eax
0000000000401136 and $0x1, %eax
0000000000401139 test %eax, %eax
000000000040113b jz 0x401143
000000000040113d addl $0x1, -0x4(%rbp)
0000000000401141 jmp 0x401147 srcline: tchain_edit.c:23 # PRED 3 cycles [9] 2.00 IPC
0000000000401147 cmpl $0x3e7, -0x4(%rbp)
000000000040114e jle 0x401133 srcline: tchain_edit.c:20 # PRED 6 cycles [15] 0.33 IPC
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: ahmad.yasin@intel.com
Cc: amiri.khalil@intel.com
Cc: ak@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240205145819.1943114-1-kan.liang@linux.intel.com
Ian Rogers [Tue, 6 Feb 2024 23:59:02 +0000 (15:59 -0800)]
perf kvm powerpc: Fix build
Updates to struct parse_events_error needed to be carried through to
PowerPC specific event parsing.
Fixes: fd7b8e8fb20f ("perf parse-events: Print all errors")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung <namhyung@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240206235902.2917395-1-irogers@google.com
Madhavan Srinivasan [Mon, 29 Jan 2024 12:08:55 +0000 (17:38 +0530)]
perf/pmu-events/powerpc: Update json mapfile with Power11 PVR
Update the Power11 PVR to json mapfile to enable
json events. Power11 is PowerISA v3.1 compliant
and support Power10 events.
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Cc: atrajeev@linux.vnet.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240129120855.551529-1-maddy@linux.ibm.com
Ben Gainey [Tue, 23 Jan 2024 10:31:37 +0000 (10:31 +0000)]
tools: perf: Expose sample ID / stream ID to python scripts
perf script exposes the evsel_name to python scripts as part of the data
passed to the sample or tracepoint handler function, and it passes the id and
stream_id to the throttled/unthrottled handler functions. This makes matching
throttle events and samples difficult.
To make this possible, this change exposes the sample id and stream_id values
to the script.
Signed-off-by: Ben Gainey <ben.gainey@arm.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: will@kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240123103137.1890779-2-ben.gainey@arm.com
Arnaldo Carvalho de Melo [Fri, 2 Feb 2024 14:32:20 +0000 (11:32 -0300)]
perf bpf: Clean up the generated/copied vmlinux.h
When building perf with BPF skels we either copy the minimalistic
tools/perf/util/bpf_skel/vmlinux/vmlinux.h or use bpftool to generate a
vmlinux from BTF, storing the result in $(SKEL_OUT)/vmlinux.h.
We need to remove that when doing a 'make -C tools/perf clean', fix it.
Fixes: b7a2d774c9c5a9a3 ("perf build: Add ability to build with a generated vmlinux.h")
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: bpf@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/Zbz89KK5wHfZ82jv@x1
Ian Rogers [Wed, 31 Jan 2024 20:14:29 +0000 (12:14 -0800)]
perf jevents: Drop or simplify small integer values
Prior to this patch '0' would be dropped as the config values default
to 0. Some json values are hex and the string '0' wouldn't match '0x0'
as zero. Add a more robust is_zero test to drop these event terms.
When encoding numbers as hex, if the number is between 0 and 9
inclusive then don't add a 0x prefix.
Update test expectations for these changes.
On x86 this reduces the event/metric C string by 58,411 bytes.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240131201429.792138-1-irogers@google.com
Ian Rogers [Wed, 31 Jan 2024 13:49:40 +0000 (05:49 -0800)]
perf parse-events: Print all errors
Prior to this patch the first and the last error encountered during
parsing are printed. To see other errors verbose needs
enabling. Unfortunately this can drop useful errors, in particular on
terms. This patch changes the errors so that instead of the first and
last all errors are recorded and printed, the underlying data
structure is changed to a list.
Before:
```
$ perf stat -e 'slots/edge=2/' true
event syntax error: 'slots/edge=2/'
\___ Bad event or PMU
Unable to find PMU or event on a PMU of 'slots'
Initial error:
event syntax error: 'slots/edge=2/'
\___ Cannot find PMU `slots'. Missing kernel support?
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
```
After:
```
$ perf stat -e 'slots/edge=2/' true
event syntax error: 'slots/edge=2/'
\___ Bad event or PMU
Unable to find PMU or event on a PMU of 'slots'
event syntax error: 'slots/edge=2/'
\___ value too big for format (edge), maximum is 1
event syntax error: 'slots/edge=2/'
\___ Cannot find PMU `slots'. Missing kernel support?
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
```
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: tchen168@asu.edu
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240131134940.593788-3-irogers@google.com
Ian Rogers [Wed, 31 Jan 2024 13:49:39 +0000 (05:49 -0800)]
perf parse-events: Improve error location of terms cloned from an event
A PMU event/alias will have a set of format terms that replace it when
an event is parsed. The location of the terms is their position when
parsed for the event/alias either from sysfs or json. This location is
of little use when an event fails to parse as the error will be given
in terms of the location in the string of events parsed not the json
or sysfs string. Fix this by making the cloned terms location that of
the event/alias.
If a cloned term from an event/alias is invalid the bad format is hard
to determine from the error string. Add the name of the bad format
into the error string.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@arm.com>
Cc: tchen168@asu.edu
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240131134940.593788-2-irogers@google.com
Ian Rogers [Wed, 31 Jan 2024 13:49:38 +0000 (05:49 -0800)]
perf tsc: Add missing newlines to debug statements
It is assumed that debug statements always print a newline, fix two
missing ones.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@arm.com>
Cc: tchen168@asu.edu
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240131134940.593788-1-irogers@google.com
Andi Kleen [Wed, 31 Jan 2024 02:13:52 +0000 (18:13 -0800)]
perf Documentation: Add some more hints to tips.txt
Add some (hopefully useful) hints to tips.txt
Also some minor corrections.
Would probably good to make it a reviewer rule that if generally useful
options are added the patch must add an example to tips.txt
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240131021352.151440-1-ak@linux.intel.com
Weilin Wang [Tue, 30 Jan 2024 18:09:07 +0000 (10:09 -0800)]
perf test: Simplify metric value validation test final report
The original test report was too complicated to read with information
that not really useful. This new update simplify the report which should
largely improve the readibility.
Signed-off-by: Weilin Wang <weilin.wang@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240130180907.639729-1-weilin.wang@intel.com
Andi Kleen [Tue, 30 Jan 2024 18:55:52 +0000 (10:55 -0800)]
perf report: Prevent segfault with --no-parent
Prevent a perf report segfault with the (non sensical) --no-parent
option
Signed-off-By: Andi Kleen <ak@linux.intel.com>
Acked-by: Namhyung <namhyung@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240130185552.150578-1-ak@linux.intel.com
Yang Jihong [Sat, 27 Jan 2024 02:57:56 +0000 (02:57 +0000)]
perf evsel: Fix duplicate initialization of data->id in evsel__parse_sample()
data->id has been initialized at line 2362, remove duplicate initialization.
Fixes: 3ad31d8a0df2 ("perf evsel: Centralize perf_sample initialization")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240127025756.4041808-1-yangjihong1@huawei.com
Ze Gao [Tue, 23 Jan 2024 07:02:11 +0000 (02:02 -0500)]
perf evsel: Rename get_states() to parse_task_states() and make it public
Since get_states() assumes the existence of libtraceevent, so move
to where it should belong, i.e, util/trace-event-parse.c, and also
rename it to parse_task_states().
Leave evsel_getstate() untouched as it fits well in the evsel
category.
Also make some necessary tweaks for python support, and get it
verified with: perf test python.
Signed-off-by: Ze Gao <zegao@tencent.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240123070210.1669843-2-zegao@tencent.com
Arnaldo Carvalho de Melo [Wed, 31 Jan 2024 14:24:40 +0000 (11:24 -0300)]
perf tools headers: update the asm-generic/unaligned.h copy with the kernel sources
To pick up the changes in:
1ab33c03145d0f6c ("asm-generic: make sparse happy with odd-sized put_unaligned_*()")
Addressing this perf tools build warning:
Warning: Kernel ABI header differences:
diff -u tools/include/asm-generic/unaligned.h include/asm-generic/unaligned.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Zbp9I7rmFj1Owhug@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 30 Jan 2024 14:44:00 +0000 (11:44 -0300)]
tools include UAPI: Sync linux/mount.h copy with the kernel sources
To pick the changes from:
35e27a5744131996 ("fs: keep struct mnt_id_req extensible")
b4c2bea8ceaa50cd ("add listmount(2) syscall")
46eae99ef73302f9 ("add statmount(2) syscall")
That doesn't change anything in tools this time as nothing that is
harvested by the beauty scripts got changed:
$ ls -1 tools/perf/trace/beauty/*mount*sh
tools/perf/trace/beauty/fsmount.sh
tools/perf/trace/beauty/mount_flags.sh
tools/perf/trace/beauty/move_mount_flags.sh
$
This addresses this perf build warning.
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/mount.h include/uapi/linux/mount.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZbkMiB7ZcOsLP2V5@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
James Clark [Wed, 24 Jan 2024 09:43:57 +0000 (09:43 +0000)]
perf evlist: Fix evlist__new_default() for > 1 core PMU
The 'Session topology' test currently fails with this message when
evlist__new_default() opens more than one event:
32: Session topology :
--- start ---
templ file: /tmp/perf-test-vv5YzZ
Using CPUID 0x00000000410fd070
Opening: unknown-hardware:HG
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xb00000000
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 4
Opening: unknown-hardware:HG
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xa00000000
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 5
non matching sample_type
FAILED tests/topology.c:73 can't get session
---- end ----
Session topology: FAILED!
This is because when re-opening the file and parsing the header, Perf
expects that any file that has more than one event has the sample ID
flag set. Perf record already sets the flag in a similar way when there
is more than one event, so add the same logic to evlist__new_default().
evlist__new_default() is only currently used in tests, so I don't
expect this change to have any other side effects. The other tests that
use it don't save and re-open the file so don't hit this issue.
The session topology test has been failing on Arm big.LITTLE platforms
since commit
251aa040244a3b17 ("perf parse-events: Wildcard most
"numeric" events") when evlist__new_default() started opening multiple
events for 'cycles'.
Fixes: 251aa040244a3b17 ("perf parse-events: Wildcard most "numeric" events")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
[ This was failing as well on a Rocket Lake Refresh/14700k Intel hybrid system - Arnaldo ]
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Closes: https://lore.kernel.org/lkml/CAP-5=fWVQ-7ijjK3-w1q+k2WYVNHbAcejb-xY0ptbjRw476VKA@mail.gmail.com/
Link: https://lore.kernel.org/r/20240124094358.489372-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 30 Jan 2024 13:45:24 +0000 (10:45 -0300)]
tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench'
This is to get the changes from:
94ea9c05219518ef ("x86/headers: Replace #include <asm/export.h> with #include <linux/export.h>")
10f4c9b9a33b7df0 ("x86/asm: Fix build of UML with KASAN")
That addresses these perf tools build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S
diff -u tools/arch/x86/lib/memset_64.S arch/x86/lib/memset_64.S
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Link: https://lore.kernel.org/lkml/ZbkIKpKdNqOFdMwJ@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 30 Jan 2024 13:09:18 +0000 (10:09 -0300)]
tools headers x86 cpufeatures: Sync with the kernel sources to pick TDX, Zen, APIC MSR fence changes
To pick the changes from:
1e536e10689700e0 ("x86/cpu: Detect TDX partial write machine check erratum")
765a0542fdc7aad7 ("x86/virt/tdx: Detect TDX during kernel boot")
30fa92832f405d5a ("x86/CPU/AMD: Add ZenX generations flags")
04c3024560d3a14a ("x86/barrier: Do not serialize MSR accesses on AMD")
This causes these perf files to be rebuilt and brings some X86_FEATURE
that will be used when updating the copies of
tools/arch/x86/lib/mem{cpy,set}_64.S with the kernel sources:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 29 Jan 2024 16:00:07 +0000 (13:00 -0300)]
tools headers UAPI: Sync unistd.h to pick {list,stat}mount, lsm_{[gs]et_self_attr,list_modules} syscall numbers
To pick the changes in these csets:
d8b0f5465012538c ("wire up syscalls for statmount/listmount")
5f42375904b08890 ("LSM: wireup Linux Security Module syscalls")
Used in some architectures to create syscall tables.
This addresses this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZbfMuAlUMRO9Hqa6@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Thu, 4 Jan 2024 23:19:03 +0000 (15:19 -0800)]
perf vendor events intel: Alderlake/sapphirerapids metric fixes
As events are deduplicated by name, ensure PMU prefixes are always
used in metrics. Previously they may be missed on the first event in a
formula.
Update metric constraints for architectures with topdown l2 events.
Conversion script updated in:
https://github.com/intel/perfmon/pull/128
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Closes: https://lore.kernel.org/lkml/ZZam-EG-UepcXtWw@kernel.org/
Link: https://lore.kernel.org/r/20240104231903.775717-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Sat, 27 Jan 2024 17:52:23 +0000 (14:52 -0300)]
tools headers UAPI: Sync kvm headers with the kernel sources
To pick the changes in:
a5d3df8ae13fada7 ("KVM: remove deprecated UAPIs")
6d72283526090850 ("KVM x86/xen: add an override for PVCLOCK_TSC_STABLE_BIT")
89ea60c2c7b5838b ("KVM: x86: Add support for "protected VMs" that can utilize private memory")
8dd2eee9d526c30f ("KVM: x86/mmu: Handle page fault for private memory")
a7800aa80ea4d535 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
5a475554db1e476a ("KVM: Introduce per-page memory attributes")
16f95f3b95caded2 ("KVM: Add KVM_EXIT_MEMORY_FAULT exit to report faults to userspace")
bb58b90b1a8f753b ("KVM: Introduce KVM_SET_USER_MEMORY_REGION2")
3f9cd0ca848413fd ("KVM: arm64: Allow userspace to get the writable masks for feature ID registers")
That automatically adds support for some new ioctls and remove a bunch
of deprecated ones.
This ends up making the new binary to forget about the deprecated one,
so when used in an older system it will not be able to resolve those
codes to strings.
$ tools/perf/trace/beauty/kvm_ioctl.sh > before
$ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
$ tools/perf/trace/beauty/kvm_ioctl.sh > after
$ diff -u before after
--- before 2024-01-27 14:48:16.
523014020 -0300
+++ after 2024-01-27 14:48:24.
183932866 -0300
@@ -14,6 +14,7 @@
[0x46] = "SET_USER_MEMORY_REGION",
[0x47] = "SET_TSS_ADDR",
[0x48] = "SET_IDENTITY_MAP_ADDR",
+ [0x49] = "SET_USER_MEMORY_REGION2",
[0x60] = "CREATE_IRQCHIP",
[0x61] = "IRQ_LINE",
[0x62] = "GET_IRQCHIP",
@@ -22,14 +23,8 @@
[0x65] = "GET_PIT",
[0x66] = "SET_PIT",
[0x67] = "IRQ_LINE_STATUS",
- [0x69] = "ASSIGN_PCI_DEVICE",
[0x6a] = "SET_GSI_ROUTING",
- [0x70] = "ASSIGN_DEV_IRQ",
[0x71] = "REINJECT_CONTROL",
- [0x72] = "DEASSIGN_PCI_DEVICE",
- [0x73] = "ASSIGN_SET_MSIX_NR",
- [0x74] = "ASSIGN_SET_MSIX_ENTRY",
- [0x75] = "DEASSIGN_DEV_IRQ",
[0x76] = "IRQFD",
[0x77] = "CREATE_PIT2",
[0x78] = "SET_BOOT_CPU_ID",
@@ -66,7 +61,6 @@
[0x9f] = "GET_VCPU_EVENTS",
[0xa0] = "SET_VCPU_EVENTS",
[0xa3] = "ENABLE_CAP",
- [0xa4] = "ASSIGN_SET_INTX_MASK",
[0xa5] = "SIGNAL_MSI",
[0xa6] = "GET_XCRS",
[0xa7] = "SET_XCRS",
@@ -97,6 +91,8 @@
[0xcd] = "SET_SREGS2",
[0xce] = "GET_STATS_FD",
[0xd0] = "XEN_HVM_EVTCHN_SEND",
+ [0xd2] = "SET_MEMORY_ATTRIBUTES",
+ [0xd4] = "CREATE_GUEST_MEMFD",
[0xe0] = "CREATE_DEVICE",
[0xe1] = "SET_DEVICE_ATTR",
[0xe2] = "GET_DEVICE_ATTR",
$
This silences these perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chao Peng <chao.p.peng@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jing Zhang <jingzhangos@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/lkml/ZbVLbkngp4oq13qN@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sun Haiyong [Sat, 6 Jan 2024 09:41:29 +0000 (17:41 +0800)]
perf tools: Fix calloc() arguments to address error introduced in gcc-14
the definition of calloc is as follows:
void *calloc(size_t nmemb, size_t size);
number of members is in the first parameter and the size is in the
second parameter.
Fix error messages on gcc 14
20240102:
error: 'calloc' sizes specified with 'sizeof' in the earlier argument and
not in the later argument [-Werror=calloc-transposed-args]
Committer notes:
I noticed this on fedora 40 and rawhide.
Signed-off-by: Sun Haiyong <sunhaiyong@loongson.cn>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240106094129.3337057-1-siyanteng@loongson.cn
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sun Haiyong [Mon, 4 Dec 2023 08:20:55 +0000 (16:20 +0800)]
perf top: Remove needless malloc(0) call that triggers -Walloc-size
GCC 14 introduces a new -Walloc-size included in -Wextra which errors out
like:
builtin-top.c: In function ‘prompt_integer’:
builtin-top.c:360:21: error: allocation of insufficient size ‘0’ for
type ‘char’ with size ‘1’ [-Werror=alloc-size]
360 | char *buf = malloc(0), *p;
| ^~~~~~
Just set it to NULL, getline() will do the allocation.
Signed-off-by: Sun Haiyong <sunhaiyong@loongson.cn>
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231204082055.91877-1-siyanteng@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Yicong Yang [Mon, 22 Jan 2024 08:04:06 +0000 (16:04 +0800)]
perf build: Make minimal shellcheck version to v0.6.0
The perf build failed due to the shellcheck on my machine (v0.4.6 on Ubuntu
18.04.1 LTS) doesn't support -a/--check-sourced and -S/--severity option.
These two options are introduced in shellcheck v0.4.7 and v0.6.0
respectively. So restrict the minimal version of shellcheck to v0.6.0.
Fixes: b809fc656e763296 ("perf build: Shellcheck support for OUTPUT directory")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20240122080406.28678-1-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 26 Jan 2024 14:00:01 +0000 (11:00 -0300)]
tools headers UAPI: Update tools's copy of drm.h headers to pick DRM_IOCTL_MODE_CLOSEFB
Picking the changes from:
8570c27932e132d2 ("drm/syncobj: Add deadline support for syncobj waits")
9724ed6c1b1212d1 ("drm: Introduce DRM_CLIENT_CAP_CURSOR_PLANE_HOTSPOT")
e4d983acffff270c ("drm: introduce DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP")
d208d875667e2a29 ("drm: introduce CLOSEFB IOCTL")
afa5cf3175a22b71 ("drm/i915/uapi: fix typos/spellos and punctuation")
Addressing these perf build warnings:
Warning: Kernel ABI header differences:
Now 'perf trace' and other code that might use the
tools/perf/trace/beauty autogenerated tables will be able to translate
this new ioctl command into a string:
$ tools/perf/trace/beauty/drm_ioctl.sh > before
$ cp include/uapi/drm/drm.h tools/include/uapi/drm/drm.h
$ tools/perf/trace/beauty/drm_ioctl.sh > after
$ diff -u before after
--- before 2024-01-26 10:54:23.
486381862 -0300
+++ after 2024-01-26 10:54:35.
767902442 -0300
@@ -109,6 +109,7 @@
[0xCD] = "SYNCOBJ_TIMELINE_SIGNAL",
[0xCE] = "MODE_GETFB2",
[0xCF] = "SYNCOBJ_EVENTFD",
+ [0xD0] = "MODE_CLOSEFB",
[DRM_COMMAND_BASE + 0x00] = "I915_INIT",
[DRM_COMMAND_BASE + 0x01] = "I915_FLUSH",
[DRM_COMMAND_BASE + 0x02] = "I915_FLIP",
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rob Clark <robdclark@chromium.org>
Cc: Simon Ser <contact@emersion.fr>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Zack Rusin <zack.rusin@broadcom.com>
Link: https://lore.kernel.org/lkml/ZbPIN9Dcc5AM0uxo@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Wed, 24 Jan 2024 04:30:15 +0000 (20:30 -0800)]
perf test shell daemon: Make signal test less racy
The daemon signal test sends signals and then expects files to be
written. It was observed on an Intel Alderlake that the signals were
sent too quickly leading to the 3 expected files not appearing.
To avoid this send the next signal only after the expected previous file
has appeared. To avoid an infinite loop the number of retries is
limited.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Ross Zwisler <zwisler@chromium.org>
Cc: Shirisha G <shirisha@linux.ibm.com>
Link: https://lore.kernel.org/r/20240124043015.1388867-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Wed, 24 Jan 2024 04:30:14 +0000 (20:30 -0800)]
perf test shell script: Fix test for python being disabled
"grep -cv" can exit with an error code that causes the "set -e" to abort
the script. Switch to using the grep exit code in the if condition to
avoid this.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Ross Zwisler <zwisler@chromium.org>
Cc: Shirisha G <shirisha@linux.ibm.com>
Link: https://lore.kernel.org/r/20240124043015.1388867-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Wed, 24 Jan 2024 04:30:13 +0000 (20:30 -0800)]
perf test: Workaround debug output in list test
Write the JSON output to a specific file to avoid debug output
breaking it.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Ross Zwisler <zwisler@chromium.org>
Cc: Shirisha G <shirisha@linux.ibm.com>
Link: https://lore.kernel.org/r/20240124043015.1388867-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Wed, 24 Jan 2024 04:30:12 +0000 (20:30 -0800)]
perf list: Add output file option
Add an option to write the 'perf list' output to a specific file. This
can avoid issues with debug output being written into the output stream.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Ross Zwisler <zwisler@chromium.org>
Cc: Shirisha G <shirisha@linux.ibm.com>
Link: https://lore.kernel.org/r/20240124043015.1388867-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Wed, 24 Jan 2024 04:30:11 +0000 (20:30 -0800)]
perf list: Switch error message to pr_err() to respect debug settings (-v)
Using printf() can interrupt 'perf list output', use pr_err() which can
respect debug settings and the debug file.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Ross Zwisler <zwisler@chromium.org>
Cc: Shirisha G <shirisha@linux.ibm.com>
Link: https://lore.kernel.org/r/20240124043015.1388867-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas Richter [Thu, 25 Jan 2024 10:03:51 +0000 (11:03 +0100)]
perf test: Fix 'perf script' tests on s390
In linux next repo, test case 'perf script tests' fails on s390.
The root case is a command line invocation of 'perf record' with
call-graph information. On s390 only DWARF formatted call-graphs are
supported and only on software events.
Change the command line parameters for s390.
Output before:
# perf test 89
89: perf script tests : FAILED!
#
Output after:
# perf test 89
89: perf script tests : Ok
#
Fixes: 0dd5041c9a0eaf8c ("perf addr_location: Add init/exit/copy functions")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20240125100351.936262-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Thu, 25 Jan 2024 14:23:56 +0000 (11:23 -0300)]
tools headers UAPI: Sync linux/fcntl.h with the kernel sources
To get the changes in:
8a924db2d7b5eb69 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface function")
That don't add anything that is handled by existing hard coded tables or
table generation scripts.
This silences this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stefan Berger <stefanb@linux.ibm.com>
Link: https://lore.kernel.org/lkml/ZbJv9fGF_k2xXEdr@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Thu, 25 Jan 2024 14:08:44 +0000 (11:08 -0300)]
tools arch x86: Sync the msr-index.h copy with the kernel sources to pick IA32_MKTME_KEYID_PARTITIONING
To pick up the changes in:
765a0542fdc7aad7 ("x86/virt/tdx: Detect TDX during kernel boot")
Addressing this tools/perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
That makes the beautification scripts to pick some new entries:
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
$ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
$ diff -u before after
--- before 2024-01-25 11:08:12.
363223880 -0300
+++ after 2024-01-25 11:08:24.
839307699 -0300
@@ -21,6 +21,7 @@
[0x0000004f] = "PPIN",
[0x00000060] = "LBR_CORE_TO",
[0x00000079] = "IA32_UCODE_WRITE",
+ [0x00000087] = "IA32_MKTME_KEYID_PARTITIONING",
[0x0000008b] = "AMD64_PATCH_LEVEL",
[0x0000008C] = "IA32_SGXLEPUBKEYHASH0",
[0x0000008D] = "IA32_SGXLEPUBKEYHASH1",
$
Now one can trace systemwide asking to see backtraces to where that MSR
is being read/written, see this example with a previous update:
# perf trace -e msr:*_msr/max-stack=32/ --filter="msr==IA32_MKTME_KEYID_PARTITIONING"
^C#
If we use -v (verbose mode) we can see what it does behind the scenes:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_MKTME_KEYID_PARTITIONING"
Using CPUID GenuineIntel-6-8E-A
0x87
New filter for msr:read_msr: (msr==0x87) && (common_pid != 58627 && common_pid != 3792)
0x87
New filter for msr:write_msr: (msr==0x87) && (common_pid != 58627 && common_pid != 3792)
mmap size
528384B
^C#
Example with a frequent msr:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
Using CPUID AuthenticAMD-25-21-0
0x48
New filter for msr:read_msr: (msr==0x48) && (common_pid !=
2612129 && common_pid != 3841)
0x48
New filter for msr:write_msr: (msr==0x48) && (common_pid !=
2612129 && common_pid != 3841)
mmap size
528384B
Looking at the vmlinux_path (8 entries long)
symsrc__init: build id mismatch for vmlinux.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
0.000 Timer/
2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule ([kernel.kallsyms])
futex_wait_queue_me ([kernel.kallsyms])
futex_wait ([kernel.kallsyms])
do_futex ([kernel.kallsyms])
__x64_sys_futex ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
__futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule_idle ([kernel.kallsyms])
do_idle ([kernel.kallsyms])
cpu_startup_entry ([kernel.kallsyms])
secondary_startup_64_no_verify ([kernel.kallsyms])
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZbJt27rjkQVU1YoP@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Thu, 25 Jan 2024 14:00:14 +0000 (11:00 -0300)]
tools headers uapi: Sync linux/stat.h with the kernel sources to pick STATX_MNT_ID_UNIQUE
To pick the changes from:
98d2b43081972abe ("add unique mount ID")
That add STATX_MNT_ID_UNIQUE that was manually added to
tools/perf/trace/beauty/statx.c, at some point this should move to the
shell based automated way.
This silences this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/stat.h include/uapi/linux/stat.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZbJq08s19890WDo-@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Thu, 25 Jan 2024 05:51:24 +0000 (21:51 -0800)]
perf tools: Add -H short option for --hierarchy
I found the hierarchy mode useful, but it's easy to make a typo when
using it. Let's add a short option for that.
Also update the documentation. :)
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20240125055124.1579617-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Ian Rogers [Wed, 24 Jan 2024 23:42:00 +0000 (15:42 -0800)]
perf pmu: Treat the msr pmu as software
The msr PMU is a software one, meaning msr events may be grouped
with events in a hardware context. As the msr PMU isn't marked as a
software PMU by perf_pmu__is_software, groups with the msr PMU in
are broken and the msr events placed in a different group. This
may lead to multiplexing errors where a hardware event isn't
counted while the msr event, such as tsc, is. Fix all of this by
marking the msr PMU as software, which agrees with the driver.
Before:
```
$ perf stat -e '{slots,tsc}' -a true
WARNING: events were regrouped to match PMUs
Performance counter stats for 'system wide':
1,750,335 slots
4,243,557 tsc
0.
001456717 seconds time elapsed
```
After:
```
$ perf stat -e '{slots,tsc}' -a true
Performance counter stats for 'system wide':
12,526,380 slots
3,415,163 tsc
0.
001488360 seconds time elapsed
```
Fixes: 251aa040244a ("perf parse-events: Wildcard most "numeric" events")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: James Clark <james.clark@arm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20240124234200.1510417-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Linus Torvalds [Thu, 25 Jan 2024 18:58:35 +0000 (10:58 -0800)]
Merge tag 'net-6.8-rc2' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf, netfilter and WiFi.
Jakub is doing a lot of work to include the self-tests in our CI, as a
result a significant amount of self-tests related fixes is flowing in
(and will likely continue in the next few weeks).
Current release - regressions:
- bpf: fix a kernel crash for the riscv 64 JIT
- bnxt_en: fix memory leak in bnxt_hwrm_get_rings()
- revert "net: macsec: use skb_ensure_writable_head_tail to expand
the skb"
Previous releases - regressions:
- core: fix removing a namespace with conflicting altnames
- tc/flower: fix chain template offload memory leak
- tcp:
- make sure init the accept_queue's spinlocks once
- fix autocork on CPUs with weak memory model
- udp: fix busy polling
- mlx5e:
- fix out-of-bound read in port timestamping
- fix peer flow lists corruption
- iwlwifi: fix a memory corruption
Previous releases - always broken:
- netfilter:
- nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress
basechain
- nft_limit: reject configurations that cause integer overflow
- bpf: fix bpf_xdp_adjust_tail() with XSK zero-copy mbuf, avoiding a
NULL pointer dereference upon shrinking
- llc: make llc_ui_sendmsg() more robust against bonding changes
- smc: fix illegal rmb_desc access in SMC-D connection dump
- dpll: fix pin dump crash for rebound module
- bnxt_en: fix possible crash after creating sw mqprio TCs
- hv_netvsc: calculate correct ring size when PAGE_SIZE is not 4kB
Misc:
- several self-tests fixes for better integration with the netdev CI
- added several missing modules descriptions"
* tag 'net-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
tsnep: Fix XDP_RING_NEED_WAKEUP for empty fill ring
tsnep: Remove FCS for XDP data path
net: fec: fix the unhandled context fault from smmu
selftests: bonding: do not test arp/ns target with mode balance-alb/tlb
fjes: fix memleaks in fjes_hw_setup
i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue
i40e: set xdp_rxq_info::frag_size
xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL
ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue
intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers
ice: remove redundant xdp_rxq_info registration
i40e: handle multi-buffer packets that are shrunk by xdp prog
ice: work on pre-XDP prog frag count
xsk: fix usage of multi-buffer BPF helpers for ZC XDP
xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags
xsk: recycle buffer in case Rx queue was full
net: fill in MODULE_DESCRIPTION()s for rvu_mbox
net: fill in MODULE_DESCRIPTION()s for litex
net: fill in MODULE_DESCRIPTION()s for fsl_pq_mdio
net: fill in MODULE_DESCRIPTION()s for fec
...
Linus Torvalds [Thu, 25 Jan 2024 18:52:30 +0000 (10:52 -0800)]
Merge tag 'ovl-fixes-6.8-rc2' of git://git./linux/kernel/git/overlayfs/vfs
Pull overlayfs fix from Amir Goldstein:
"Change the on-disk format for the new "xwhiteouts" feature introduced
in v6.7
The change reduces unneeded overhead of an extra getxattr per readdir.
The only user of the "xwhiteout" feature is the external composefs
tool, which has been updated to support the new on-disk format.
This change is also designated for 6.7.y"
* tag 'ovl-fixes-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: mark xwhiteouts directory with overlay.opaque='x'
Linus Torvalds [Thu, 25 Jan 2024 18:41:29 +0000 (10:41 -0800)]
Merge tag 'vfs-6.8-rc2.netfs' of git://git./linux/kernel/git/vfs/vfs
Pull netfs fixes from Christian Brauner:
"This contains various fixes for the netfs work merged earlier this
cycle:
afs:
- Fix locking imbalance in afs_proc_addr_prefs_show()
- Remove afs_dynroot_d_revalidate() which is redundant
- Fix error handling during lookup
- Hide sillyrenames from userspace. This fixes a race between
silly-rename files being created/removed and userspace iterating
over directory entries
- Don't use unnecessary folio_*() functions
cifs:
- Don't use unnecessary folio_*() functions
cachefiles:
- erofs: Fix Null dereference when cachefiles are not doing
ondemand-mode
- Update mailing list
netfs library:
- Add Jeff Layton as reviewer
- Update mailing list
- Fix a error checking in netfs_perform_write()
- fscache: Check error before dereferencing
- Don't use unnecessary folio_*() functions"
* tag 'vfs-6.8-rc2.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
afs: Fix missing/incorrect unlocking of RCU read lock
afs: Remove afs_dynroot_d_revalidate() as it is redundant
afs: Fix error handling with lookup via FS.InlineBulkStatus
afs: Hide silly-rename files from userspace
cachefiles, erofs: Fix NULL deref in when cachefiles is not doing ondemand-mode
netfs: Fix a NULL vs IS_ERR() check in netfs_perform_write()
netfs, fscache: Prevent Oops in fscache_put_cache()
cifs: Don't use certain unnecessary folio_*() functions
afs: Don't use certain unnecessary folio_*() functions
netfs: Don't use certain unnecessary folio_*() functions
netfs: Add Jeff Layton as reviewer
netfs, cachefiles: Change mailing list
Linus Torvalds [Thu, 25 Jan 2024 18:26:52 +0000 (10:26 -0800)]
Merge tag 'nfsd-6.8-1' of git://git./linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix in-kernel RPC UDP transport
- Fix NFSv4.0 RELEASE_LOCKOWNER
* tag 'nfsd-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix RELEASE_LOCKOWNER
SUNRPC: use request size to initialize bio_vec in svc_udp_sendto()
Linus Torvalds [Thu, 25 Jan 2024 18:21:21 +0000 (10:21 -0800)]
Merge tag 'urgent-rcu.2024.01.24a' of https://github.com/neeraju/linux
Pull RCU fix from Neeraj Upadhyay:
"This fixes RCU grace period stalls, which are observed when an
outgoing CPU's quiescent state reporting results in wakeup of one of
the grace period kthreads, to complete the grace period.
If those kthreads have SCHED_FIFO policy, the wake up can indirectly
arm the RT bandwith timer to the local offline CPU.
Earlier migration of the hrtimers from the CPU introduced in commit
5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU
earlier") results in this timer getting ignored.
If the RCU grace period kthreads are waiting for RT bandwidth to be
available, they may never be actually scheduled, resulting in RCU
stall warnings"
* tag 'urgent-rcu.2024.01.24a' of https://github.com/neeraju/linux:
rcu: Defer RCU kthreads wakeup when CPU is dying
Paolo Abeni [Thu, 25 Jan 2024 10:59:44 +0000 (11:59 +0100)]
Merge branch 'tsnep-xdp-fixes'
Gerhard Engleder says:
====================
tsnep: XDP fixes
Found two driver specific problems during XDP and XSK testing.
====================
Link: https://lore.kernel.org/r/20240123200918.61219-1-gerhard@engleder-embedded.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Gerhard Engleder [Tue, 23 Jan 2024 20:09:18 +0000 (21:09 +0100)]
tsnep: Fix XDP_RING_NEED_WAKEUP for empty fill ring
The fill ring of the XDP socket may contain not enough buffers to
completey fill the RX queue during socket creation. In this case the
flag XDP_RING_NEED_WAKEUP is not set as this flag is only set if the RX
queue is not completely filled during polling.
Set XDP_RING_NEED_WAKEUP flag also if RX queue is not completely filled
during XDP socket creation.
Fixes: 3fc2333933fd ("tsnep: Add XDP socket zero-copy RX support")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Gerhard Engleder [Tue, 23 Jan 2024 20:09:17 +0000 (21:09 +0100)]
tsnep: Remove FCS for XDP data path
The RX data buffer includes the FCS. The FCS is already stripped for the
normal data path. But for the XDP data path the FCS is included and
acts like additional/useless data.
Remove the FCS from the RX data buffer also for XDP.
Fixes: 65b28c810035 ("tsnep: Add XDP RX support")
Fixes: 3fc2333933fd ("tsnep: Add XDP socket zero-copy RX support")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 25 Jan 2024 10:42:27 +0000 (11:42 +0100)]
Merge tag 'mlx5-fixes-2024-01-24' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2024-01-24
This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.
* tag 'mlx5-fixes-2024-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: fix a potential double-free in fs_any_create_groups
net/mlx5e: fix a double-free in arfs_create_groups
net/mlx5e: Ignore IPsec replay window values on sender side
net/mlx5e: Allow software parsing when IPsec crypto is enabled
net/mlx5: Use mlx5 device constant for selecting CQ period mode for ASO
net/mlx5: DR, Can't go to uplink vport on RX rule
net/mlx5: DR, Use the right GVMI number for drop action
net/mlx5: Bridge, fix multicast packets sent to uplink
net/mlx5: Fix a WARN upon a callback command failure
net/mlx5e: Fix peer flow lists handling
net/mlx5e: Fix inconsistent hairpin RQT sizes
net/mlx5e: Fix operation precedence bug in port timestamping napi_poll context
net/mlx5: Fix query of sd_group field
net/mlx5e: Use the correct lag ports number when creating TISes
====================
Link: https://lore.kernel.org/r/20240124081855.115410-1-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 25 Jan 2024 10:30:31 +0000 (11:30 +0100)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-01-25
The following pull-request contains BPF updates for your *net* tree.
We've added 12 non-merge commits during the last 2 day(s) which contain
a total of 13 files changed, 190 insertions(+), 91 deletions(-).
The main changes are:
1) Fix bpf_xdp_adjust_tail() in context of XSK zero-copy drivers which
support XDP multi-buffer. The former triggered a NULL pointer
dereference upon shrinking, from Maciej Fijalkowski & Tirthendu Sarkar.
2) Fix a bug in riscv64 BPF JIT which emitted a wrong prologue and
epilogue for struct_ops programs, from Pu Lehui.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue
i40e: set xdp_rxq_info::frag_size
xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL
ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue
intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers
ice: remove redundant xdp_rxq_info registration
i40e: handle multi-buffer packets that are shrunk by xdp prog
ice: work on pre-XDP prog frag count
xsk: fix usage of multi-buffer BPF helpers for ZC XDP
xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags
xsk: recycle buffer in case Rx queue was full
riscv, bpf: Fix unpredictable kernel crash about RV64 struct_ops
====================
Link: https://lore.kernel.org/r/20240125084416.10876-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shenwei Wang [Tue, 23 Jan 2024 16:51:41 +0000 (10:51 -0600)]
net: fec: fix the unhandled context fault from smmu
When repeatedly changing the interface link speed using the command below:
ethtool -s eth0 speed 100 duplex full
ethtool -s eth0 speed 1000 duplex full
The following errors may sometimes be reported by the ARM SMMU driver:
[ 5395.035364] fec
5b040000.ethernet eth0: Link is Down
[ 5395.039255] arm-smmu
51400000.iommu: Unhandled context fault:
fsr=0x402, iova=0x00000000, fsynr=0x100001, cbfrsynra=0x852, cb=2
[ 5398.108460] fec
5b040000.ethernet eth0: Link is Up - 100Mbps/Full -
flow control off
It is identified that the FEC driver does not properly stop the TX queue
during the link speed transitions, and this results in the invalid virtual
I/O address translations from the SMMU and causes the context faults.
Fixes: dbc64a8ea231 ("net: fec: move calls to quiesce/resume packet processing out of fec_restart()")
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Link: https://lore.kernel.org/r/20240123165141.2008104-1-shenwei.wang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hangbin Liu [Tue, 23 Jan 2024 07:59:17 +0000 (15:59 +0800)]
selftests: bonding: do not test arp/ns target with mode balance-alb/tlb
The prio_arp/ns tests hard code the mode to active-backup. At the same
time, The balance-alb/tlb modes do not support arp/ns target. So remove
the prio_arp/ns tests from the loop and only test active-backup mode.
Fixes: 481b56e0391e ("selftests: bonding: re-format bond option tests")
Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Closes: https://lore.kernel.org/netdev/17415.1705965957@famine/
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20240123075917.1576360-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Thu, 25 Jan 2024 05:03:16 +0000 (21:03 -0800)]
Merge tag 'nf-24-01-24' of git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Update nf_tables kdoc to keep it in sync with the code, from George Guo.
2) Handle NETDEV_UNREGISTER event for inet/ingress basechain.
3) Reject configuration that cause nft_limit to overflow,
from Florian Westphal.
4) Restrict anonymous set/map names to 16 bytes, from Florian Westphal.
5) Disallow to encode queue number and error in verdicts. This reverts
a patch which seems to have introduced an early attempt to support for
nfqueue maps, which is these days supported via nft_queue expression.
6) Sanitize family via .validate for expressions that explicitly refer
to NF_INET_* hooks.
* tag 'nf-24-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: validate NFPROTO_* family
netfilter: nf_tables: reject QUEUE/DROP verdict parameters
netfilter: nf_tables: restrict anonymous set and map names to 16 bytes
netfilter: nft_limit: reject configurations that cause integer overflow
netfilter: nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress basechain
netfilter: nf_tables: cleanup documentation
====================
Link: https://lore.kernel.org/r/20240124191248.75463-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Zhipeng Lu [Mon, 22 Jan 2024 17:24:42 +0000 (01:24 +0800)]
fjes: fix memleaks in fjes_hw_setup
In fjes_hw_setup, it allocates several memory and delay the deallocation
to the fjes_hw_exit in fjes_probe through the following call chain:
fjes_probe
|-> fjes_hw_init
|-> fjes_hw_setup
|-> fjes_hw_exit
However, when fjes_hw_setup fails, fjes_hw_exit won't be called and thus
all the resources allocated in fjes_hw_setup will be leaked. In this
patch, we free those resources in fjes_hw_setup and prevents such leaks.
Fixes: 2fcbca687702 ("fjes: platform_driver's .probe and .remove routine")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240122172445.3841883-1-alexious@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 25 Jan 2024 00:59:52 +0000 (16:59 -0800)]
Merge tag 'ceph-for-6.8-rc2' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A fix to avoid triggering an assert in some cases where RBD exclusive
mappings are involved and a deprecated API cleanup"
* tag 'ceph-for-6.8-rc2' of https://github.com/ceph/ceph-client:
rbd: don't move requests to the running list on errors
rbd: remove usage of the deprecated ida_simple_*() API
Linus Torvalds [Thu, 25 Jan 2024 00:51:59 +0000 (16:51 -0800)]
Merge tag 'integrity-v6.8-rc1' of git://git./linux/kernel/git/zohar/linux-integrity
Pull integrity fix from Mimi Zohar:
"Revert patch that required user-provided key data, since keys can be
created from kernel-generated random numbers"
* tag 'integrity-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
Revert "KEYS: encrypted: Add check for strsep"
Alexei Starovoitov [Thu, 25 Jan 2024 00:24:07 +0000 (16:24 -0800)]
Merge branch 'net-bpf_xdp_adjust_tail-and-intel-mbuf-fixes'
Maciej Fijalkowski says:
====================
net: bpf_xdp_adjust_tail() and Intel mbuf fixes
Hey,
after a break followed by dealing with sickness, here is a v6 that makes
bpf_xdp_adjust_tail() actually usable for ZC drivers that support XDP
multi-buffer. Since v4 I tried also using bpf_xdp_adjust_tail() with
positive offset which exposed yet another issues, which can be observed
by increased commit count when compared to v3.
John, in the end I think we should remove handling
MEM_TYPE_XSK_BUFF_POOL from __xdp_return(), but it is out of the scope
for fixes set, IMHO.
Thanks,
Maciej
v6:
- add acks [Magnus]
- fix spelling mistakes [Magnus]
- avoid touching xdp_buff in xp_alloc_{reused,new_from_fq}() [Magnus]
- s/shrink_data/bpf_xdp_shrink_data [Jakub]
- remove __shrink_data() [Jakub]
- check retvals from __xdp_rxq_info_reg() [Magnus]
v5:
- pick correct version of patch 5 [Simon]
- elaborate a bit more on what patch 2 fixes
v4:
- do not clear frags flag when deleting tail; xsk_buff_pool now does
that
- skip some NULL tests for xsk_buff_get_tail [Martin, John]
- address problems around registering xdp_rxq_info
- fix bpf_xdp_frags_increase_tail() for ZC mbuf
v3:
- add acks
- s/xsk_buff_tail_del/xsk_buff_del_tail
- address i40e as well (thanks Tirthendu)
v2:
- fix !CONFIG_XDP_SOCKETS builds
- add reviewed-by tag to patch 3
====================
Link: https://lore.kernel.org/r/20240124191602.566724-1-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Maciej Fijalkowski [Wed, 24 Jan 2024 19:16:02 +0000 (20:16 +0100)]
i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue
Now that i40e driver correctly sets up frag_size in xdp_rxq_info, let us
make it work for ZC multi-buffer as well. i40e_ring::rx_buf_len for ZC
is being set via xsk_pool_get_rx_frame_size() and this needs to be
propagated up to xdp_rxq_info.
Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support")
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-12-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Maciej Fijalkowski [Wed, 24 Jan 2024 19:16:01 +0000 (20:16 +0100)]
i40e: set xdp_rxq_info::frag_size
i40e support XDP multi-buffer so it is supposed to use
__xdp_rxq_info_reg() instead of xdp_rxq_info_reg() and set the
frag_size. It can not be simply converted at existing callsite because
rx_buf_len could be un-initialized, so let us register xdp_rxq_info
within i40e_configure_rx_ring(), which happen to be called with already
initialized rx_buf_len value.
Commit
5180ff1364bc ("i40e: use int for i40e_status") converted 'err' to
int, so two variables to deal with return codes are not needed within
i40e_configure_rx_ring(). Remove 'ret' and use 'err' to handle status
from xdp_rxq_info registration.
Fixes: e213ced19bef ("i40e: add support for XDP multi-buffer Rx")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-11-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Maciej Fijalkowski [Wed, 24 Jan 2024 19:16:00 +0000 (20:16 +0100)]
xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL
XSK ZC Rx path calculates the size of data that will be posted to XSK Rx
queue via subtracting xdp_buff::data_end from xdp_buff::data.
In bpf_xdp_frags_increase_tail(), when underlying memory type of
xdp_rxq_info is MEM_TYPE_XSK_BUFF_POOL, add offset to data_end in tail
fragment, so that later on user space will be able to take into account
the amount of bytes added by XDP program.
Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-10-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Maciej Fijalkowski [Wed, 24 Jan 2024 19:15:59 +0000 (20:15 +0100)]
ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue
Now that ice driver correctly sets up frag_size in xdp_rxq_info, let us
make it work for ZC multi-buffer as well. ice_rx_ring::rx_buf_len for ZC
is being set via xsk_pool_get_rx_frame_size() and this needs to be
propagated up to xdp_rxq_info.
Use a bigger hammer and instead of unregistering only xdp_rxq_info's
memory model, unregister it altogether and register it again and have
xdp_rxq_info with correct frag_size value.
Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-9-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Maciej Fijalkowski [Wed, 24 Jan 2024 19:15:58 +0000 (20:15 +0100)]
intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers
Ice and i40e ZC drivers currently set offset of a frag within
skb_shared_info to 0, which is incorrect. xdp_buffs that come from
xsk_buff_pool always have 256 bytes of a headroom, so they need to be
taken into account to retrieve xdp_buff::data via skb_frag_address().
Otherwise, bpf_xdp_frags_increase_tail() would be starting its job from
xdp_buff::data_hard_start which would result in overwriting existing
payload.
Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support")
Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support")
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-8-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Maciej Fijalkowski [Wed, 24 Jan 2024 19:15:57 +0000 (20:15 +0100)]
ice: remove redundant xdp_rxq_info registration
xdp_rxq_info struct can be registered by drivers via two functions -
xdp_rxq_info_reg() and __xdp_rxq_info_reg(). The latter one allows
drivers that support XDP multi-buffer to set up xdp_rxq_info::frag_size
which in turn will make it possible to grow the packet via
bpf_xdp_adjust_tail() BPF helper.
Currently, ice registers xdp_rxq_info in two spots:
1) ice_setup_rx_ring() // via xdp_rxq_info_reg(), BUG
2) ice_vsi_cfg_rxq() // via __xdp_rxq_info_reg(), OK
Cited commit under fixes tag took care of setting up frag_size and
updated registration scheme in 2) but it did not help as
1) is called before 2) and as shown above it uses old registration
function. This means that 2) sees that xdp_rxq_info is already
registered and never calls __xdp_rxq_info_reg() which leaves us with
xdp_rxq_info::frag_size being set to 0.
To fix this misbehavior, simply remove xdp_rxq_info_reg() call from
ice_setup_rx_ring().
Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-7-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tirthendu Sarkar [Wed, 24 Jan 2024 19:15:56 +0000 (20:15 +0100)]
i40e: handle multi-buffer packets that are shrunk by xdp prog
XDP programs can shrink packets by calling the bpf_xdp_adjust_tail()
helper function. For multi-buffer packets this may lead to reduction of
frag count stored in skb_shared_info area of the xdp_buff struct. This
results in issues with the current handling of XDP_PASS and XDP_DROP
cases.
For XDP_PASS, currently skb is being built using frag count of
xdp_buffer before it was processed by XDP prog and thus will result in
an inconsistent skb when frag count gets reduced by XDP prog. To fix
this, get correct frag count while building the skb instead of using
pre-obtained frag count.
For XDP_DROP, current page recycling logic will not reuse the page but
instead will adjust the pagecnt_bias so that the page can be freed. This
again results in inconsistent behavior as the page refcnt has already
been changed by the helper while freeing the frag(s) as part of
shrinking the packet. To fix this, only adjust pagecnt_bias for buffers
that are stillpart of the packet post-xdp prog run.
Fixes: e213ced19bef ("i40e: add support for XDP multi-buffer Rx")
Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-6-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Maciej Fijalkowski [Wed, 24 Jan 2024 19:15:55 +0000 (20:15 +0100)]
ice: work on pre-XDP prog frag count
Fix an OOM panic in XDP_DRV mode when a XDP program shrinks a
multi-buffer packet by 4k bytes and then redirects it to an AF_XDP
socket.
Since support for handling multi-buffer frames was added to XDP, usage
of bpf_xdp_adjust_tail() helper within XDP program can free the page
that given fragment occupies and in turn decrease the fragment count
within skb_shared_info that is embedded in xdp_buff struct. In current
ice driver codebase, it can become problematic when page recycling logic
decides not to reuse the page. In such case, __page_frag_cache_drain()
is used with ice_rx_buf::pagecnt_bias that was not adjusted after
refcount of page was changed by XDP prog which in turn does not drain
the refcount to 0 and page is never freed.
To address this, let us store the count of frags before the XDP program
was executed on Rx ring struct. This will be used to compare with
current frag count from skb_shared_info embedded in xdp_buff. A smaller
value in the latter indicates that XDP prog freed frag(s). Then, for
given delta decrement pagecnt_bias for XDP_DROP verdict.
While at it, let us also handle the EOP frag within
ice_set_rx_bufs_act() to make our life easier, so all of the adjustments
needed to be applied against freed frags are performed in the single
place.
Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-5-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Maciej Fijalkowski [Wed, 24 Jan 2024 19:15:54 +0000 (20:15 +0100)]
xsk: fix usage of multi-buffer BPF helpers for ZC XDP
Currently when packet is shrunk via bpf_xdp_adjust_tail() and memory
type is set to MEM_TYPE_XSK_BUFF_POOL, null ptr dereference happens:
[
1136314.192256] BUG: kernel NULL pointer dereference, address:
0000000000000034
[
1136314.203943] #PF: supervisor read access in kernel mode
[
1136314.213768] #PF: error_code(0x0000) - not-present page
[
1136314.223550] PGD 0 P4D 0
[
1136314.230684] Oops: 0000 [#1] PREEMPT SMP NOPTI
[
1136314.239621] CPU: 8 PID: 54203 Comm: xdpsock Not tainted 6.6.0+ #257
[
1136314.250469] Hardware name: Intel Corporation S2600WFT/S2600WFT,
BIOS SE5C620.86B.02.01.0008.
031920191559 03/19/2019
[
1136314.265615] RIP: 0010:__xdp_return+0x6c/0x210
[
1136314.274653] Code: ad 00 48 8b 47 08 49 89 f8 a8 01 0f 85 9b 01 00 00 0f 1f 44 00 00 f0 41 ff 48 34 75 32 4c 89 c7 e9 79 cd 80 ff 83 fe 03 75 17 <f6> 41 34 01 0f 85 02 01 00 00 48 89 cf e9 22 cc 1e 00 e9 3d d2 86
[
1136314.302907] RSP: 0018:
ffffc900089f8db0 EFLAGS:
00010246
[
1136314.312967] RAX:
ffffc9003168aed0 RBX:
ffff8881c3300000 RCX:
0000000000000000
[
1136314.324953] RDX:
0000000000000000 RSI:
0000000000000003 RDI:
ffffc9003168c000
[
1136314.336929] RBP:
0000000000000ae0 R08:
0000000000000002 R09:
0000000000010000
[
1136314.348844] R10:
ffffc9000e495000 R11:
0000000000000040 R12:
0000000000000001
[
1136314.360706] R13:
0000000000000524 R14:
ffffc9003168aec0 R15:
0000000000000001
[
1136314.373298] FS:
00007f8df8bbcb80(0000) GS:
ffff8897e0e00000(0000)
knlGS:
0000000000000000
[
1136314.386105] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[
1136314.396532] CR2:
0000000000000034 CR3:
00000001aa912002 CR4:
00000000007706f0
[
1136314.408377] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[
1136314.420173] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[
1136314.431890] PKRU:
55555554
[
1136314.439143] Call Trace:
[
1136314.446058] <IRQ>
[
1136314.452465] ? __die+0x20/0x70
[
1136314.459881] ? page_fault_oops+0x15b/0x440
[
1136314.468305] ? exc_page_fault+0x6a/0x150
[
1136314.476491] ? asm_exc_page_fault+0x22/0x30
[
1136314.484927] ? __xdp_return+0x6c/0x210
[
1136314.492863] bpf_xdp_adjust_tail+0x155/0x1d0
[
1136314.501269] bpf_prog_ccc47ae29d3b6570_xdp_sock_prog+0x15/0x60
[
1136314.511263] ice_clean_rx_irq_zc+0x206/0xc60 [ice]
[
1136314.520222] ? ice_xmit_zc+0x6e/0x150 [ice]
[
1136314.528506] ice_napi_poll+0x467/0x670 [ice]
[
1136314.536858] ? ttwu_do_activate.constprop.0+0x8f/0x1a0
[
1136314.546010] __napi_poll+0x29/0x1b0
[
1136314.553462] net_rx_action+0x133/0x270
[
1136314.561619] __do_softirq+0xbe/0x28e
[
1136314.569303] do_softirq+0x3f/0x60
This comes from __xdp_return() call with xdp_buff argument passed as
NULL which is supposed to be consumed by xsk_buff_free() call.
To address this properly, in ZC case, a node that represents the frag
being removed has to be pulled out of xskb_list. Introduce
appropriate xsk helpers to do such node operation and use them
accordingly within bpf_xdp_adjust_tail().
Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> # For the xsk header part
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-4-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Maciej Fijalkowski [Wed, 24 Jan 2024 19:15:53 +0000 (20:15 +0100)]
xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags
XDP multi-buffer support introduced XDP_FLAGS_HAS_FRAGS flag that is
used by drivers to notify data path whether xdp_buff contains fragments
or not. Data path looks up mentioned flag on first buffer that occupies
the linear part of xdp_buff, so drivers only modify it there. This is
sufficient for SKB and XDP_DRV modes as usually xdp_buff is allocated on
stack or it resides within struct representing driver's queue and
fragments are carried via skb_frag_t structs. IOW, we are dealing with
only one xdp_buff.
ZC mode though relies on list of xdp_buff structs that is carried via
xsk_buff_pool::xskb_list, so ZC data path has to make sure that
fragments do *not* have XDP_FLAGS_HAS_FRAGS set. Otherwise,
xsk_buff_free() could misbehave if it would be executed against xdp_buff
that carries a frag with XDP_FLAGS_HAS_FRAGS flag set. Such scenario can
take place when within supplied XDP program bpf_xdp_adjust_tail() is
used with negative offset that would in turn release the tail fragment
from multi-buffer frame.
Calling xsk_buff_free() on tail fragment with XDP_FLAGS_HAS_FRAGS would
result in releasing all the nodes from xskb_list that were produced by
driver before XDP program execution, which is not what is intended -
only tail fragment should be deleted from xskb_list and then it should
be put onto xsk_buff_pool::free_list. Such multi-buffer frame will never
make it up to user space, so from AF_XDP application POV there would be
no traffic running, however due to free_list getting constantly new
nodes, driver will be able to feed HW Rx queue with recycled buffers.
Bottom line is that instead of traffic being redirected to user space,
it would be continuously dropped.
To fix this, let us clear the mentioned flag on xsk_buff_pool side
during xdp_buff initialization, which is what should have been done
right from the start of XSK multi-buffer support.
Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support")
Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support")
Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-3-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Maciej Fijalkowski [Wed, 24 Jan 2024 19:15:52 +0000 (20:15 +0100)]
xsk: recycle buffer in case Rx queue was full
Add missing xsk_buff_free() call when __xsk_rcv_zc() failed to produce
descriptor to XSK Rx queue.
Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-2-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jakub Kicinski [Wed, 24 Jan 2024 23:12:55 +0000 (15:12 -0800)]
Merge branch 'fix-module_description-for-net-p2'
Breno Leitao says:
====================
Fix MODULE_DESCRIPTION() for net (p2)
There are hundreds of network modules that misses MODULE_DESCRIPTION(),
causing a warnning when compiling with W=1. Example:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com90io.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/arc-rimi.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com20020.o
This part2 of the patchset focus on the drivers/net/ethernet drivers.
There are still some missing warnings in drivers/net/ethernet that will
be fixed in an upcoming patchset.
v1: https://lore.kernel.org/all/
20240122184543.
2501493-2-leitao@debian.org/
====================
Link: https://lore.kernel.org/r/20240123190332.677489-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Breno Leitao [Tue, 23 Jan 2024 19:03:31 +0000 (11:03 -0800)]
net: fill in MODULE_DESCRIPTION()s for rvu_mbox
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Marvel RVU mbox driver.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240123190332.677489-11-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>