linux.git
4 years agoMerge remote-tracking branch 'torvalds/master' into perf/core
Arnaldo Carvalho de Melo [Wed, 20 Jan 2021 17:35:31 +0000 (14:35 -0300)]
Merge remote-tracking branch 'torvalds/master' into perf/core

To pick up fixes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Add 'ping' control command
Jiri Olsa [Sat, 26 Dec 2020 23:20:38 +0000 (00:20 +0100)]
perf tools: Add 'ping' control command

Add a control 'ping' command to detect if perf is up and its control
interface is operational.

It will be used in following daemon patches to synchronize with record
session - when control interface is up and running, we know that perf
record is monitoring and ready to receive signals.

Example session:

  terminal 1:

    # mkfifo control ack
    # perf record --control=fifo:control,ack

  terminal 2:

    # echo ping > control
    # cat ack
    ack

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201226232038.390883-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Add 'stop' control command
Jiri Olsa [Sat, 26 Dec 2020 23:20:37 +0000 (00:20 +0100)]
perf tools: Add 'stop' control command

Adding control 'stop' command to stop perf record.

When it is received, perf will set the 'done' variable to 1 to stop its
mmap ring buffer reading loop.

Example session:

  terminal 1:
    # mkfifo control ack
    # perf record --control=fifo:control,ack

  terminal 2:
    # echo stop > control

  terminal 1:
    [ perf record: Woken up 7 times to write data ]
    [ perf record: Captured and wrote 3.214 MB perf.data (38280 samples) ]
    #

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201226232038.390883-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Add 'evlist' control command
Jiri Olsa [Sat, 26 Dec 2020 23:20:36 +0000 (00:20 +0100)]
perf tools: Add 'evlist' control command

Add a new 'evlist' control command to display all the evlist events.
When it is received, perf will scan and print current evlist into perf
record terminal.

The interface string for control file is:

  evlist [-v|-g|-F]

The syntax follows perf evlist command:
  -F  Show just the sample frequency used for each event.
  -v  Show all fields.
  -g  Show event group information.

Example session:

  terminal 1:
    # mkfifo control ack
    # perf record --control=fifo:control,ack -e '{cycles,instructions}'

  terminal 2:
    # echo evlist > control

  terminal 1:
    cycles
    instructions
    dummy:HG

  terminal 2:
    # echo 'evlist -v' > control

  terminal 1:
    cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type:            \
    IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, freq: 1,    \
    sample_id_all: 1, exclude_guest: 1
    instructions: size: 120, config: 0x1, { sample_period, sample_freq }: 4000,      \
    sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, inherit: 1, freq: 1,    \
    sample_id_all: 1, exclude_guest: 1
    dummy:HG: type: 1, size: 120, config: 0x9, { sample_period, sample_freq }: 4000, \
    sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, inherit: 1, mmap: 1,    \
    comm: 1, freq: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, \
     bpf_event: 1

  terminal 2:
    # echo 'evlist -g' > control

  terminal 1:
    {cycles,instructions}
    dummy:HG

  terminal 2:
    # echo 'evlist -F' > control

  terminal 1:
    cycles: sample_freq=4000
    instructions: sample_freq=4000
    dummy:HG: sample_freq=4000

This new evlist command is handy to get real event names when
wildcards are used.

Adding evsel_fprintf.c object to python/perf.so build, because
it's now evlist.c dependency.

Adding PYTHON_PERF define for python/perf.so compilation, so we
can use it to compile in only evsel__fprintf from evsel_fprintf.c
object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201226232038.390883-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf tools: Allow to enable/disable events via control file
Jiri Olsa [Sat, 26 Dec 2020 23:20:35 +0000 (00:20 +0100)]
perf tools: Allow to enable/disable events via control file

Adding new control events to enable/disable specific event.
The interface string for control file are:

  'enable <EVENT NAME>'
  'disable <EVENT NAME>'

when received the command, perf will scan the current evlist
for <EVENT NAME> and if found it's enabled/disabled.

Example session:

  terminal 1:
    # mkfifo control ack perf.pipe
    # perf record --control=fifo:control,ack -D -1 --no-buffering -e 'sched:*' -o - > perf.pipe

  terminal 2:
    # cat perf.pipe | perf --no-pager script -i -

  terminal 1:
    Events disabled

  NOTE Above message will show only after read side of the pipe ('>')
  is started on 'terminal 2'. The 'terminal 1's bash does not execute
  perf before that, hence the delyaed perf record message.

  terminal 3:
    # echo 'enable sched:sched_process_fork' > control

  terminal 1:
    event sched:sched_process_fork enabled

  terminal 2:
    bash 33349 [034] 149587.674295: sched:sched_process_fork: comm=bash pid=33349 child_comm=bash child_pid=34056
    bash 33349 [034] 149588.239521: sched:sched_process_fork: comm=bash pid=33349 child_comm=bash child_pid=34057

  terminal 3:
    # echo 'enable sched:sched_wakeup_new' > control

  terminal 1:
    event sched:sched_wakeup_new enabled

  terminal 2:
    bash 33349 [034] 149632.228023: sched:sched_process_fork: comm=bash pid=33349 child_comm=bash child_pid=34059
    bash 33349 [034] 149632.228050:   sched:sched_wakeup_new: bash:34059 [120] success=1 CPU:036
    bash 33349 [034] 149633.950005: sched:sched_process_fork: comm=bash pid=33349 child_comm=bash child_pid=34060
    bash 33349 [034] 149633.950030:   sched:sched_wakeup_new: bash:34060 [120] success=1 CPU:036

Committer testing:

If I use 'sched:*' and then enable all events, I can't get 'perf record'
to react to further commands, so I tested it with:

  [root@five ~]# perf record --control=fifo:control,ack -D -1 --no-buffering -e 'sched:sched_process_*' -o - > perf.pipe
  Events disabled
  Events enabled
  Events disabled

And then it works as expected, so we need to fix this pre-existing
problem.

Another issue, we need to check if a event is already enabled or
disabled and change the message to be clearer, i.e.:

  [root@five ~]# perf record --control=fifo:control,ack -D -1 --no-buffering -e 'sched:sched_process_*' -o - > perf.pipe
  Events disabled

If we receive a 'disable' command, then it should say:

  [root@five ~]# perf record --control=fifo:control,ack -D -1 --no-buffering -e 'sched:sched_process_*' -o - > perf.pipe
  Events disabled
  Events already disabled

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201226232038.390883-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf config: Make perf_config_global() global
Jiri Olsa [Sat, 2 Jan 2021 22:04:25 +0000 (23:04 +0100)]
perf config: Make perf_config_global() global

Make perf_config_global global, it will be used outside the config.c
object in the following patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210102220441.794923-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf config: Make perf_config_system() global
Jiri Olsa [Sat, 2 Jan 2021 22:04:24 +0000 (23:04 +0100)]
perf config: Make perf_config_system() global

Make perf_config_system global, it will be used outside the config.c
object in the following patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210102220441.794923-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf config: Add perf_home_perfconfig function
Jiri Olsa [Sat, 2 Jan 2021 22:04:23 +0000 (23:04 +0100)]
perf config: Add perf_home_perfconfig function

Factor out the perf_home_perfconfig, that looks for .perfconfig in home
directory including check for PERF_CONFIG_NOGLOBAL and for proper
permission.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210102220441.794923-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf debug: Add debug_set_display_time function
Jiri Olsa [Sat, 2 Jan 2021 22:04:22 +0000 (23:04 +0100)]
perf debug: Add debug_set_display_time function

Allow to display time in perf debug output via new
debug_set_display_time function.

It will be used in perf daemon command to get verbose output into log
file.

The debug time format is:

  [2020-12-03 18:25:31.822152] affinity: SYS
  [2020-12-03 18:25:31.822164] mmap flush: 1
  [2020-12-03 18:25:31.822175] comp level: 0
  [2020-12-03 18:25:32.002047] mmap size 528384B

Committer notes:

Cast tod.tv_usec to long to avoid this problem:

    78    12.70 ubuntu:18.04-x-sparc64        : FAIL sparc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

  util/debug.c: In function 'fprintf_time':
  util/debug.c:63:32: error: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type '__suseconds_t {aka int}' [-Werror=format=]
    return fprintf(file, "[%s.%06lu] ", date, tod.tv_usec);
                              ~~~~^           ~~~~~~~~~~~
                              %06u

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210102220441.794923-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf config: Add config set interface
Jiri Olsa [Sat, 2 Jan 2021 22:04:21 +0000 (23:04 +0100)]
perf config: Add config set interface

Add interface to load config set from custom file by using
perf_config_set__load_file function.

It will be used in perf daemon command to process custom config file.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210102220441.794923-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf config: Make perf_config_from_file() static
Jiri Olsa [Sat, 2 Jan 2021 22:04:20 +0000 (23:04 +0100)]
perf config: Make perf_config_from_file() static

It's not used outside config.c object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210102220441.794923-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf test: Add test case for PERF_SAMPLE_CODE_PAGE_SIZE
Stephane Eranian [Tue, 5 Jan 2021 19:57:52 +0000 (11:57 -0800)]
perf test: Add test case for PERF_SAMPLE_CODE_PAGE_SIZE

Extend sample-parsing test cases to support new sample type
PERF_SAMPLE_CODE_PAGE_SIZE.

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210105195752.43489-7-kan.liang@linux.intel.com
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf report: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
Stephane Eranian [Tue, 5 Jan 2021 19:57:51 +0000 (11:57 -0800)]
perf report: Add support for PERF_SAMPLE_CODE_PAGE_SIZE

Add a new sort dimension "code_page_size" for common sort.
With this option applied, perf can sort and report by sample's code page
size.

For example:

  # perf report --stdio --sort=comm,symbol,code_page_size
  # To display the perf.data header info, please use
  # --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 3K of event 'mem-loads:uP'
  # Event count (approx.): 1470769
  #
  # Overhead  Command  Symbol                        Code Page Size IPC [IPC Coverage]
  # ........  .......  ............................  .............. ....................
  #
      69.56%  dtlb     [.] GetTickCount              4K              -   -
      17.93%  dtlb     [.] Calibrate                 4K              -   -
      11.40%  dtlb     [.] __gettimeofday            4K              -   -
  #

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210105195752.43489-6-kan.liang@linux.intel.com
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf script: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
Stephane Eranian [Tue, 5 Jan 2021 19:57:50 +0000 (11:57 -0800)]
perf script: Add support for PERF_SAMPLE_CODE_PAGE_SIZE

Display sampled code page sizes when PERF_SAMPLE_CODE_PAGE_SIZE was set.

For example:

  # perf script --fields comm,event,ip,code_page_size
            dtlb mem-loads:uP:            445777 4K
            dtlb mem-loads:uP:            40f724 4K
            dtlb mem-loads:uP:            474926 4K
            dtlb mem-loads:uP:            401075 4K
            dtlb mem-loads:uP:            401095 4K
            dtlb mem-loads:uP:            401095 4K
            dtlb mem-loads:uP:            4010cc 4K
            dtlb mem-loads:uP:            440b6f 4K
  #

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210105195752.43489-5-kan.liang@linux.intel.com
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf record: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
Kan Liang [Tue, 5 Jan 2021 19:57:49 +0000 (11:57 -0800)]
perf record: Add support for PERF_SAMPLE_CODE_PAGE_SIZE

Adds the infrastructure to sample the code address page size.

Introduce a new --code-page-size option for perf record.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Originally-by: Stephane Eranian <eranian@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210105195752.43489-4-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf mem: Support data page size
Kan Liang [Tue, 5 Jan 2021 19:57:48 +0000 (11:57 -0800)]
perf mem: Support data page size

Add option --data-page-size in "perf mem" to record/report data page
size.

Here are some examples:

  # perf mem --phys-data --data-page-size report -D
  # PID, TID, IP, ADDR, PHYS ADDR, DATA PAGE SIZE, LOCAL WEIGHT, DSRC, SYMBOL
  20134 20134 0xffffffffb5bd2fd0 0x016ffff9a274e96a308 0x000000044e96a308 4K  1168 0x5080144 /lib/modules/4.18.0-rc7+/build/vmlinux:perf_ctx_unlock
  20134 20134 0xffffffffb63f645c 0xffffffffb752b814 0xcfb52b814 2M 225 0x26a100142 /lib/modules/4.18.0-rc7+/build/vmlinux:_raw_spin_lock
  20134 20134 0xffffffffb660300c 0xfffffe00016b8bb0 0x0 4K 0 0x5080144 /lib/modules/4.18.0-rc7+/build/vmlinux:__x86_indirect_thunk_rax
  #

  # perf mem --phys-data --data-page-size report --stdio
  # To display the perf.data header info, please use
  # --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 5K of event 'cpu/mem-loads,ldlat=30/P'
  # Total weight : 281234
  # Sort order   :
  # mem,sym,dso,symbol_daddr,dso_daddr,tlb,locked,phys_daddr,data_page_size
  #
  # Overhead  Samples  Memory access  Symbol                        Shared Object     Data Symbol             Data Object  TLB access    Locked  Data Physical Address   Data Page Size
  # ........  .......  .............  ............................  ................  ......................  ...........  ............  ......  ......................  ..............

    28.54%     1826    L1 or L1 hit   [k] __x86_indirect_thunk_rax  [kernel.vmlinux]  [k] 0xffffb0df31b0ff28  [unknown]    L1 or L2 hit  No      [k] 0x0000000000000000  4K
     6.02%      256    L1 or L1 hit   [.] touch_buffer              dtlb              [.] 0x00007ffd50109da8  [stack]      L1 or L2 hit  No      [.] 0x000000042454ada8  4K
     3.23%        5    L1 or L1 hit   [k] clear_huge_page           [kernel.vmlinux]  [k] 0xffff9a2753b8ce60  [unknown]    L1 or L2 hit  No      [k] 0x0000000453b8ce60  2M
     2.98%        4    L1 or L1 hit   [k] clear_page_erms           [kernel.vmlinux]  [k] 0xffffb0df31b0fd00  [unknown]    L1 or L2 hit  No      [k] 0x0000000000000000  4K

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210105195752.43489-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf mem: Clean up output format
Kan Liang [Tue, 5 Jan 2021 19:57:47 +0000 (11:57 -0800)]
perf mem: Clean up output format

Now, "--phys-data" is the only option which impacts the output format.

A simple "if else" is enough to handle the option. But there will be
more options added, e.g. "--data-page-size", which also impact the
output format. The code will become too complex to be maintained.

Divide the big printf into several small pieces. Output the specific
piece only if the related option is applied.

No functional change.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210105195752.43489-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf cs-etm: Update ARM's CoreSight hardware tracing OpenCSD library to v1.0.0
James Clark [Fri, 8 Jan 2021 14:27:52 +0000 (16:27 +0200)]
perf cs-etm: Update ARM's CoreSight hardware tracing OpenCSD library to v1.0.0

Replace the OCSD_INSTR switch statement with an if to fix compilation
error about unhandled values and avoid this issue again in the future.

Add new OCSD_GEN_TRC_ELEM_SYNC_MARKER and OCSD_GEN_TRC_ELEM_MEMTRANS
enum values to fix unhandled value compilation error. Currently they are
ignored.

Increase the minimum version number to v1.0.0 now that new enum values
are used that are only present in this version.

Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Tested-by: Mike Leach <mike.leach@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210108142752.27872-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf c2c: Add local variables for output metrics
Leo Yan [Thu, 14 Jan 2021 15:46:46 +0000 (23:46 +0800)]
perf c2c: Add local variables for output metrics

This patch adds several local variables:

  "cl_output": pointer for outputting single cache line metrics;
  "output_str": pointer for outputting cache line metrics;
  "sort_str": pointer to the sorting metrics.

This can improve readability for the code and it's more flexible when
later extend to different strings for the output metrics.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210114154646.209024-7-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf c2c: Refactor node display
Leo Yan [Thu, 14 Jan 2021 15:46:45 +0000 (23:46 +0800)]
perf c2c: Refactor node display

The macro DISPLAY_HITM() is used to calculate HITM percentage introduced
by every node and it's shown for the node info.

This patch introduces the static function display_metrics() to replace
the macro, and the parameters are refined for passing the metric's
statistic and sum value.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210114154646.209024-6-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf c2c: Fix argument type for percent()
Leo Yan [Thu, 14 Jan 2021 15:46:44 +0000 (23:46 +0800)]
perf c2c: Fix argument type for percent()

For percent() its arguments are defined as integers; this is not
consistent with its consumers which pass u32 arguments.

Thus this patch makes argument type as u32 for percent().

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210114154646.209024-5-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf c2c: Refactor display filter
Leo Yan [Thu, 14 Jan 2021 15:46:43 +0000 (23:46 +0800)]
perf c2c: Refactor display filter

When sorting on the respective metrics (lcl_hitm, rmt_hitm, tot_hitm),
the FILTER_HITM macro is used to filter out the cache line entries if
its overhead is less than 1%.

This patch introduces a static function filter_display() to replace that
macro and refines its parameters with a more flexible way, rather than
passing field name, it changes to pass the cache line's statistic and
sum value.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210114154646.209024-4-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf c2c: Refactor hist entry validation
Leo Yan [Thu, 14 Jan 2021 15:46:42 +0000 (23:46 +0800)]
perf c2c: Refactor hist entry validation

This patch has no functionality changes but refactors hist entry
validation for cache line resorting.

It renames function "valid_hitm_or_store()" to "is_valid_hist_entry()",
changes return type from integer type to bool type.  In the function,
it uses switch-case instead of ternary operators, which is easier
to extend for more display types.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210114154646.209024-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf c2c: Rename for shared cache line stats
Leo Yan [Thu, 14 Jan 2021 15:46:41 +0000 (23:46 +0800)]
perf c2c: Rename for shared cache line stats

For shared cache line statistics, 'perf c2c' relies on HITM.  We can use
more general naming rather than only binding to HITM, so replace
"hitm_stats" with "shared_clines_stats" in structure perf_c2c, and
rename function resort_hitm_cb() to resort_shared_cl_cb().

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20210114154646.209024-2-leo.yan@linaro.org
4 years agoperf stat: Enable counting events for BPF programs
Song Liu [Tue, 29 Dec 2020 21:42:14 +0000 (13:42 -0800)]
perf stat: Enable counting events for BPF programs

Introduce 'perf stat -b' option, which counts events for BPF programs, like:

  [root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
     1.487903822            115,200      ref-cycles
     1.487903822             86,012      cycles
     2.489147029             80,560      ref-cycles
     2.489147029             73,784      cycles
     3.490341825             60,720      ref-cycles
     3.490341825             37,797      cycles
     4.491540887             37,120      ref-cycles
     4.491540887             31,963      cycles

The example above counts 'cycles' and 'ref-cycles' of BPF program of id
254.  This is similar to bpftool-prog-profile command, but more
flexible.

'perf stat -b' creates per-cpu perf_event and loads fentry/fexit BPF
programs (monitor-progs) to the target BPF program (target-prog). The
monitor-progs read perf_event before and after the target-prog, and
aggregate the difference in a BPF map. Then the user space reads data
from these maps.

A new 'struct bpf_counter' is introduced to provide a common interface
that uses BPF programs/maps to count perf events.

Committer notes:

Removed all but bpf_counter.h includes from evsel.h, not needed at all.

Also BPF map lookups for PERCPU_ARRAYs need to have as its value receive
buffer passed to the kernel libbpf_num_possible_cpus() entries, not
evsel__nr_cpus(evsel), as the former uses
/sys/devices/system/cpu/possible while the later uses
/sys/devices/system/cpu/online, which may be less than the 'possible'
number making the bpf map lookup overwrite memory and cause hard to
debug memory corruption.

We need to continue using evsel__nr_cpus(evsel) when accessing the
perf_counts array tho, not to overwrite another are of memory :-)

Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/lkml/20210120163031.GU12699@kernel.org/
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Link: http://lore.kernel.org/lkml/20201229214214.3413833-4-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoMerge tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-block
Linus Torvalds [Tue, 19 Jan 2021 21:26:05 +0000 (13:26 -0800)]
Merge tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-block

Pull task_work fix from Jens Axboe:
 "The TIF_NOTIFY_SIGNAL change inadvertently removed the unconditional
  task_work run we had in get_signal().

  This caused a regression for some setups, since we're relying on eg
  ____fput() being run to close and release, for example, a pipe and
  wake the other end.

  For 5.11, I prefer the simple solution of just reinstating the
  unconditional run, even if it conceptually doesn't make much sense -
  if you need that kind of guarantee, you should be using TWA_SIGNAL
  instead of TWA_NOTIFY. But it's the trivial fix for 5.11, and would
  ensure that other potential gotchas/assumptions for task_work don't
  regress for 5.11.

  We're looking into further simplifying the task_work notifications for
  5.12 which would resolve that too"

* tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-block:
  task_work: unconditionally run task_work from get_signal()

4 years agoMerge tag 'nfsd-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Tue, 19 Jan 2021 21:01:50 +0000 (13:01 -0800)]
Merge tag 'nfsd-5.11-2' of git://git./linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Avoid exposing parent of root directory in NFSv3 READDIRPLUS results

 - Fix a tracepoint change that went in the initial 5.11 merge

* tag 'nfsd-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Move the svc_xdr_recvfrom tracepoint again
  nfsd4: readdirplus shouldn't return parent of export

4 years agoMerge tag 'hyperv-fixes-signed-20210119' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 19 Jan 2021 20:58:55 +0000 (12:58 -0800)]
Merge tag 'hyperv-fixes-signed-20210119' of git://git./linux/kernel/git/hyperv/linux

Pull hyperv fix from Wei Liu:
 "One patch from Dexuan to fix clockevent initialization"

* tag 'hyperv-fixes-signed-20210119' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Initialize clockevents after LAPIC is initialized

4 years agoMerge tag 'spi-fix-v5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brooni...
Linus Torvalds [Mon, 18 Jan 2021 19:23:05 +0000 (11:23 -0800)]
Merge tag 'spi-fix-v5.11-rc4' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A few more bug fixes for SPI, both driver specific ones. The caching
  in the Cadence driver is to avoid a deadlock trying to retrieve the
  cached value later at runtime"

* tag 'spi-fix-v5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: cadence: cache reference clock rate during probe
  spi: fsl: Fix driver breakage when SPI_CS_HIGH is not set in spi->mode

4 years agoMerge tag 'fixes-2021-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt...
Linus Torvalds [Mon, 18 Jan 2021 19:17:18 +0000 (11:17 -0800)]
Merge tag 'fixes-2021-01-18' of git://git./linux/kernel/git/rppt/memblock

Pull ia64 build fix from Mike Rapoport:
 "Fix an ia64 build failure caused by memory model changes"

* tag 'fixes-2021-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  ia64: fix build failure caused by memory model changes

4 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Mon, 18 Jan 2021 19:07:18 +0000 (11:07 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "A Kconfig dependency issue with omap-sham and a divide by zero in xor
  on some platforms"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: omap-sham - Fix link error without crypto-engine
  crypto: xor - Fix divide error in do_xor_speed()

4 years agoLinux 5.11-rc4
Linus Torvalds [Mon, 18 Jan 2021 00:37:05 +0000 (16:37 -0800)]
Linux 5.11-rc4

4 years agoMerge tag 'perf-tools-fixes-2021-01-17' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 17 Jan 2021 21:14:46 +0000 (13:14 -0800)]
Merge tag 'perf-tools-fixes-2021-01-17' of git://git./linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix 'CPU too large' error in Intel PT

 - Correct event attribute sizes in 'perf inject'

 - Sync build_bug.h and kvm.h kernel copies

 - Fix bpf.h header include directive in 5sec.c 'perf trace' bpf example

 - libbpf tests fixes

 - Fix shadow stat 'perf test' for non-bash shells

 - Take cgroups into account for shadow stats in 'perf stat'

* tag 'perf-tools-fixes-2021-01-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf inject: Correct event attribute sizes
  perf intel-pt: Fix 'CPU too large' error
  perf stat: Take cgroups into account for shadow stats
  perf stat: Introduce struct runtime_stat_data
  libperf tests: Fail when failing to get a tracepoint id
  libperf tests: If a test fails return non-zero
  libperf tests: Avoid uninitialized variable warning
  perf test: Fix shadow stat test for non-bash shells
  tools headers: Syncronize linux/build_bug.h with the kernel sources
  tools headers UAPI: Sync kvm.h headers with the kernel sources
  perf bpf examples: Fix bpf.h header include directive in 5sec.c example

4 years agoMerge tag 'powerpc-5.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 17 Jan 2021 20:28:58 +0000 (12:28 -0800)]
Merge tag 'powerpc-5.11-4' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "One fix for a lack of alignment in our linker script, that can lead to
  crashes depending on configuration etc.

  One fix for the 32-bit VDSO after the C VDSO conversion.

  Thanks to Andreas Schwab, Ariel Marcovitch, and Christophe Leroy"

* tag 'powerpc-5.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/vdso: Fix clock_gettime_fallback for vdso32
  powerpc: Fix alignment bug within the init sections

4 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sun, 17 Jan 2021 20:16:47 +0000 (12:16 -0800)]
Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs

Pull misc vfs fixes from Al Viro:
 "Several assorted fixes.

  I still think that audit ->d_name race is better fixed this way for
  the benefit of backports, with any possibly fancier variants done on
  top of it"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  dump_common_audit_data(): fix racy accesses to ->d_name
  iov_iter: fix the uaccess area in copy_compat_iovec_from_user
  umount(2): move the flag validity checks first

4 years agomm: don't put pinned pages into the swap cache
Linus Torvalds [Sat, 16 Jan 2021 23:34:57 +0000 (15:34 -0800)]
mm: don't put pinned pages into the swap cache

So technically there is nothing wrong with adding a pinned page to the
swap cache, but the pinning obviously means that the page can't actually
be free'd right now anyway, so it's a bit pointless.

However, the real problem is not with it being a bit pointless: the real
issue is that after we've added it to the swap cache, we'll try to unmap
the page.  That will succeed, because the code in mm/rmap.c doesn't know
or care about pinned pages.

Even the unmapping isn't fatal per se, since the page will stay around
in memory due to the pinning, and we do hold the connection to it using
the swap cache.  But when we then touch it next and take a page fault,
the logic in do_swap_page() will map it back into the process as a
possibly read-only page, and we'll then break the page association on
the next COW fault.

Honestly, this issue could have been fixed in any of those other places:
(a) we could refuse to unmap a pinned page (which makes conceptual
sense), or (b) we could make sure to re-map a pinned page writably in
do_swap_page(), or (c) we could just make do_wp_page() not COW the
pinned page (which was what we historically did before that "mm:
do_wp_page() simplification" commit).

But while all of them are equally valid models for breaking this chain,
not putting pinned pages into the swap cache in the first place is the
simplest one by far.

It's also the safest one: the reason why do_wp_page() was changed in the
first place was that getting the "can I re-use this page" wrong is so
fraught with errors.  If you do it wrong, you end up with an incorrectly
shared page.

As a result, using "page_maybe_dma_pinned()" in either do_wp_page() or
do_swap_page() would be a serious bug since it is only a (very good)
heuristic.  Re-using the page requires a hard black-and-white rule with
no room for ambiguity.

In contrast, saying "this page is very likely dma pinned, so let's not
add it to the swap cache and try to unmap it" is an obviously safe thing
to do, and if the heuristic might very rarely be a false positive, no
harm is done.

Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Reported-and-tested-by: Martin Raiber <martin@urbackup.org>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agox86/hyperv: Initialize clockevents after LAPIC is initialized
Dexuan Cui [Sat, 16 Jan 2021 22:31:36 +0000 (14:31 -0800)]
x86/hyperv: Initialize clockevents after LAPIC is initialized

With commit 4df4cb9e99f8, the Hyper-V direct-mode STIMER is actually
initialized before LAPIC is initialized: see

  apic_intr_mode_init()

    x86_platform.apic_post_init()
      hyperv_init()
        hv_stimer_alloc()

    apic_bsp_setup()
      setup_local_APIC()

setup_local_APIC() temporarily disables LAPIC, initializes it and
re-eanble it.  The direct-mode STIMER depends on LAPIC, and when it's
registered, it can be programmed immediately and the timer can fire
very soon:

  hv_stimer_init
    clockevents_config_and_register
      clockevents_register_device
        tick_check_new_device
          tick_setup_device
            tick_setup_periodic(), tick_setup_oneshot()
              clockevents_program_event

When the timer fires in the hypervisor, if the LAPIC is in the
disabled state, new versions of Hyper-V ignore the event and don't inject
the timer interrupt into the VM, and hence the VM hangs when it boots.

Note: when the VM starts/reboots, the LAPIC is pre-enabled by the
firmware, so the window of LAPIC being temporarily disabled is pretty
small, and the issue can only happen once out of 100~200 reboots for
a 40-vCPU VM on one dev host, and on another host the issue doesn't
reproduce after 2000 reboots.

The issue is more noticeable for kdump/kexec, because the LAPIC is
disabled by the first kernel, and stays disabled until the kdump/kexec
kernel enables it. This is especially an issue to a Generation-2 VM
(for which Hyper-V doesn't emulate the PIT timer) when CONFIG_HZ=1000
(rather than CONFIG_HZ=250) is used.

Fix the issue by moving hv_stimer_alloc() to a later place where the
LAPIC timer is initialized.

Fixes: 4df4cb9e99f8 ("x86/hyperv: Initialize clockevents earlier in CPU onlining")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20210116223136.13892-1-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
4 years agoia64: fix build failure caused by memory model changes
Mike Rapoport [Fri, 18 Dec 2020 16:35:50 +0000 (18:35 +0200)]
ia64: fix build failure caused by memory model changes

The change of ia64's default memory model to SPARSEMEM causes defconfig
build to fail:

  CC      kernel/async.o
In file included from include/linux/numa.h:25,
                 from include/linux/async.h:13,
                 from kernel/async.c:47:
arch/ia64/include/asm/sparsemem.h:14:40: warning: "PAGE_SHIFT" is not defined, evaluates to 0 [-Wundef]
   14 | #if ((CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS)
      |                                        ^~~~~~~~~~
In file included from include/linux/gfp.h:6,
                 from include/linux/xarray.h:14,
                 from include/linux/radix-tree.h:19,
                 from include/linux/idr.h:15,
                 from include/linux/kernfs.h:13,
                 from include/linux/sysfs.h:16,
                 from include/linux/kobject.h:20,
                 from include/linux/energy_model.h:7,
                 from include/linux/device.h:16,
                 from include/linux/async.h:14,
                 from kernel/async.c:47:
include/linux/mmzone.h:1156:2: error: #error Allocator MAX_ORDER exceeds SECTION_SIZE
 1156 | #error Allocator MAX_ORDER exceeds SECTION_SIZE
      |  ^~~~~

The error cause is the missing definition of PAGE_SHIFT in the calculation
of SECTION_SIZE_BITS.

Add include of <asm/page.h> to arch/ia64/include/asm/sparsemem.h to solve
the problem.

Fixes: 214496cb1870 ("ia64: make SPARSEMEM default and disable DISCONTIGMEM")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
4 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 16 Jan 2021 20:25:40 +0000 (12:25 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Nine minor fixes, seven in drivers and two in the core SCSI disk
  driver (sd) which should be harmless involving removing an unused
  variable and quietening a spurious warning"

Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sd: Remove obsolete variable in sd_remove()
  scsi: sd: Suppress spurious errors when WRITE SAME is being disabled
  scsi: scsi_debug: Fix memleak in scsi_debug_init()
  scsi: mpt3sas: Fix spelling mistake in Kconfig "compatiblity" -> "compatibility"
  scsi: qedi: Correct max length of CHAP secret
  scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback
  scsi: ufs: Relocate flush of exceptional event
  scsi: ufs: Relax the condition of UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
  scsi: ufs: Fix possible power drain during system suspend

4 years agodump_common_audit_data(): fix racy accesses to ->d_name
Al Viro [Tue, 5 Jan 2021 19:43:46 +0000 (14:43 -0500)]
dump_common_audit_data(): fix racy accesses to ->d_name

We are not guaranteed the locking environment that would prevent
dentry getting renamed right under us.  And it's possible for
old long name to be freed after rename, leading to UAF here.

Cc: stable@kernel.org # v2.6.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
4 years agoMerge tag 'block-5.11-2021-01-16' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 16 Jan 2021 19:39:58 +0000 (11:39 -0800)]
Merge tag 'block-5.11-2021-01-16' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Just an nvme pull request via Christoph:

   - don't initialize hwmon for discover controllers (Sagi Grimberg)

   - fix iov_iter handling in nvme-tcp (Sagi Grimberg)

   - fix a preempt warning in nvme-tcp (Sagi Grimberg)

   - fix a possible NULL pointer dereference in nvme (Israel Rukshin)"

* tag 'block-5.11-2021-01-16' of git://git.kernel.dk/linux-block:
  nvme: don't intialize hwmon for discovery controllers
  nvme-tcp: fix possible data corruption with bio merges
  nvme-tcp: Fix warning with CONFIG_DEBUG_PREEMPT
  nvmet-rdma: Fix NULL deref when setting pi_enable and traddr INADDR_ANY

4 years agoMerge tag 'io_uring-5.11-2021-01-16' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 16 Jan 2021 19:12:02 +0000 (11:12 -0800)]
Merge tag 'io_uring-5.11-2021-01-16' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "We still have a pending fix for a cancelation issue, but it's still
  being investigated. In the meantime:

   - Dead mm handling fix (Pavel)

   - SQPOLL setup error handling (Pavel)

   - Flush timeout sequence fix (Marcelo)

   - Missing finish_wait() for one exit case"

* tag 'io_uring-5.11-2021-01-16' of git://git.kernel.dk/linux-block:
  io_uring: ensure finish_wait() is always called in __io_uring_task_cancel()
  io_uring: flush timeouts that should already have expired
  io_uring: do sqo disable on install_fd error
  io_uring: fix null-deref in io_disable_sqo_submit
  io_uring: don't take files/mm for a dead task
  io_uring: drop mm and files after task_work_run

4 years agoMerge tag 'riscv-for-linus-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 16 Jan 2021 19:00:08 +0000 (11:00 -0800)]
Merge tag 'riscv-for-linus-5.11-rc4' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
 "There are a few more fixes than a normal rc4, largely due to the
  bubble introduced by the holiday break:

   - return -ENOSYS for syscall number -1, which previously returned an
     uninitialized value.

   - ensure of_clk_init() has been called in time_init(), without which
     clock drivers may not be initialized.

   - fix sifive,uart0 driver to properly display the baud rate. A fix to
     initialize MPIE that allows interrupts to be processed during
     system calls.

   - avoid erronously begin tracing IRQs when interrupts are disabled,
     which at least triggers suprious lockdep failures.

   - workaround for a warning related to calling smp_processor_id()
     while preemptible. The warning itself is suprious on currently
     availiable systems.

   - properly include the generic time VDSO calls. A fix to our kasan
     address mapping. A fix to the HiFive Unleashed device tree, which
     allows the Ethernet PHY to be properly initialized by Linux (as
     opposed to relying on the bootloader).

   - defconfig update to include SiFive's GPIO driver, which is present
     on the HiFive Unleashed and necessary to initialize the PHY.

   - avoid allocating memory while initializing reserved memory.

   - avoid allocating the last 4K of memory, as pointers there alias
     with syscall errors.

  There are also two cleanups that should have no functional effect but
  do fix build warnings:

   - drop a duplicated definition of PAGE_KERNEL_EXEC.

   - properly declare the asm register SP shim.

   - cleanup the rv32 memory size Kconfig entry, to reflect the actual
     size of memory availiable"

* tag 'riscv-for-linus-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Fix maximum allowed phsyical memory for RV32
  RISC-V: Set current memblock limit
  RISC-V: Do not allocate memblock while iterating reserved memblocks
  riscv: stacktrace: Move register keyword to beginning of declaration
  riscv: defconfig: enable gpio support for HiFive Unleashed
  dts: phy: add GPIO number and active state used for phy reset
  dts: phy: fix missing mdio device and probe failure of vsc8541-01 device
  riscv: Fix KASAN memory mapping.
  riscv: Fixup CONFIG_GENERIC_TIME_VSYSCALL
  riscv: cacheinfo: Fix using smp_processor_id() in preemptible
  riscv: Trace irq on only interrupt is enabled
  riscv: Drop a duplicated PAGE_KERNEL_EXEC
  riscv: Enable interrupts during syscalls with M-Mode
  riscv: Fix sifive serial driver
  riscv: Fix kernel time_init()
  riscv: return -ENOSYS for syscall -1

4 years agomm: don't play games with pinned pages in clear_page_refs
Linus Torvalds [Sun, 10 Jan 2021 01:09:10 +0000 (17:09 -0800)]
mm: don't play games with pinned pages in clear_page_refs

Turning a pinned page read-only breaks the pinning after COW.  Don't do it.

The whole "track page soft dirty" state doesn't work with pinned pages
anyway, since the page might be dirtied by the pinning entity without
ever being noticed in the page tables.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: fix clear_refs_write locking
Linus Torvalds [Fri, 8 Jan 2021 21:13:41 +0000 (13:13 -0800)]
mm: fix clear_refs_write locking

Turning page table entries read-only requires the mmap_sem held for
writing.

So stop doing the odd games with turning things from read locks to write
locks and back.  Just get the write lock.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoRISC-V: Fix maximum allowed phsyical memory for RV32
Atish Patra [Mon, 11 Jan 2021 23:45:04 +0000 (15:45 -0800)]
RISC-V: Fix maximum allowed phsyical memory for RV32

Linux kernel can only map 1GB of address space for RV32 as the page offset
is set to 0xC0000000. The current description in the Kconfig is confusing
as it indicates that RV32 can support 2GB of physical memory. That is
simply not true for current kernel. In future, a 2GB split support can be
added to allow 2GB physical address space.

Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
4 years agoRISC-V: Set current memblock limit
Atish Patra [Mon, 11 Jan 2021 23:45:02 +0000 (15:45 -0800)]
RISC-V: Set current memblock limit

Currently, linux kernel can not use last 4k bytes of addressable space
because IS_ERR_VALUE macro treats those as an error. This will be an issue
for RV32 as any memblock allocator potentially allocate chunk of memory
from the end of DRAM (2GB) leading bad address error even though the
address was technically valid.

Fix this issue by limiting the memblock if available memory spans the
entire address space.

Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
4 years agoRISC-V: Do not allocate memblock while iterating reserved memblocks
Atish Patra [Mon, 11 Jan 2021 23:45:01 +0000 (15:45 -0800)]
RISC-V: Do not allocate memblock while iterating reserved memblocks

Currently, resource tree allocates memory blocks while iterating on the
list. It leads to following kernel warning because memblock allocation
also invokes memory block reservation API.

[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at kernel/resource.c:795
__insert_resource+0x8e/0xd0
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
5.10.0-00022-ge20097fb37e2-dirty #549
[    0.000000] epc: c00125c2 ra : c001262c sp : c1c01f50
[    0.000000]  gp : c1d456e0 tp : c1c0a980 t0 : ffffcf20
[    0.000000]  t1 : 00000000 t2 : 00000000 s0 : c1c01f60
[    0.000000]  s1 : ffffcf00 a0 : ffffff00 a1 : c1c0c0c4
[    0.000000]  a2 : 80c12b15 a3 : 80402000 a4 : 80402000
[    0.000000]  a5 : c1c0c0c4 a6 : 80c12b15 a7 : f5faf600
[    0.000000]  s2 : c1c0c0c4 s3 : c1c0e000 s4 : c1009a80
[    0.000000]  s5 : c1c0c000 s6 : c1d48000 s7 : c1613b4c
[    0.000000]  s8 : 00000fff s9 : 80000200 s10: c1613b40
[    0.000000]  s11: 00000000 t3 : c1d4a000 t4 : ffffffff

This is also unnecessary as we can pre-compute the total memblocks required
for each memory region and allocate it before the loop. It save precious
boot time not going through memblock allocation code every time.

Fixes: 00ab027a3b82 ("RISC-V: Add kernel image sections to the resource tree")
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
4 years agoiov_iter: fix the uaccess area in copy_compat_iovec_from_user
Christoph Hellwig [Mon, 11 Jan 2021 17:19:26 +0000 (18:19 +0100)]
iov_iter: fix the uaccess area in copy_compat_iovec_from_user

sizeof needs to be called on the compat pointer, not the native one.

Fixes: 89cd35c58bc2 ("iov_iter: transparently handle compat iovecs in import_iovec")
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
4 years agoMerge tag 'for-5.11/dm-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 16 Jan 2021 02:01:17 +0000 (18:01 -0800)]
Merge tag 'for-5.11/dm-fixes-1' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Fix DM-raid's raid1 discard limits so discards work.

 - Select missing Kconfig dependencies for DM integrity and zoned
   targets.

 - Four fixes for DM crypt target's support to optionally bypass kcryptd
   workqueues.

 - Fix DM snapshot merge supports missing data flushes before committing
   metadata.

 - Fix DM integrity data device flushing when external metadata is used.

 - Fix DM integrity's maximum number of supported constructor arguments
   that user can request when creating an integrity device.

 - Eliminate DM core ioctl logging noise when an ioctl is issued without
   required CAP_SYS_RAWIO permission.

* tag 'for-5.11/dm-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm crypt: defer decryption to a tasklet if interrupts disabled
  dm integrity: fix the maximum number of arguments
  dm crypt: do not call bio_endio() from the dm-crypt tasklet
  dm integrity: fix flush with external metadata device
  dm: eliminate potential source of excessive kernel log noise
  dm snapshot: flush merged data before committing metadata
  dm crypt: use GFP_ATOMIC when allocating crypto requests from softirq
  dm crypt: do not wait for backlogged crypto request completion in softirq
  dm zoned: select CONFIG_CRC32
  dm integrity: select CRYPTO_SKCIPHER
  dm raid: fix discard limits for raid1

4 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Fri, 15 Jan 2021 23:25:45 +0000 (15:25 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "10 patches.

  Subsystems affected by this patch series: MAINTAINERS and mm (slub,
  pagealloc, memcg, kasan, vmalloc, migration, hugetlb, memory-failure,
  and process_vm_access)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/process_vm_access.c: include compat.h
  mm,hwpoison: fix printing of page flags
  MAINTAINERS: add Vlastimil as slab allocators maintainer
  mm/hugetlb: fix potential missing huge page size info
  mm: migrate: initialize err in do_migrate_pages
  mm/vmalloc.c: fix potential memory leak
  arm/kasan: fix the array size of kasan_early_shadow_pte[]
  mm/memcontrol: fix warning in mem_cgroup_page_lruvec()
  mm/page_alloc: add a missing mm_page_alloc_zone_locked() tracepoint
  mm, slub: consider rest of partial list if acquire_slab() fails

4 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Fri, 15 Jan 2021 23:14:40 +0000 (15:14 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "A fairly modest set of bug fixes, nothing abnormal from the merge
  window

  The ucma patch is a bit on the larger side, but given the regression
  was recently added I've opted to forward it to the rc stream.

   - Fix a ucma memory leak introduced in v5.9 while fixing the
     Syzkaller bugs

   - Don't fail when the xarray wraps for user verbs objects

   - User triggerable oops regression from the umem page size rework

   - Error unwind bugs in usnic, ocrdma, mlx5 and cma"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/cma: Fix error flow in default_roce_mode_store
  RDMA/mlx5: Fix wrong free of blue flame register on error
  IB/mlx5: Fix error unwinding when set_has_smi_cap fails
  RDMA/umem: Avoid undefined behavior of rounddown_pow_of_two()
  RDMA/ocrdma: Fix use after free in ocrdma_dealloc_ucontext_pd()
  RDMA/usnic: Fix memleak in find_free_vf_and_create_qp_grp
  RDMA/restrack: Don't treat as an error allocation ID wrapping
  RDMA/ucma: Do not miss ctx destruction steps in some cases

4 years agoio_uring: ensure finish_wait() is always called in __io_uring_task_cancel()
Jens Axboe [Fri, 15 Jan 2021 23:04:23 +0000 (16:04 -0700)]
io_uring: ensure finish_wait() is always called in __io_uring_task_cancel()

If we enter with requests pending and performm cancelations, we'll have
a different inflight count before and after calling prepare_to_wait().
This causes the loop to restart. If we actually ended up canceling
everything, or everything completed in-between, then we'll break out
of the loop without calling finish_wait() on the waitqueue. This can
trigger a warning on exit_signals(), as we leave the task state in
TASK_UNINTERRUPTIBLE.

Put a finish_wait() after the loop to catch that case.

Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoMerge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 15 Jan 2021 22:54:24 +0000 (14:54 -0800)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "A number of bug fixes for ext4:

   - Fix for the new fast_commit feature

   - Fix some error handling codepaths in whiteout handling and
     mountpoint sampling

   - Fix how we write ext4_error information so it goes through the
     journal when journalling is active, to avoid races that can lead to
     lost error information, superblock checksum failures, or DIF/DIX
     features"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: remove expensive flush on fast commit
  ext4: fix bug for rename with RENAME_WHITEOUT
  ext4: fix wrong list_splice in ext4_fc_cleanup
  ext4: use IS_ERR instead of IS_ERR_OR_NULL and set inode null when IS_ERR
  ext4: don't leak old mountpoint samples
  ext4: drop ext4_handle_dirty_super()
  ext4: fix superblock checksum failure when setting password salt
  ext4: use sbi instead of EXT4_SB(sb) in ext4_update_super()
  ext4: save error info to sb through journal if available
  ext4: protect superblock modifications with a buffer lock
  ext4: drop sync argument of ext4_commit_super()
  ext4: combine ext4_handle_error() and save_error_info()

4 years agoMerge tag '5.11-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Fri, 15 Jan 2021 22:39:21 +0000 (14:39 -0800)]
Merge tag '5.11-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Two small cifs fixes for stable (including an important handle leak
  fix) and three small cleanup patches"

* tag '5.11-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: style: replace one-element array with flexible-array
  cifs: connect: style: Simplify bool comparison
  fs: cifs: remove unneeded variable in smb3_fs_context_dup
  cifs: fix interrupted close commands
  cifs: check pointer before freeing

4 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 15 Jan 2021 21:11:51 +0000 (13:11 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Set the minimum GCC version to 5.1 for arm64 due to earlier compiler
   bugs.

 - Make atomic helpers __always_inline to avoid a section mismatch when
   compiling with clang.

 - Fix the CMA and crashkernel reservations to use ZONE_DMA (remove the
   arm64_dma32_phys_limit variable, no longer needed with a dynamic
   ZONE_DMA sizing in 5.11).

 - Remove redundant IRQ flag tracing that was leaving lockdep
   inconsistent with the hardware state.

 - Revert perf events based hard lockup detector that was causing
   smp_processor_id() to be called in preemptible context.

 - Some trivial cleanups - spelling fix, renaming S_FRAME_SIZE to
   PT_REGS_SIZE, function prototypes added.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: selftests: Fix spelling of 'Mismatch'
  arm64: syscall: include prototype for EL0 SVC functions
  compiler.h: Raise minimum version of GCC to 5.1 for arm64
  arm64: make atomic helpers __always_inline
  arm64: rename S_FRAME_SIZE to PT_REGS_SIZE
  Revert "arm64: Enable perf events based hard lockup detector"
  arm64: entry: remove redundant IRQ flag tracing
  arm64: Remove arm64_dma32_phys_limit and its uses

4 years agoMerge tag 'mips_fixes_5.11.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips...
Linus Torvalds [Fri, 15 Jan 2021 21:07:33 +0000 (13:07 -0800)]
Merge tag 'mips_fixes_5.11.1' of git://git./linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

 - fix coredumps on 64bit kernels

 - fix for alignment bugs preventing booting

 - fix checking for failed irq_alloc_desc calls

* tag 'mips_fixes_5.11.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: OCTEON: fix unreachable code in octeon_irq_init_ciu
  MIPS: relocatable: fix possible boot hangup with KASLR enabled
  MIPS: Fix malformed NT_FILE and NT_SIGINFO in 32bit coredumps
  MIPS: boot: Fix unaligned access with CONFIG_MIPS_RAW_APPENDED_DTB

4 years agoperf inject: Correct event attribute sizes
Al Grant [Tue, 24 Nov 2020 19:58:17 +0000 (19:58 +0000)]
perf inject: Correct event attribute sizes

When 'perf inject' reads a perf.data file from an older version of perf,
it writes event attributes into the output with the original size field,
but lays them out as if they had the size currently used. Readers see a
corrupt file. Update the size field to match the layout.

Signed-off-by: Al Grant <al.grant@foss.arm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20201124195818.30603-1-al.grant@arm.com
Signed-off-by: Denis Nikitin <denik@chromium.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf intel-pt: Fix 'CPU too large' error
Adrian Hunter [Thu, 7 Jan 2021 17:41:59 +0000 (19:41 +0200)]
perf intel-pt: Fix 'CPU too large' error

In some cases, the number of cpus (nr_cpus_online) is confused with the
maximum cpu number (nr_cpus_avail), which results in the error in the
example below:

Example on system with 8 cpus:

 Before:
   # echo 0 > /sys/devices/system/cpu/cpu2/online
   # ./perf record --kcore -e intel_pt// taskset --cpu-list 7 uname
   Linux
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.147 MB perf.data ]
   # ./perf script --itrace=e
   Requested CPU 7 too large. Consider raising MAX_NR_CPUS
   0x25908 [0x8]: failed to process type: 68 [Invalid argument]

 After:
   # ./perf script --itrace=e
   #

Fixes: 8c7274691f0d ("perf machine: Replace MAX_NR_CPUS with perf_env::nr_cpus_online")
Fixes: 7df4e36a4785 ("perf session: Replace MAX_NR_CPUS with perf_env::nr_cpus_online")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20210107174159.24897-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf stat: Take cgroups into account for shadow stats
Namhyung Kim [Fri, 15 Jan 2021 07:11:39 +0000 (16:11 +0900)]
perf stat: Take cgroups into account for shadow stats

As of now it doesn't consider cgroups when collecting shadow stats and
metrics so counter values from different cgroups will be saved in a same
slot.  This resulted in incorrect numbers when those cgroups have
different workloads.

For example, let's look at the scenario below: cgroups A and C runs same
workload which burns a cpu while cgroup B runs a light workload.

  $ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C  sleep 1

   Performance counter stats for 'system wide':

     3,958,116,522      cycles                A
     6,722,650,929      instructions          A #    2.53  insn per cycle
         1,132,741      cycles                B
           571,743      instructions          B #    0.00  insn per cycle
     4,007,799,935      cycles                C
     6,793,181,523      instructions          C #    2.56  insn per cycle

       1.001050869 seconds time elapsed

When I run 'perf stat' with single workload, it usually shows IPC around
1.7.  We can verify it (6,722,650,929.0 / 3,958,116,522 = 1.698) for cgroup A.

But in this case, since cgroups are ignored, cycles are averaged so it
used the lower value for IPC calculation and resulted in around 2.5.

  avg cycle: (3958116522 + 1132741 + 4007799935) / 3 = 2655683066
  IPC (A)  :  6722650929 / 2655683066 = 2.531
  IPC (B)  :      571743 / 2655683066 = 0.0002
  IPC (C)  :  6793181523 / 2655683066 = 2.557

We can simply compare cgroup pointers in the evsel and it'll be NULL
when cgroups are not specified.  With this patch, I can see correct
numbers like below:

  $ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C  sleep 1

  Performance counter stats for 'system wide':

     4,171,051,687      cycles                A
     7,219,793,922      instructions          A #    1.73  insn per cycle
         1,051,189      cycles                B
           583,102      instructions          B #    0.55  insn per cycle
     4,171,124,710      cycles                C
     7,192,944,580      instructions          C #    1.72  insn per cycle

       1.007909814 seconds time elapsed

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210115071139.257042-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf stat: Introduce struct runtime_stat_data
Namhyung Kim [Fri, 15 Jan 2021 07:11:38 +0000 (16:11 +0900)]
perf stat: Introduce struct runtime_stat_data

To pass more info to the saved_value in the runtime_stat, add a new
struct runtime_stat_data.  Currently it only has 'ctx' field but later
patch will add more.

Note that we intentionally pass 0 as ctx to clock-related events for
compatibility.  It was already there in a few places.  So move the code
into the saved_value_lookup() explicitly and add a comment.

Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210115071139.257042-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf tests: Fail when failing to get a tracepoint id
Ian Rogers [Thu, 14 Jan 2021 18:02:50 +0000 (10:02 -0800)]
libperf tests: Fail when failing to get a tracepoint id

Permissions are necessary to get a tracepoint id. Fail the test when the
read fails.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210114180250.3853825-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf tests: If a test fails return non-zero
Ian Rogers [Thu, 14 Jan 2021 18:02:49 +0000 (10:02 -0800)]
libperf tests: If a test fails return non-zero

If a test fails return -1 rather than 0. This is consistent with the
return value in test-cpumap.c

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210114180250.3853825-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agolibperf tests: Avoid uninitialized variable warning
Ian Rogers [Thu, 14 Jan 2021 21:23:04 +0000 (13:23 -0800)]
libperf tests: Avoid uninitialized variable warning

The variable 'bf' is read (for a write call) without being initialized
triggering a memory sanitizer warning. Use 'bf' in the read and switch
the write to reading from a string.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210114212304.4018119-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoext4: remove expensive flush on fast commit
Daejun Park [Wed, 6 Jan 2021 01:32:42 +0000 (10:32 +0900)]
ext4: remove expensive flush on fast commit

In the fast commit, it adds REQ_FUA and REQ_PREFLUSH on each fast
commit block when barrier is enabled.  However, in recovery phase,
ext4 compares CRC value in the tail.  So it is sufficient to add
REQ_FUA and REQ_PREFLUSH on the block that has tail.

Signed-off-by: Daejun Park <daejun7.park@samsung.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20210106013242epcms2p5b6b4ed8ca86f29456fdf56aa580e74b4@epcms2p5
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
4 years agoext4: fix bug for rename with RENAME_WHITEOUT
yangerkun [Tue, 5 Jan 2021 06:28:57 +0000 (14:28 +0800)]
ext4: fix bug for rename with RENAME_WHITEOUT

We got a "deleted inode referenced" warning cross our fsstress test. The
bug can be reproduced easily with following steps:

  cd /dev/shm
  mkdir test/
  fallocate -l 128M img
  mkfs.ext4 -b 1024 img
  mount img test/
  dd if=/dev/zero of=test/foo bs=1M count=128
  mkdir test/dir/ && cd test/dir/
  for ((i=0;i<1000;i++)); do touch file$i; done # consume all block
  cd ~ && renameat2(AT_FDCWD, /dev/shm/test/dir/file1, AT_FDCWD,
    /dev/shm/test/dir/dst_file, RENAME_WHITEOUT) # ext4_add_entry in
    ext4_rename will return ENOSPC!!
  cd /dev/shm/ && umount test/ && mount img test/ && ls -li test/dir/file1
  We will get the output:
  "ls: cannot access 'test/dir/file1': Structure needs cleaning"
  and the dmesg show:
  "EXT4-fs error (device loop0): ext4_lookup:1626: inode #2049: comm ls:
  deleted inode referenced: 139"

ext4_rename will create a special inode for whiteout and use this 'ino'
to replace the source file's dir entry 'ino'. Once error happens
latter(the error above was the ENOSPC return from ext4_add_entry in
ext4_rename since all space has been consumed), the cleanup do drop the
nlink for whiteout, but forget to restore 'ino' with source file. This
will trigger the bug describle as above.

Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Fixes: cd808deced43 ("ext4: support RENAME_WHITEOUT")
Link: https://lore.kernel.org/r/20210105062857.3566-1-yangerkun@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
4 years agoext4: fix wrong list_splice in ext4_fc_cleanup
Daejun Park [Wed, 30 Dec 2020 09:48:51 +0000 (18:48 +0900)]
ext4: fix wrong list_splice in ext4_fc_cleanup

After full/fast commit, entries in staging queue are promoted to main
queue. In ext4_fs_cleanup function, it splice to staging queue to
staging queue.

Fixes: aa75f4d3daaeb ("ext4: main fast-commit commit path")
Signed-off-by: Daejun Park <daejun7.park@samsung.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201230094851epcms2p6eeead8cc984379b37b2efd21af90fd1a@epcms2p6
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
4 years agoext4: use IS_ERR instead of IS_ERR_OR_NULL and set inode null when IS_ERR
Yi Li [Wed, 30 Dec 2020 03:38:27 +0000 (11:38 +0800)]
ext4: use IS_ERR instead of IS_ERR_OR_NULL and set inode null when IS_ERR

1: ext4_iget/ext4_find_extent never returns NULL, use IS_ERR
instead of IS_ERR_OR_NULL to fix this.

2: ext4_fc_replay_inode should set the inode to NULL when IS_ERR.
and go to call iput properly.

Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Signed-off-by: Yi Li <yili@winhong.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201230033827.3996064-1-yili@winhong.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
4 years agoperf test: Fix shadow stat test for non-bash shells
Namhyung Kim [Thu, 14 Jan 2021 05:06:09 +0000 (14:06 +0900)]
perf test: Fix shadow stat test for non-bash shells

It was using some bash-specific features and failed to parse when
running with a different shell like below:

  root@kbl-ppc:~/kbl-ws/perf-dev/lck-9077/acme.tmp/tools/perf# ./perf test 83 -vv
  83: perf stat metrics (shadow stat) test                            :
  --- start ---
  test child forked, pid 3922
  ./tests/shell/stat+shadow_stat.sh: 19: ./tests/shell/stat+shadow_stat.sh: [[: not found
  ./tests/shell/stat+shadow_stat.sh: 24: ./tests/shell/stat+shadow_stat.sh: [[: not found
  ./tests/shell/stat+shadow_stat.sh: 30: ./tests/shell/stat+shadow_stat.sh: [[: not found
  (standard_in) 2: syntax error
  ./tests/shell/stat+shadow_stat.sh: 36: ./tests/shell/stat+shadow_stat.sh: [[: not found
  ./tests/shell/stat+shadow_stat.sh: 19: ./tests/shell/stat+shadow_stat.sh: [[: not found
  ./tests/shell/stat+shadow_stat.sh: 24: ./tests/shell/stat+shadow_stat.sh: [[: not found
  ./tests/shell/stat+shadow_stat.sh: 30: ./tests/shell/stat+shadow_stat.sh: [[: not found
  (standard_in) 2: syntax error
  ./tests/shell/stat+shadow_stat.sh: 36: ./tests/shell/stat+shadow_stat.sh: [[: not found
  ./tests/shell/stat+shadow_stat.sh: 45: ./tests/shell/stat+shadow_stat.sh: declare: not found
  test child finished with -1
  ---- end ----
  perf stat metrics (shadow stat) test: FAILED!

Reported-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210114050609.1258820-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agotools headers: Syncronize linux/build_bug.h with the kernel sources
Arnaldo Carvalho de Melo [Thu, 14 Jan 2021 15:35:27 +0000 (12:35 -0300)]
tools headers: Syncronize linux/build_bug.h with the kernel sources

To pick up the changes in:

  3a176b94609a18f5 ("Revert "kbuild: avoid static_assert for genksyms"")

And silence this perf build warning:

  Warning: Kernel ABI header at 'tools/include/linux/build_bug.h' differs from latest version at 'include/linux/build_bug.h'
  diff -u tools/include/linux/build_bug.h include/linux/build_bug.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agotools headers UAPI: Sync kvm.h headers with the kernel sources
Arnaldo Carvalho de Melo [Thu, 14 Jan 2021 15:26:08 +0000 (12:26 -0300)]
tools headers UAPI: Sync kvm.h headers with the kernel sources

To pick the changes in:

  647daca25d24fb6e ("KVM: SVM: Add support for booting APs in an SEV-ES guest")

That don't cause any tooling change, just silences this perf build
warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoperf bpf examples: Fix bpf.h header include directive in 5sec.c example
Arnaldo Carvalho de Melo [Tue, 29 Dec 2020 18:41:19 +0000 (15:41 -0300)]
perf bpf examples: Fix bpf.h header include directive in 5sec.c example

It was looking at bpf/bpf.h, which caused this problem:

  # perf trace -e tools/perf/examples/bpf/5sec.c
  /home/acme/git/perf/tools/perf/examples/bpf/5sec.c:42:10: fatal error: 'bpf/bpf.h' file not found
  #include <bpf/bpf.h>
           ^~~~~~~~~~~
  1 error generated.
  ERROR: unable to compile tools/perf/examples/bpf/5sec.c
  Hint: Check error message shown above.
  Hint: You can also pre-compile it into .o using:
        clang -target bpf -O2 -c tools/perf/examples/bpf/5sec.c
        with proper -I and -D options.
  event syntax error: 'tools/perf/examples/bpf/5sec.c'
                       \___ Failed to load tools/perf/examples/bpf/5sec.c from source: Error when compiling BPF scriptlet
  #

Change that to plain bpf.h, to make it work again:

  # perf trace -e tools/perf/examples/bpf/5sec.c sleep 5s
       0.000 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1776891872, rqtp: 5000000000)
  # perf trace -e tools/perf/examples/bpf/5sec.c/max-stack=16/ sleep 5s
       0.000 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1776891872, rqtp: 5000000000)
                                         hrtimer_nanosleep ([kernel.kallsyms])
                                         common_nsleep ([kernel.kallsyms])
                                         __x64_sys_clock_nanosleep ([kernel.kallsyms])
                                         do_syscall_64 ([kernel.kallsyms])
                                         entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
                                         __clock_nanosleep_2 (/usr/lib64/libc-2.32.so)
  # perf trace -e tools/perf/examples/bpf/5sec.c sleep 4s
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoMerge tag 'acpi-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 15 Jan 2021 18:55:33 +0000 (10:55 -0800)]
Merge tag 'acpi-5.11-rc4' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These address a device ID bounds check error in the device enumeration
  code and fix a mistake in the documentation.

  Specifics:

   - Harden the ACPI device enumeration code against device ID length
     overflows to address a Linux VM cash on Hyper-V (Dexuan Cui).

   - Fix a mistake in the documentation of error type values for PCIe
     errors (Qiuxu Zhuo)"

* tag 'acpi-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Documentation: ACPI: EINJ: Fix error type values for PCIe errors
  ACPI: scan: Harden acpi_device_add() against device ID overflows

4 years agoMerge tag 'for-linus-5.11-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 15 Jan 2021 18:52:00 +0000 (10:52 -0800)]
Merge tag 'for-linus-5.11-rc4-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - A series to fix a regression when running as a fully virtualized
   guest on an old Xen hypervisor not supporting PV interrupt callbacks
   for HVM guests.

 - A patch to add support to query Xen resource sizes (setting was
   possible already) from user mode.

* tag 'for-linus-5.11-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: Fix xen_hvm_smp_init() when vector callback not available
  x86/xen: Don't register Xen IPIs when they aren't going to be used
  x86/xen: Add xen_no_vector_callback option to test PCI INTX delivery
  xen: Set platform PCI device INTX affinity to CPU0
  xen: Fix event channel callback via INTX/GSI
  xen/privcmd: allow fetching resource sizes

4 years agoperf build: Support build BPF skeletons with perf
Song Liu [Tue, 29 Dec 2020 21:42:13 +0000 (13:42 -0800)]
perf build: Support build BPF skeletons with perf

BPF programs are useful in perf to profile BPF programs.

BPF skeleton is by far the easiest way to write BPF tools. Enable
building BPF skeletons in util/bpf_skel. A dummy bpf skeleton is added.
More bpf skeletons will be added for different use cases.

Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Link: http://lore.kernel.org/lkml/20201229214214.3413833-3-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoMerge tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 15 Jan 2021 18:48:58 +0000 (10:48 -0800)]
Merge tag 'iommu-fixes' of git://git./linux/kernel/git/arm64/linux

Pull iommu fixes from Will Deacon:
 "Three IOMMU fixes for -rc4.

  The main one is a change to the Intel IOMMU driver to fix the handling
  of unaligned addresses when invalidating the TLB.

  The fix itself is a bit ugly (the caller does a bunch of shifting
  which is then effectively undone later in the callchain), but Lu has
  patches to clean all of this up in 5.12.

  Summary:

   - Fix address alignment handling for VT-D TLB invalidation

   - Enable workarounds for buggy Qualcomm firmware on two more SoCs

   - Drop duplicate #include"

* tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  iommu/vt-d: Fix duplicate included linux/dma-map-ops.h
  iommu: arm-smmu-qcom: Add sdm630/msm8998 compatibles for qcom quirks
  iommu/vt-d: Fix unaligned addresses for intel_flush_svm_range_dev()

4 years agobpftool: Add Makefile target bootstrap
Song Liu [Mon, 28 Dec 2020 17:40:51 +0000 (09:40 -0800)]
bpftool: Add Makefile target bootstrap

This target is used to only build the bootstrap bpftool, which will be
used to generate bpf skeletons for other tools, like perf.

Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Link: http://lore.kernel.org/lkml/20201228174054.907740-2-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
4 years agoMerge tag 'topic/nouveau-ampere-modeset-2021-01-15' of git://anongit.freedesktop...
Linus Torvalds [Fri, 15 Jan 2021 18:45:54 +0000 (10:45 -0800)]
Merge tag 'topic/nouveau-ampere-modeset-2021-01-15' of git://anongit.freedesktop.org/drm/drm

Pull drm nouveau ampere display support from Dave Airlie:
 "Ben has requested if we can include Ampere modesetting support under
  fixes, it's for new GPUs and shouldn't affect existing hardware.

  It's a bit bigger than just adding a PCI ID, but It has no effect on
  older GPUs"

* tag 'topic/nouveau-ampere-modeset-2021-01-15' of git://anongit.freedesktop.org/drm/drm:
  drm/nouveau/disp/ga10[24]: initial support
  drm/nouveau/dmaobj/ga10[24]: initial support
  drm/nouveau/i2c/ga10[024]: initial support
  drm/nouveau/gpio/ga10[024]: initial support
  drm/nouveau/bar/ga10[024]: initial support
  drm/nouveau/mmu/ga10[024]: initial support
  drm/nouveau/timer/ga10[024]: initial support
  drm/nouveau/fb/ga10[024]: initial support
  drm/nouveau/imem/ga10[024]: initial support
  drm/nouveau/privring/ga10[024]: initial support
  drm/nouveau/mc/ga10[024]: initial support
  drm/nouveau/devinit/ga10[024]: initial support
  drm/nouveau/bios/ga10[024]: initial support
  drm/nouveau/pci/ga10[024]: initial support
  drm/nouveau/core: recognise GA10[024]

4 years agoMerge branch 'acpi-docs'
Rafael J. Wysocki [Fri, 15 Jan 2021 18:15:49 +0000 (19:15 +0100)]
Merge branch 'acpi-docs'

* acpi-docs:
  Documentation: ACPI: EINJ: Fix error type values for PCIe errors

4 years agoio_uring: flush timeouts that should already have expired
Marcelo Diop-Gonzalez [Fri, 15 Jan 2021 16:54:40 +0000 (11:54 -0500)]
io_uring: flush timeouts that should already have expired

Right now io_flush_timeouts() checks if the current number of events
is equal to ->timeout.target_seq, but this will miss some timeouts if
there have been more than 1 event added since the last time they were
flushed (possible in io_submit_flush_completions(), for example). Fix
it by recording the last sequence at which timeouts were flushed so
that the number of events seen can be compared to the number of events
needed without overflow.

Signed-off-by: Marcelo Diop-Gonzalez <marcelo827@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agospi: cadence: cache reference clock rate during probe
Michael Hennerich [Thu, 14 Jan 2021 15:42:17 +0000 (17:42 +0200)]
spi: cadence: cache reference clock rate during probe

The issue is that using SPI from a callback under the CCF lock will
deadlock, since this code uses clk_get_rate().

Fixes: c474b38665463 ("spi: Add driver for Cadence SPI controller")
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Link: https://lore.kernel.org/r/20210114154217.51996-1-alexandru.ardelean@analog.com
Signed-off-by: Mark Brown <broonie@kernel.org>
4 years agoarm64: selftests: Fix spelling of 'Mismatch'
Mark Brown [Fri, 8 Jan 2021 18:31:44 +0000 (18:31 +0000)]
arm64: selftests: Fix spelling of 'Mismatch'

The SVE and FPSIMD stress tests have a spelling mistake in the output, fix
it.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210108183144.673-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64: syscall: include prototype for EL0 SVC functions
Mark Rutland [Thu, 14 Jan 2021 12:48:12 +0000 (12:48 +0000)]
arm64: syscall: include prototype for EL0 SVC functions

The kbuild test robot reports that when building with W=1, GCC will warn
for a couple of missing prototypes in syscall.c:

|  arch/arm64/kernel/syscall.c:157:6: warning: no previous prototype for 'do_el0_svc' [-Wmissing-prototypes]
|    157 | void do_el0_svc(struct pt_regs *regs)
|        |      ^~~~~~~~~~
|  arch/arm64/kernel/syscall.c:164:6: warning: no previous prototype for 'do_el0_svc_compat' [-Wmissing-prototypes]
|    164 | void do_el0_svc_compat(struct pt_regs *regs)
|        |      ^~~~~~~~~~~~~~~~~

While this isn't a functional problem, as a general policy we should
include the prototype for functions wherever possible to catch any
accidental divergence between the prototype and implementation. Here we
can easily include <asm/exception.h>, so let's do so.

While there are a number of warnings elsewhere and some warnings enabled
under W=1 are of questionable benefit, this change helps to make the
code more robust as it evolved and reduces the noise somewhat, so it
seems worthwhile.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/202101141046.n8iPO3mw-lkp@intel.com
Link: https://lore.kernel.org/r/20210114124812.17754-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agocompiler.h: Raise minimum version of GCC to 5.1 for arm64
Will Deacon [Tue, 12 Jan 2021 22:48:32 +0000 (22:48 +0000)]
compiler.h: Raise minimum version of GCC to 5.1 for arm64

GCC versions >= 4.9 and < 5.1 have been shown to emit memory references
beyond the stack pointer, resulting in memory corruption if an interrupt
is taken after the stack pointer has been adjusted but before the
reference has been executed. This leads to subtle, infrequent data
corruption such as the EXT4 problems reported by Russell King at the
link below.

Life is too short for buggy compilers, so raise the minimum GCC version
required by arm64 to 5.1.

Reported-by: Russell King <linux@armlinux.org.uk>
Suggested-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20210105154726.GD1551@shell.armlinux.org.uk
Link: https://lore.kernel.org/r/20210112224832.10980-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoMerge branch '04.01-ampere-lite' of git://github.com/skeggsb/linux into topic/nouveau...
Dave Airlie [Fri, 15 Jan 2021 04:48:14 +0000 (14:48 +1000)]
Merge branch '04.01-ampere-lite' of git://github.com/skeggsb/linux into topic/nouveau-ampere-modeset

This adds support for basic modeseting on the nvidia ampere chipsets. This code should all
be contained to just those and have no effect on current hardware.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <skeggsb@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv5LmMP+HbDUQBf_dy1-0eS9fA32k8HWo4y5X4-7rsw-yw@mail.gmail.com
4 years agoMerge tag 'drm-fixes-2021-01-15' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 15 Jan 2021 04:10:06 +0000 (20:10 -0800)]
Merge tag 'drm-fixes-2021-01-15' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Regular fixes for rc4, a bunch of fixes across i915, amdgpu and
  nouveau here, along with a couple of TTM fixes, and dma-buf and one
  core pageflip/modifier interaction fix.

  One notable i915 fix is a HSW GT1 regression fix that has been
  outstanding for quite a while. (Thanks to Matt Turner for kicking
  Intel into getting it fixed).

  dma-buf:
   - Fix a memory leak in CMAV heap

  core:
   - Fix format check for legacy pageflips

  ttm:
   - Pass correct address to dma_mapping_error()
   - Use mutex in pool shrinker

  i915:
   - Allow the sysadmin to override security mitigations
   - Restore clear-residual mitigations for ivb/byt
   - Limit VFE threads based on GT
   - GVT: fix vfio edid and full display detection
   - Fix DSI DSC power refcounting
   - Fix LPT CPU mode backlight takeover
   - Disable RPM wakeref assertions during driver shutdown
   - Fix DSI sequence sleeps

  amdgpu:
   - Update repo location in MAINTAINERS
   - Add some new renoir PCI IDs
   - Revert CRC UAPI changes
   - Revert OLED display fix which cases clocking problems for some systems
   - Misc vangogh fixes
   - GFX fix for sienna cichlid
   - DCN1.0 fix for pipe split
   - Fix incorrect PSP command

  amdkfd:
   - Fix possible out of bounds read in vcrat creation

  nouveau:
   - irq handling fix
   - expansion ROM fix
   - hw init dpcd disable
   - aux semaphore owner field fix
   - vram heap sizing fix
   - notifier at 0 is valid fix"

* tag 'drm-fixes-2021-01-15' of git://anongit.freedesktop.org/drm/drm: (37 commits)
  drm/nouveau/kms/nv50-: fix case where notifier buffer is at offset 0
  drm/nouveau/mmu: fix vram heap sizing
  drm/nouveau/i2c/gm200: increase width of aux semaphore owner fields
  drm/nouveau/i2c/gk110-: disable hw-initiated dpcd reads
  drm/nouveau/i2c/gk110: split out from i2c/gk104
  drm/nouveau/privring: ack interrupts the same way as RM
  drm/nouveau/bios: fix issue shadowing expansion ROMs
  drm/amd/display: Fix to be able to stop crc calculation
  Revert "drm/amd/display: Expose new CRC window property"
  Revert "drm/amdgpu/disply: fix documentation warnings in display manager"
  Revert "drm/amd/display: Fix unused variable warning"
  drm/amdgpu: set power brake sequence
  drm/amdgpu: add new device id for Renior
  drm/amdgpu: add green_sardine device id (v2)
  drm/amdgpu: fix vram type and bandwidth error for DDR5 and DDR4
  drm/amdgpu/gfx10: add updated GOLDEN_TSC_COUNT_UPPER/LOWER register offsets for VGH
  drm/amdkfd: Fix out-of-bounds read in kdf_create_vcrat_image_cpu()
  Revert "drm/amd/display: Fixed Intermittent blue screen on OLED panel"
  drm/amd/display: disable dcn10 pipe split by default
  drm/amd/display: Add a missing DCN3.01 API mapping
  ...

4 years agoMerge tag 'trace-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Fri, 15 Jan 2021 04:06:29 +0000 (20:06 -0800)]
Merge tag 'trace-v5.11-rc3' of git://git./linux/kernel/git/rostedt/linux-trace

Pull bootconfig fix from Steven Rostedt:
 "Update bootconf scripts for tracing_on option

  The tracing_on option is supported by bootconfig entries, but the
  scripts to convert from ftrace to a bootconfig and back were not
  updated"

* tag 'trace-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tools/bootconfig: Add tracing_on support to helper scripts

4 years agoMerge branch '04.00-ampere-lite-fixes' of git://github.com/skeggsb/linux into drm...
Dave Airlie [Fri, 15 Jan 2021 03:17:54 +0000 (13:17 +1000)]
Merge branch '04.00-ampere-lite-fixes' of git://github.com/skeggsb/linux into drm-fixes

As requested, here's a tree with the non-Ampere-specific fixes split
out, as most of them are potentially relevant to already-supported
GPUs.

- irq handling fix
- expansion ROM fix
- hw init dpcd disable
- aux semaphore owner field fix
- vram heap sizing fix
- notifier at 0 is valid fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <skeggsb@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv4P90mcF_ByAh+ghz+ZVD2N2bPbD7xHYYArE1kYrvsGcQ@mail.gmail.com
4 years agoriscv: stacktrace: Move register keyword to beginning of declaration
Kefeng Wang [Thu, 14 Jan 2021 02:46:57 +0000 (10:46 +0800)]
riscv: stacktrace: Move register keyword to beginning of declaration

Using global sp_in_global directly to fix the following warning,

arch/riscv/kernel/stacktrace.c:31:3: warning: â€˜register’ is not at beginning of declaration [-Wold-style-declaration]
31 |   const register unsigned long current_sp = sp_in_global;
   |   ^~~~~

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
4 years agoMerge tag 'amd-drm-fixes-5.11-2021-01-14' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Fri, 15 Jan 2021 01:56:21 +0000 (11:56 +1000)]
Merge tag 'amd-drm-fixes-5.11-2021-01-14' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.11-2021-01-14:

amdgpu:
- Update repo location in MAINTAINERS
- Add some new renoir PCI IDs
- Revert CRC UAPI changes
- Revert OLED display fix which cases clocking problems for some systems
- Misc vangogh fixes
- GFX fix for sienna cichlid
- DCN1.0 fix for pipe split
- Fix incorrect PSP command

amdkfd:
- Fix possible out of bounds read in vcrat creation

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210114201354.3998-1-alexander.deucher@amd.com
4 years agoMerge tag 'drm-intel-fixes-2021-01-14' of git://anongit.freedesktop.org/drm/drm-intel...
Dave Airlie [Fri, 15 Jan 2021 01:47:04 +0000 (11:47 +1000)]
Merge tag 'drm-intel-fixes-2021-01-14' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

drm/i915 fixes for v5.11-rc4:
- Allow the sysadmin to override security mitigations
- Restore clear-residual mitigations for ivb/byt
- Limit VFE threads based on GT
- GVT: fix vfio edid and full display detection
- Fix DSI DSC power refcounting
- Fix LPT CPU mode backlight takeover
- Disable RPM wakeref assertions during driver shutdown
- Fix DSI sequence sleeps

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87sg73pz42.fsf@intel.com
4 years agodrm/nouveau/disp/ga10[24]: initial support
Ben Skeggs [Wed, 13 Jan 2021 07:12:52 +0000 (17:12 +1000)]
drm/nouveau/disp/ga10[24]: initial support

UEFI/RM no longer use IED scripts from the VBIOS, though they appear to
have been updated for use by the x86 VBIOS code, so we should be able to
continue using them for the moment.

Unfortunately, we require some hacks to do so, as the BeforeLinkTraining
IED script became a pointer to an array of scripts instead, without a
revbump of the relevant tables.

There's also some changes to SOR clock divider fiddling, which are
hopefully correct enough that things work as they should.

AFAIK, GA100 shouldn't have display, so it hasn't been added.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau/dmaobj/ga10[24]: initial support
Ben Skeggs [Wed, 13 Jan 2021 07:12:52 +0000 (17:12 +1000)]
drm/nouveau/dmaobj/ga10[24]: initial support

Appears to be compatible with GV100 code, and not required on GA100, as
it shouldn't have display.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau/i2c/ga10[024]: initial support
Ben Skeggs [Wed, 13 Jan 2021 07:12:52 +0000 (17:12 +1000)]
drm/nouveau/i2c/ga10[024]: initial support

Appears to be compatible with GM200 code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau/gpio/ga10[024]: initial support
Ben Skeggs [Wed, 13 Jan 2021 07:12:52 +0000 (17:12 +1000)]
drm/nouveau/gpio/ga10[024]: initial support

GA100 appears to be compatible with GK104 code, the others have some
register moves.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau/bar/ga10[024]: initial support
Ben Skeggs [Wed, 13 Jan 2021 07:12:52 +0000 (17:12 +1000)]
drm/nouveau/bar/ga10[024]: initial support

Appears to be compatible with TU102 code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau/mmu/ga10[024]: initial support
Ben Skeggs [Wed, 13 Jan 2021 07:12:52 +0000 (17:12 +1000)]
drm/nouveau/mmu/ga10[024]: initial support

Appears to be compatible with TU102 code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau/timer/ga10[024]: initial support
Ben Skeggs [Wed, 13 Jan 2021 07:12:52 +0000 (17:12 +1000)]
drm/nouveau/timer/ga10[024]: initial support

Appears to be compatible with GK20A code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau/fb/ga10[024]: initial support
Ben Skeggs [Wed, 13 Jan 2021 07:12:52 +0000 (17:12 +1000)]
drm/nouveau/fb/ga10[024]: initial support

No VPR scrub.  GA102 and GA104 have a new VRAM size detection method.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau/imem/ga10[024]: initial support
Ben Skeggs [Wed, 13 Jan 2021 07:12:52 +0000 (17:12 +1000)]
drm/nouveau/imem/ga10[024]: initial support

Appears to be compatible with NV50 code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>