1. 14 Oct, 2020 11 commits
  2. 13 Oct, 2020 8 commits
    • perf sched: Show start of latency as well · dc000c45
      Joel Fernandes (Google) committed
      The 'perf sched latency' tool is really useful at showing worst-case
      latencies that tasks encountered since wakeup. However, it shows only the
      end of the latency. Oftentimes the start of a latency is interesting, as
      it can show what else was going on at the time to cause the latency. I
      certainly find myself spending a lot of time backtracking to the start of
      the latency in "perf sched script", which wastes a lot of time.
      
      This patch therefore adds a new column "Max delay start". Considering
      this, also rename "Maximum delay at" to "Max delay end" as it's easier to
      understand.
      
      Example of the new output:
      
        ----------------------------------------------------------------------------------------------------------------------------------
         Task                  | Runtime ms  | Switches | Avg delay ms  | Max delay ms   | Max delay start         | Max delay end       |
        ----------------------------------------------------------------------------------------------------------------------------------
         MediaScannerSer:11936 |  651.296 ms |    67978 | avg: 0.113 ms | max: 77.250 ms | max start: 477.691360 s | max end: 477.768610 s
         audio@2.0-servi:(3)   |    0.000 ms |     3440 | avg: 0.034 ms | max: 72.267 ms | max start: 477.697051 s | max end: 477.769318 s
         AudioOut_1D:8112      |    0.000 ms |     2588 | avg: 0.083 ms | max: 64.020 ms | max start: 477.710740 s | max end: 477.774760 s
         Time-limited te:14973 | 7966.090 ms |    24807 | avg: 0.073 ms | max: 15.563 ms | max start: 477.162746 s | max end: 477.178309 s
         surfaceflinger:8049   |    9.680 ms |      603 | avg: 0.063 ms | max: 13.275 ms | max start: 476.931791 s | max end: 476.945067 s
         HeapTaskDaemon:(3)    | 1588.830 ms |     7040 | avg: 0.065 ms | max:  6.880 ms | max start: 473.666043 s | max end: 473.672922 s
         mount-passthrou:(3)   | 1370.809 ms |    68904 | avg: 0.011 ms | max:  6.524 ms | max start: 478.090630 s | max end: 478.097154 s
         ReferenceQueueD:(3)   |   11.794 ms |     1725 | avg: 0.014 ms | max:  6.521 ms | max start: 476.119782 s | max end: 476.126303 s
         writer:14077          |   18.410 ms |     1427 | avg: 0.036 ms | max:  6.131 ms | max start: 474.169675 s | max end: 474.175805 s
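
The relation between the three "max" columns can be sketched as follows. This is a minimal illustration, not the perf implementation: the class and field names are hypothetical, the first sample pair is made up, and the second pair is taken from the MediaScannerSer row above.

```python
from dataclasses import dataclass


@dataclass
class LatencyStats:
    """Worst-case wakeup-to-run delay seen for one task (times in seconds)."""
    max_delay: float = 0.0
    max_delay_start: float = 0.0
    max_delay_end: float = 0.0

    def update(self, wakeup_time: float, run_time: float) -> None:
        # The delay is the time between wakeup and actually getting the CPU.
        delay = run_time - wakeup_time
        if delay > self.max_delay:
            # Remember both ends of the worst delay, not only its end.
            self.max_delay = delay
            self.max_delay_start = wakeup_time
            self.max_delay_end = run_time


stats = LatencyStats()
# (wakeup, run) pairs: the first is illustrative, the second is from the table.
for wakeup, run in [(477.100000, 477.100113), (477.691360, 477.768610)]:
    stats.update(wakeup, run)

print(f"max: {stats.max_delay * 1e3:.3f} ms | "
      f"max start: {stats.max_delay_start:.6f} s | "
      f"max end: {stats.max_delay_end:.6f} s")
```

With the start timestamp recorded, one can jump straight to that point in the 'perf sched script' output instead of backtracking from the end.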
      
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20200925235634.4089867-1-joel@joelfernandes.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events: Fix typos in power8 PMU events · 70830f97
      Sandipan Das committed
      This replaces the incorrectly spelled word "localtion" with "location"
      in some power8 PMU event descriptions.
      
      Fixes: 2a81fa3b ("perf vendor events: Add power8 PMU events")
      Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
      Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Link: http://lore.kernel.org/lkml/20201012050205.328523-1-sandipan@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf bench: Run inject-build-id with --buildid-all option too · bf7ef5dd
      Namhyung Kim committed
      For comparison, it now runs the benchmark twice: once with the regular -b
      and once with --buildid-all.
      
        $ perf bench internals inject-build-id
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 21.002 msec (+- 0.172 msec)
          Average time per event: 2.059 usec (+- 0.017 usec)
          Average memory usage: 8169 KB (+- 0 KB)
          Average build-id-all injection took: 19.543 msec (+- 0.124 msec)
          Average time per event: 1.916 usec (+- 0.012 usec)
          Average memory usage: 7348 KB (+- 0 KB)
      
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201012070214.2074921-7-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf inject: Add --buildid-all option · 27c9c342
      Namhyung Kim committed
      Like 'perf record', we can speed up build-id processing even more by just
      using all DSOs.  Then we don't need to look at all the sample events
      anymore.  The following patch will update 'perf bench' to show the result
      of the --buildid-all option too.
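
The difference between the two modes can be modelled roughly as below. This is a simplified sketch and not how 'perf inject' is structured internally: the function name, the tuple layout, and the sample addresses are all hypothetical.

```python
def dsos_to_tag(mmaps, samples, buildid_all=False):
    """Return the DSO names that should get a build-id event.

    mmaps:   list of (start, end, dso_name) tuples, as from MMAP2-like events
    samples: list of sampled instruction addresses
    """
    if buildid_all:
        # Every mapped DSO is tagged, so the samples are never inspected.
        return {name for _, _, name in mmaps}

    hit = set()
    for ip in samples:
        # Default mode: only DSOs that actually contain a sample are tagged.
        for start, end, name in mmaps:
            if start <= ip < end:
                hit.add(name)
                break
    return hit


mmaps = [(0x1000, 0x2000, "liba.so"), (0x3000, 0x4000, "libb.so")]
samples = [0x1800, 0x1900]
print(sorted(dsos_to_tag(mmaps, samples)))
print(sorted(dsos_to_tag(mmaps, samples, buildid_all=True)))
```

The trade-off in this model is that the all-DSOs mode skips the per-sample lookup but may emit build-ids for DSOs that no sample ever hit.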
      
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Original-patch-by: Stephane Eranian <eranian@google.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201012070214.2074921-6-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf inject: Do not load map/dso when injecting build-id · e7b60c5a
      Namhyung Kim committed
      No need to load symbols in a DSO when injecting build-id.  I guess the
      reason was to check whether the DSO is a special file like anon files.  Use
      some helper functions in map.c to check them before reading build-id.  Also
      pass the sample event's cpumode to the new build-id event.
      
      It brought a speedup in the benchmark of 25 -> 21 msec on my laptop.
      Also the memory usage (Max RSS) went down by ~200 KB.
      
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 21.389 msec (+- 0.138 msec)
          Average time per event: 2.097 usec (+- 0.014 usec)
          Average memory usage: 8225 KB (+- 0 KB)
      
      Committer notes:
      
      Before:
      
        $ perf stat -r5 perf bench internals inject-build-id > /dev/null
      
         Performance counter stats for 'perf bench internals inject-build-id' (5 runs):
      
                  4,020.56 msec task-clock:u              #    1.271 CPUs utilized            ( +-  0.74% )
                         0      context-switches:u        #    0.000 K/sec
                         0      cpu-migrations:u          #    0.000 K/sec
                   123,354      page-faults:u             #    0.031 M/sec                    ( +-  0.81% )
             7,119,951,568      cycles:u                  #    1.771 GHz                      ( +-  1.74% )  (83.27%)
               230,086,969      stalled-cycles-frontend:u #    3.23% frontend cycles idle     ( +-  1.97% )  (83.41%)
             1,168,298,765      stalled-cycles-backend:u  #   16.41% backend cycles idle      ( +-  1.13% )  (83.44%)
            11,173,083,669      instructions:u            #    1.57  insn per cycle
                                                          #    0.10  stalled cycles per insn  ( +-  1.58% )  (83.31%)
             2,413,908,936      branches:u                #  600.392 M/sec                    ( +-  1.69% )  (83.26%)
                46,576,289      branch-misses:u           #    1.93% of all branches          ( +-  2.20% )  (83.31%)
      
                    3.1638 +- 0.0309 seconds time elapsed  ( +-  0.98% )
      
        $
      
      After:
      
        $ perf stat -r5 perf bench internals inject-build-id > /dev/null
      
         Performance counter stats for 'perf bench internals inject-build-id' (5 runs):
      
                  2,379.94 msec task-clock:u              #    1.473 CPUs utilized            ( +-  0.18% )
                         0      context-switches:u        #    0.000 K/sec
                         0      cpu-migrations:u          #    0.000 K/sec
                    62,584      page-faults:u             #    0.026 M/sec                    ( +-  0.07% )
             2,372,389,668      cycles:u                  #    0.997 GHz                      ( +-  0.29% )  (83.14%)
               106,937,862      stalled-cycles-frontend:u #    4.51% frontend cycles idle     ( +-  4.89% )  (83.20%)
               581,697,915      stalled-cycles-backend:u  #   24.52% backend cycles idle      ( +-  0.71% )  (83.47%)
             3,659,692,199      instructions:u            #    1.54  insn per cycle
                                                          #    0.16  stalled cycles per insn  ( +-  0.10% )  (83.63%)
               791,372,961      branches:u                #  332.518 M/sec                    ( +-  0.27% )  (83.39%)
                10,648,083      branch-misses:u           #    1.35% of all branches          ( +-  0.22% )  (83.16%)
      
                   1.61570 +- 0.00172 seconds time elapsed  ( +-  0.11% )
      
        $
      
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Original-patch-by: Stephane Eranian <eranian@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201012070214.2074921-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf inject: Enter namespace when reading build-id · 336c95b2
      Namhyung Kim committed
      It should be in a proper mnt namespace when accessing the file.
      
      I think this had no problem since the build-id was actually read from
      map__load() -> dso__load() already.  But I'd like to change it in the
      following commit.
      
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20201012070214.2074921-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf inject: Add missing callbacks in perf_tool · 2946eced
      Namhyung Kim committed
      I found some events (like PERF_RECORD_CGROUP) are not copied by perf
      inject due to the missing callbacks.  Let's add them.
      
      While at it, I've changed the order of the callbacks to match with
      struct perf_tool so that we can compare them easily.
      
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20201012070214.2074921-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf bench: Add build-id injection benchmark · 0bf02a0d
      Namhyung Kim committed
      Sometimes I can see that 'perf record' piped with 'perf inject' takes a
      long time processing build-ids.
      
      So introduce an inject-build-id benchmark to the internals benchmark
      suite to measure its overhead regularly.
      
      It runs the 'perf inject' command internally and feeds the given number
      of synthesized events (MMAP2 + SAMPLE basically).
      
        Usage: perf bench internals inject-build-id <options>
      
          -i, --iterations <n>  Number of iterations used to compute average (default: 100)
          -m, --nr-mmaps <n>    Number of mmap events for each iteration (default: 100)
          -n, --nr-samples <n>  Number of sample events per mmap event (default: 100)
          -v, --verbose         be more verbose (show iteration count, DSO name, etc)
      
      By default, it measures average processing time of 100 MMAP2 events
      and 10000 SAMPLE events.  Below is a result on my laptop.
      
        $ perf bench internals inject-build-id
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 25.789 msec (+- 0.202 msec)
          Average time per event: 2.528 usec (+- 0.020 usec)
          Average memory usage: 8411 KB (+- 7 KB)
      
      Committer testing:
      
        $ perf bench
        Usage:
        	perf bench [<common options>] <collection> <benchmark> [<options>]
      
                # List of all available benchmark collections:
      
                 sched: Scheduler and IPC benchmarks
               syscall: System call benchmarks
                   mem: Memory access benchmarks
                  numa: NUMA scheduling and MM benchmarks
                 futex: Futex stressing benchmarks
                 epoll: Epoll stressing benchmarks
             internals: Perf-internals benchmarks
                   all: All benchmarks
      
        $ perf bench internals
      
                # List of available benchmarks for collection 'internals':
      
            synthesize: Benchmark perf event synthesis
        kallsyms-parse: Benchmark kallsyms parsing
        inject-build-id: Benchmark build-id injection
      
        $ perf bench internals inject-build-id
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.202 msec (+- 0.059 msec)
          Average time per event: 1.392 usec (+- 0.006 usec)
          Average memory usage: 12650 KB (+- 10 KB)
          Average build-id-all injection took: 12.831 msec (+- 0.071 msec)
          Average time per event: 1.258 usec (+- 0.007 usec)
          Average memory usage: 11895 KB (+- 10 KB)
        $
      
        $ perf stat -r5 perf bench internals inject-build-id
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.380 msec (+- 0.056 msec)
          Average time per event: 1.410 usec (+- 0.006 usec)
          Average memory usage: 12608 KB (+- 11 KB)
          Average build-id-all injection took: 11.889 msec (+- 0.064 msec)
          Average time per event: 1.166 usec (+- 0.006 usec)
          Average memory usage: 11838 KB (+- 10 KB)
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.246 msec (+- 0.065 msec)
          Average time per event: 1.397 usec (+- 0.006 usec)
          Average memory usage: 12744 KB (+- 10 KB)
          Average build-id-all injection took: 12.019 msec (+- 0.066 msec)
          Average time per event: 1.178 usec (+- 0.006 usec)
          Average memory usage: 11963 KB (+- 10 KB)
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.321 msec (+- 0.067 msec)
          Average time per event: 1.404 usec (+- 0.007 usec)
          Average memory usage: 12690 KB (+- 10 KB)
          Average build-id-all injection took: 11.909 msec (+- 0.041 msec)
          Average time per event: 1.168 usec (+- 0.004 usec)
          Average memory usage: 11938 KB (+- 10 KB)
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.287 msec (+- 0.059 msec)
          Average time per event: 1.401 usec (+- 0.006 usec)
          Average memory usage: 12864 KB (+- 10 KB)
          Average build-id-all injection took: 11.862 msec (+- 0.058 msec)
          Average time per event: 1.163 usec (+- 0.006 usec)
          Average memory usage: 12103 KB (+- 10 KB)
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.402 msec (+- 0.053 msec)
          Average time per event: 1.412 usec (+- 0.005 usec)
          Average memory usage: 12876 KB (+- 10 KB)
          Average build-id-all injection took: 11.826 msec (+- 0.061 msec)
          Average time per event: 1.159 usec (+- 0.006 usec)
          Average memory usage: 12111 KB (+- 10 KB)
      
         Performance counter stats for 'perf bench internals inject-build-id' (5 runs):
      
                  4,267.48 msec task-clock:u              #    1.502 CPUs utilized            ( +-  0.14% )
                         0      context-switches:u        #    0.000 K/sec
                         0      cpu-migrations:u          #    0.000 K/sec
                   102,092      page-faults:u             #    0.024 M/sec                    ( +-  0.08% )
             3,894,589,578      cycles:u                  #    0.913 GHz                      ( +-  0.19% )  (83.49%)
               140,078,421      stalled-cycles-frontend:u #    3.60% frontend cycles idle     ( +-  0.77% )  (83.34%)
               948,581,189      stalled-cycles-backend:u  #   24.36% backend cycles idle      ( +-  0.46% )  (83.25%)
             5,835,587,719      instructions:u            #    1.50  insn per cycle
                                                          #    0.16  stalled cycles per insn  ( +-  0.21% )  (83.24%)
             1,267,423,636      branches:u                #  296.996 M/sec                    ( +-  0.22% )  (83.12%)
                17,484,290      branch-misses:u           #    1.38% of all branches          ( +-  0.12% )  (83.55%)
      
                   2.84176 +- 0.00222 seconds time elapsed  ( +-  0.08% )
      
        $
      
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20201012070214.2074921-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 01 Oct, 2020 3 commits
    • perf python scripting: Fix printable strings in python3 scripts · 6fcd5ddc
      Jiri Olsa committed
      Hagen reported broken strings in python3 tracepoint scripts:
      
        make PYTHON=python3
        perf record -e sched:sched_switch -a -- sleep 5
        perf script --gen-script py
        perf script -s ./perf-script.py
      
        [..]
        sched__sched_switch      7 563231.759525792        0 swapper   prev_comm=bytearray(b'swapper/7\x00\x00\x00\x00\x00\x00\x00'), prev_pid=0, prev_prio=120, prev_state=, next_comm=bytearray(b'mutex-thread-co\x00'),
      
      The problem is in the is_printable_array function that does not take the
      zero byte into account and claims such strings are not printable, so the
      code will create byte array instead of string.
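
The intended behaviour can be sketched in Python as follows. This only mirrors the description above, not the actual C code: the body of `is_printable_array` here is an assumption, and `decode_field` is a hypothetical helper standing in for the string-versus-byte-array decision.

```python
import string

# Byte values considered printable: ASCII printable characters plus whitespace.
PRINTABLE = set(string.printable.encode())


def is_printable_array(buf: bytes) -> bool:
    """Return True if buf holds a NUL-terminated, printable C string.

    Everything after the first NUL byte is treated as padding and ignored.
    """
    if not buf or buf[-1] != 0:
        return False
    text = buf.split(b"\x00", 1)[0]
    return all(byte in PRINTABLE for byte in text)


def decode_field(buf: bytes):
    """Hypothetical helper: return a str when printable, else a bytearray."""
    if is_printable_array(buf):
        return buf.split(b"\x00", 1)[0].decode("ascii")
    return bytearray(buf)


print(decode_field(b"swapper/7\x00\x00\x00\x00\x00\x00\x00"))
print(decode_field(b"mutex-thread-co\x00"))
```

A check that counts the padding NUL bytes as non-printable would instead fall through to the bytearray branch, which matches the broken output reported above.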
      
      Committer testing:
      
      After this fix:
      
      sched__sched_switch 3 484522.497072626  1158680 kworker/3:0-eve  prev_comm=kworker/3:0, prev_pid=1158680, prev_prio=120, prev_state=I, next_comm=swapper/3, next_pid=0, next_prio=120
      Sample: {addr=0, cpu=3, datasrc=84410401, datasrc_decode=N/A|SNP N/A|TLB N/A|LCK N/A, ip=18446744071841817196, period=1, phys_addr=0, pid=1158680, tid=1158680, time=484522497072626, transaction=0, values=[(0, 0)], weight=0}
      
      sched__sched_switch 4 484522.497085610  1225814 perf             prev_comm=perf, prev_pid=1225814, prev_prio=120, prev_state=, next_comm=migration/4, next_pid=30, next_prio=0
      Sample: {addr=0, cpu=4, datasrc=84410401, datasrc_decode=N/A|SNP N/A|TLB N/A|LCK N/A, ip=18446744071841817196, period=1, phys_addr=0, pid=1225814, tid=1225814, time=484522497085610, transaction=0, values=[(0, 0)], weight=0}
      
      Fixes: 249de6e0 ("perf script python: Fix string vs byte array resolving")
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Hagen Paul Pfeifer <hagen@jauu.net>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20200928201135.3633850-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Use the autogenerated mmap 'prot' string/id table · 388968d8
      Arnaldo Carvalho de Melo committed
      No change in behaviour:
      
        # perf trace -e mmap sleep 1
             0.000 ( 0.009 ms): sleep/751870 mmap(len: 143317, prot: READ, flags: PRIVATE, fd: 3)                  = 0x7fa96d0f7000
             0.028 ( 0.004 ms): sleep/751870 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS)           = 0x7fa96d0f5000
             0.037 ( 0.005 ms): sleep/751870 mmap(len: 1872744, prot: READ, flags: PRIVATE|DENYWRITE, fd: 3)       = 0x7fa96cf2b000
             0.044 ( 0.011 ms): sleep/751870 mmap(addr: 0x7fa96cf50000, len: 1376256, prot: READ|EXEC, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x25000) = 0x7fa96cf50000
             0.056 ( 0.007 ms): sleep/751870 mmap(addr: 0x7fa96d0a0000, len: 307200, prot: READ, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x175000) = 0x7fa96d0a0000
             0.064 ( 0.007 ms): sleep/751870 mmap(addr: 0x7fa96d0eb000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x1bf000) = 0x7fa96d0eb000
             0.075 ( 0.005 ms): sleep/751870 mmap(addr: 0x7fa96d0f1000, len: 13160, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) = 0x7fa96d0f1000
             0.253 ( 0.005 ms): sleep/751870 mmap(len: 218049136, prot: READ, flags: PRIVATE, fd: 3)               = 0x7fa95ff38000
        #
        #
        # set -o vi
        # strace -e mmap sleep 1
        mmap(NULL, 143317, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f333bd83000
        mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f333bd81000
        mmap(NULL, 1872744, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f333bbb7000
        mmap(0x7f333bbdc000, 1376256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f333bbdc000
        mmap(0x7f333bd2c000, 307200, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x175000) = 0x7f333bd2c000
        mmap(0x7f333bd77000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bf000) = 0x7f333bd77000
        mmap(0x7f333bd7d000, 13160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f333bd7d000
        mmap(NULL, 218049136, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f332ebc4000
        +++ exited with 0 +++
        #
      
      You can also tweak the 'perf trace' output to more closely match
      strace's:
      
        # perf config trace.show_arg_names=no
        # perf config trace.show_duration=no
        # perf config trace.show_prefix=yes
        # perf config trace.show_timestamp=no
        # perf config trace.show_zeros=yes
        # perf config trace.no_inherit=yes
        # perf trace -e mmap sleep 1
        mmap(NULL, 143317, PROT_READ, MAP_PRIVATE, 3, 0)                      = 0x7f0d287ca000
        mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS)     = 0x7f0d287c8000
        mmap(NULL, 1872744, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0)       = 0x7f0d285fe000
        mmap(0x7f0d28623000, 1376256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f0d28623000
        mmap(0x7f0d28773000, 307200, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x175000) = 0x7f0d28773000
        mmap(0x7f0d287be000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bf000) = 0x7f0d287be000
        mmap(0x7f0d287c4000, 13160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f0d287c4000
        mmap(NULL, 218049136, PROT_READ, MAP_PRIVATE, 3, 0)                   = 0x7f0d1b60b000
        #
      
        # perf config | grep ^trace
        trace.show_arg_names=no
        trace.show_duration=no
        trace.show_prefix=yes
        trace.show_timestamp=no
        trace.show_zeros=yes
        trace.no_inherit=yes
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools beauty: Add script to generate table of mmap's 'prot' argument · 08fc4762
      Arnaldo Carvalho de Melo committed
      Will be wired up in the following csets:
      
        $ tools/perf/trace/beauty/mmap_prot.sh
        static const char *mmap_prot[] = {
        	[ilog2(0x1) + 1] = "READ",
        #ifndef PROT_READ
        #define PROT_READ 0x1
        #endif
        	[ilog2(0x2) + 1] = "WRITE",
        #ifndef PROT_WRITE
        #define PROT_WRITE 0x2
        #endif
        	[ilog2(0x4) + 1] = "EXEC",
        #ifndef PROT_EXEC
        #define PROT_EXEC 0x4
        #endif
        	[ilog2(0x8) + 1] = "SEM",
        #ifndef PROT_SEM
        #define PROT_SEM 0x8
        #endif
        	[ilog2(0x01000000) + 1] = "GROWSDOWN",
        #ifndef PROT_GROWSDOWN
        #define PROT_GROWSDOWN 0x01000000
        #endif
        	[ilog2(0x02000000) + 1] = "GROWSUP",
        #ifndef PROT_GROWSUP
        #define PROT_GROWSUP 0x02000000
        #endif
        };
        $
        $
        $
        $ tools/perf/trace/beauty/mmap_prot.sh alpha
        static const char *mmap_prot[] = {
        	[ilog2(0x4) + 1] = "EXEC",
        #ifndef PROT_EXEC
        #define PROT_EXEC 0x4
        #endif
        	[ilog2(0x01000000) + 1] = "GROWSDOWN",
        #ifndef PROT_GROWSDOWN
        #define PROT_GROWSDOWN 0x01000000
        #endif
        	[ilog2(0x02000000) + 1] = "GROWSUP",
        #ifndef PROT_GROWSUP
        #define PROT_GROWSUP 0x02000000
        #endif
        	[ilog2(0x1) + 1] = "READ",
        #ifndef PROT_READ
        #define PROT_READ 0x1
        #endif
        	[ilog2(0x8) + 1] = "SEM",
        #ifndef PROT_SEM
        #define PROT_SEM 0x8
        #endif
        	[ilog2(0x2) + 1] = "WRITE",
        #ifndef PROT_WRITE
        #define PROT_WRITE 0x2
        #endif
        };
        $
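
How a table indexed by `ilog2(value) + 1` can turn a 'prot' bitmask into a string may be sketched like this. It is an illustration only, not the perf scnprintf routine: the function names are hypothetical, and printing "NONE" for a zero mask is an assumption based on PROT_NONE being 0.

```python
# Values and names copied from the generated mmap_prot[] table above.
MMAP_PROT = {
    0x1: "READ",
    0x2: "WRITE",
    0x4: "EXEC",
    0x8: "SEM",
    0x01000000: "GROWSDOWN",
    0x02000000: "GROWSUP",
}


def build_table(flags):
    """Build a list where a single-bit value v lives at index ilog2(v) + 1."""
    table = [None] * (max(v.bit_length() for v in flags) + 1)
    for value, name in flags.items():
        # For a power of two, int.bit_length() equals ilog2(value) + 1.
        table[value.bit_length()] = name
    return table


def format_flags(value, table, prefix=""):
    """Render a bitmask as 'A|B|C'; bits without a name are shown in hex."""
    names = []
    for idx in range(1, len(table)):
        bit = 1 << (idx - 1)
        if value & bit:
            names.append(prefix + table[idx] if table[idx] else hex(bit))
            value &= ~bit
    if value:
        # Bits beyond the end of the table.
        names.append(hex(value))
    return "|".join(names) if names else prefix + "NONE"


table = build_table(MMAP_PROT)
print(format_flags(0x3, table))
print(format_flags(0x5, table, prefix="PROT_"))
```

Passing a prefix gives strace-like names, in the spirit of the trace.show_prefix setting.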
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 30 Sep, 2020 2 commits
    • perf beauty mmap_flags: Conditionaly define the mmap flags · 61693228
      Arnaldo Carvalho de Melo committed
      So that in older systems we get it in the mmap flags scnprintf routines:
      
        $ tools/perf/trace/beauty/mmap_flags.sh  | head -9 2> /dev/null
        static const char *mmap_flags[] = {
        	[ilog2(0x40) + 1] = "32BIT",
        #ifndef MAP_32BIT
        #define MAP_32BIT 0x40
        #endif
        	[ilog2(0x01) + 1] = "SHARED",
        #ifndef MAP_SHARED
        #define MAP_SHARED 0x01
        #endif
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace beauty: Add script to autogenerate mremap's flags args string/id table · 9012e3dd
      Arnaldo Carvalho de Melo committed
      It'll also conditionally generate the defines, so that if we don't have
      those when building a new tool tarball on an older system, we get
      them. We sometimes need them in the actual scnprintf routine, such
      as when checking whether a flag means we have an extra arg, as with
      MREMAP_FIXED.
      
        $ tools/perf/trace/beauty/mremap_flags.sh
        static const char *mremap_flags[] = {
        	[ilog2(1) + 1] = "MAYMOVE",
        #ifndef MREMAP_MAYMOVE
        #define MREMAP_MAYMOVE 1
        #endif
        	[ilog2(2) + 1] = "FIXED",
        #ifndef MREMAP_FIXED
        #define MREMAP_FIXED 2
        #endif
        	[ilog2(4) + 1] = "DONTUNMAP",
        #ifndef MREMAP_DONTUNMAP
        #define MREMAP_DONTUNMAP 4
        #endif
        };
        $
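
Why a formatter needs the MREMAP_FIXED define itself, and not only the name table, can be sketched as below. This is not the perf routine: `format_mremap` and its output layout are hypothetical, with argument names following the mremap syscall parameters.

```python
# Values and names copied from the generated mremap_flags[] table above.
MREMAP_FLAGS = {1: "MAYMOVE", 2: "FIXED", 4: "DONTUNMAP"}
MREMAP_FIXED = 2


def format_mremap(addr, old_len, new_len, flags, new_addr=0):
    """Format an mremap call, showing new_addr only when MREMAP_FIXED is set."""
    names = "|".join(n for bit, n in MREMAP_FLAGS.items() if flags & bit) or "0"
    args = [
        f"addr: {addr:#x}",
        f"old_len: {old_len}",
        f"new_len: {new_len}",
        f"flags: {names}",
    ]
    if flags & MREMAP_FIXED:
        # The extra argument is shown only when the flag asks for a fixed address.
        args.append(f"new_addr: {new_addr:#x}")
    return "mremap(" + ", ".join(args) + ")"


print(format_mremap(0x1000, 4096, 8192, 1))
print(format_mremap(0x1000, 4096, 8192, 3, new_addr=0x2000))
```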
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 29 Sep, 2020 1 commit
  6. 28 Sep, 2020 9 commits
  7. 23 Sep, 2020 6 commits
    • perf script: Add min, max to futex-contention output, in addition to avg · 69f48c70
      Hagen Paul Pfeifer committed
      Average is quite informative, but the outliers - especially max - are
      also of interest.
      
      Before:
      
        mutex-locker[793299] lock 5637ec61e080 contended 3400 times, 446 avg ns
        mutex-locker[793301] lock 5637ec61e080 contended 3563 times, 385 avg ns
        mutex-locker[793300] lock 5637ec61e080 contended 3110 times, 1855 avg ns
      
      After:
      
        mutex-locker[795251] lock 55b14e6dd080 contended 3853 times, 1279 avg ns [max: 12270 ns, min 340 ns]
        mutex-locker[795253] lock 55b14e6dd080 contended 2911 times, 518 avg ns [max: 51660261 ns, min 347 ns]
        mutex-locker[795252] lock 55b14e6dd080 contended 3843 times, 385 avg ns [max: 24323998 ns, min 338 ns]
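
Tracking min and max alongside the average can be sketched as follows. The helper names and the wait times are illustrative assumptions, and the plain arithmetic mean used here may differ from how the actual futex-contention script averages.

```python
def record_wait(stats, key, elapsed_ns):
    """Fold one futex wait time (ns) into the per-(tid, lock) statistics."""
    entry = stats.get(key)
    if entry is None:
        stats[key] = {"min": elapsed_ns, "max": elapsed_ns,
                      "total": elapsed_ns, "count": 1}
    else:
        entry["min"] = min(entry["min"], elapsed_ns)
        entry["max"] = max(entry["max"], elapsed_ns)
        entry["total"] += elapsed_ns
        entry["count"] += 1


def format_line(comm, tid, lock, entry):
    """Render one line in the style of the 'After' output above."""
    avg = entry["total"] // entry["count"]
    return (f"{comm}[{tid}] lock {lock:x} contended {entry['count']} times, "
            f"{avg} avg ns [max: {entry['max']} ns, min {entry['min']} ns]")


stats = {}
key = (795251, 0x55b14e6dd080)
for wait_ns in (340, 12270, 1227):  # illustrative wait times
    record_wait(stats, key, wait_ns)

line = format_line("mutex-locker", key[0], key[1], stats[key])
print(line)
```

A single long wait shows up directly in the max column, while the average alone can hide it once there are enough short waits.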
      
      Committer testing:
      
        [root@five ~]# perf script record futex-contention -a
        ^C[ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 1.877 MB perf.data (923 samples) ]
      
        [root@five ~]# perf evlist
        syscalls:sys_enter_futex
        syscalls:sys_exit_futex
        dummy:HG
        # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
        #
      
      Before:
      
        [root@five ~]# perf script report futex-contention
        JS Helper[2457] lock 55fe0cf82610 contended 4 times, 6657 avg ns
        ibus-daemon[2975] lock 56227f6d0210 contended 4 times, 1020 avg ns
        chromium-browse[1905801] lock 7ffe573f5088 contended 8 times, 108463 avg ns
        gnome-shell[2240] lock 55fe0cf82678 contended 1 times, 8616 avg ns
        gnome-shel:cs0[2292] lock 55fe0d0ab768 contended 3 times, 606016034 avg ns
        JS Helper[2458] lock 55fe0cf82690 contended 1 times, 1167840 avg ns
        chromium-browse[1905470] lock 7ffe573f5358 contended 1 times, 551504 avg ns
        chromium-browse[1905948] lock 7ffe573f5358 contended 1 times, 577422 avg ns
        gnome-shell[2240] lock 55fe0cf82660 contended 6 times, 202696 avg ns
        pool[2602] lock 7fd600008ef0 contended 1 times, 500046007 avg ns
        chromium-browse[1905801] lock 7ffe573f5128 contended 4 times, 285083 avg ns
        JS Helper[2460] lock 55fe0cf82690 contended 1 times, 680877 avg ns
        JS Helper[2459] lock 55fe0cf82610 contended 7 times, 4224 avg ns
        chromium-browse[1905434] lock 7ffe573f5358 contended 1 times, 697038 avg ns
        chromium-browse[212592] lock 7ffe573f53c8 contended 4 times, 460601 avg ns
        gnome-shel:cs0[2292] lock 55fe0d0ab76c contended 2 times, 601237648 avg ns
        JS Helper[2460] lock 55fe0cf82610 contended 4 times, 3340 avg ns
        JS Helper[2462] lock 55fe0cf82694 contended 1 times, 237275 avg ns
        chromium-browse[1905605] lock 7ffe573f5358 contended 2 times, 634555 avg ns
        chromium-browse[1905992] lock 7ffe573f5358 contended 1 times, 583965 avg ns
        chromium-browse[1905647] lock 7ffe573f5368 contended 8 times, 549800 avg ns
        JS Helper[2462] lock 55fe0cf82610 contended 2 times, 4694 avg ns
        JS Helper[2461] lock 55fe0cf82694 contended 1 times, 257793 avg ns
        JS Helper[2456] lock 55fe0cf82690 contended 1 times, 677771 avg ns
        JS Helper[2463] lock 55fe0cf82610 contended 3 times, 5139 avg ns
        gdbus[2980] lock 56227f6d0210 contended 2 times, 2465 avg ns
        gnome-shell[2240] lock 55fe0cf82664 contended 5 times, 8036 avg ns
        chromium-browse[1906308] lock 7ffe573f5358 contended 1 times, 210735 avg ns
        JS Helper[2463] lock 55fe0cf82694 contended 1 times, 251531 avg ns
        chromium-browse[1905801] lock 7ffe573f4f58 contended 4 times, 399927 avg ns
        [root@five ~]#
      
      After:
      
        [root@five ~]# perf script report futex-contention
        JS Helper[2457] lock 55fe0cf82610 contended 4 times, 6657 avg ns [max: 11502 ns, min 792 ns]
        ibus-daemon[2975] lock 56227f6d0210 contended 4 times, 1020 avg ns [max: 1813 ns, min 581 ns]
        chromium-browse[1905801] lock 7ffe573f5088 contended 8 times, 108463 avg ns [max: 380103 ns, min 57989 ns]
        gnome-shell[2240] lock 55fe0cf82678 contended 1 times, 8616 avg ns [max: 8616 ns, min 8616 ns]
        gnome-shel:cs0[2292] lock 55fe0d0ab768 contended 3 times, 606016034 avg ns [max: 611295960 ns, min 600191357 ns]
        JS Helper[2458] lock 55fe0cf82690 contended 1 times, 1167840 avg ns [max: 1167840 ns, min 1167840 ns]
        chromium-browse[1905470] lock 7ffe573f5358 contended 1 times, 551504 avg ns [max: 551504 ns, min 551504 ns]
        chromium-browse[1905948] lock 7ffe573f5358 contended 1 times, 577422 avg ns [max: 577422 ns, min 577422 ns]
        gnome-shell[2240] lock 55fe0cf82660 contended 6 times, 202696 avg ns [max: 398998 ns, min 5050 ns]
        pool[2602] lock 7fd600008ef0 contended 1 times, 500046007 avg ns [max: 500046007 ns, min 500046007 ns]
        chromium-browse[1905801] lock 7ffe573f5128 contended 4 times, 285083 avg ns [max: 389531 ns, min 76183 ns]
        JS Helper[2460] lock 55fe0cf82690 contended 1 times, 680877 avg ns [max: 680877 ns, min 680877 ns]
        JS Helper[2459] lock 55fe0cf82610 contended 7 times, 4224 avg ns [max: 12724 ns, min 1012 ns]
        chromium-browse[1905434] lock 7ffe573f5358 contended 1 times, 697038 avg ns [max: 697038 ns, min 697038 ns]
        chromium-browse[212592] lock 7ffe573f53c8 contended 4 times, 460601 avg ns [max: 594956 ns, min 232996 ns]
        gnome-shel:cs0[2292] lock 55fe0d0ab76c contended 2 times, 601237648 avg ns [max: 601255863 ns, min 601219434 ns]
        JS Helper[2460] lock 55fe0cf82610 contended 4 times, 3340 avg ns [max: 9168 ns, min 962 ns]
        JS Helper[2462] lock 55fe0cf82694 contended 1 times, 237275 avg ns [max: 237275 ns, min 237275 ns]
        chromium-browse[1905605] lock 7ffe573f5358 contended 2 times, 634555 avg ns [max: 1024060 ns, min 245050 ns]
        chromium-browse[1905992] lock 7ffe573f5358 contended 1 times, 583965 avg ns [max: 583965 ns, min 583965 ns]
        chromium-browse[1905647] lock 7ffe573f5368 contended 8 times, 549800 avg ns [max: 775293 ns, min 258375 ns]
        JS Helper[2462] lock 55fe0cf82610 contended 2 times, 4694 avg ns [max: 8556 ns, min 832 ns]
        JS Helper[2461] lock 55fe0cf82694 contended 1 times, 257793 avg ns [max: 257793 ns, min 257793 ns]
        JS Helper[2456] lock 55fe0cf82690 contended 1 times, 677771 avg ns [max: 677771 ns, min 677771 ns]
        JS Helper[2463] lock 55fe0cf82610 contended 3 times, 5139 avg ns [max: 6873 ns, min 931 ns]
        gdbus[2980] lock 56227f6d0210 contended 2 times, 2465 avg ns [max: 4188 ns, min 742 ns]
        gnome-shell[2240] lock 55fe0cf82664 contended 5 times, 8036 avg ns [max: 13105 ns, min 401 ns]
        chromium-browse[1906308] lock 7ffe573f5358 contended 1 times, 210735 avg ns [max: 210735 ns, min 210735 ns]
        JS Helper[2463] lock 55fe0cf82694 contended 1 times, 251531 avg ns [max: 251531 ns, min 251531 ns]
        chromium-browse[1905801] lock 7ffe573f4f58 contended 4 times, 399927 avg ns [max: 476904 ns, min 178495 ns]
        [root@five ~]#
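
The extra min/max columns in the "After" output can be produced by keeping a small running record per (tid, lock) pair. The following is a minimal sketch of that bookkeeping, not the script's actual code: `WaitStats`, `record_wait` and `format_line` are hypothetical names, and the average is computed from a running total, which is an assumption about how one might do it.

```python
from dataclasses import dataclass


@dataclass
class WaitStats:
    """Running statistics for one (tid, lock) pair (hypothetical helper)."""
    count: int = 0
    total_ns: int = 0
    min_ns: int = 0
    max_ns: int = 0

    def add(self, elapsed_ns: int) -> None:
        # The first sample initialises both extremes.
        if self.count == 0:
            self.min_ns = self.max_ns = elapsed_ns
        else:
            self.min_ns = min(self.min_ns, elapsed_ns)
            self.max_ns = max(self.max_ns, elapsed_ns)
        self.count += 1
        self.total_ns += elapsed_ns

    @property
    def avg_ns(self) -> int:
        return self.total_ns // self.count if self.count else 0


def record_wait(lock_waits: dict, tid: int, lock: int, elapsed_ns: int) -> None:
    """Account one futex wait of elapsed_ns nanoseconds."""
    lock_waits.setdefault((tid, lock), WaitStats()).add(elapsed_ns)


def format_line(comm: str, tid: int, lock: int, stats: WaitStats) -> str:
    """Render one report line in the same layout as the output above."""
    return "%s[%d] lock %x contended %d times, %d avg ns [max: %d ns, min %d ns]" % (
        comm, tid, lock, stats.count, stats.avg_ns, stats.max_ns, stats.min_ns)
```

Calling `record_wait` once per completed wait and `format_line` once per key at the end of the trace would yield lines shaped like the report above.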
      Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: http://lore.kernel.org/lkml/20200922200922.1306034-1-hagen@jauu.net
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      69f48c70
    • H
      perf script: Autopep8 futex-contention · 2a684fcb
      Hagen Paul Pfeifer committed
      10 years leaves its mark! Python has evolved and so has its style guide.
      Even with vim it is getting hard to follow the no longer valid
      guidelines (spaces vs. tabs).
      
      Autopep8 this code to modernize it!
      Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
      Link: http://lore.kernel.org/lkml/20200921201928.799498-1-hagen@jauu.net
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf stat: Skip duration_time in setup_system_wide · 002a3d69
      Jin Yao committed
      Some metrics (such as DRAM_BW_Use) consist of uncore events and
      duration_time. For uncore events, counter->core.system_wide is true, but
      for duration_time it is false, so target.system_wide is set to false.
      
      As a result, 'enable_on_exec' is set in the perf_event_attr of the
      uncore event, and the kernel returns an error when trying to open the
      uncore event.
      
      This patch skips duration_time in setup_system_wide, so
      target.system_wide is set to true for an evlist of uncore events plus
      duration_time.
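
The decision described above can be modeled in a few lines. This is a simplified Python sketch of the logic only, not the C code in perf: `Counter`, `Target` and this `setup_system_wide` signature are stand-ins chosen for illustration, and any other checks the real function performs are deliberately omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Counter:
    """Stand-in for an event in the evlist (hypothetical type)."""
    name: str
    system_wide: bool = False


@dataclass
class Target:
    """Stand-in for the monitoring target (hypothetical type)."""
    system_wide: bool = False


def setup_system_wide(counters: List[Counter], target: Target) -> None:
    """Force system-wide mode when every relevant counter requires it."""
    for counter in counters:
        # duration_time is a tool event; it must not veto system-wide mode.
        if counter.name == "duration_time":
            continue
        if not counter.system_wide:
            return  # a regular per-task event is present, leave target alone
    if counters:
        target.system_wide = True
```

With an evlist of two uncore events plus duration_time, the loop no longer returns early, so the target ends up system wide, which matches the 'system wide' header in the "After" output.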
      
      Before (tested on skylake desktop):
      
        # perf stat -M DRAM_BW_Use -- sleep 1
        Error:
        The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (arb/event=0x84,umask=0x1/).
        /bin/dmesg | grep -i perf may provide additional information.
      
      After:
      
        # perf stat -M DRAM_BW_Use -- sleep 1
      
         Performance counter stats for 'system wide':
      
                      169      arb/event=0x84,umask=0x1/ #     0.00 DRAM_BW_Use
                   40,427      arb/event=0x81,umask=0x1/
            1,000,902,197 ns   duration_time
      
              1.000902197 seconds time elapsed
      
      Fixes: e3ba76de ("perf tools: Force uncore events to system wide monitoring")
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200922015004.30114-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • L
      perf tsc: Support cap_user_time_short for event TIME_CONV · d110162c
      Leo Yan committed
      The synthesized event TIME_CONV doesn't contain the complete set of
      counter parameters, which leads to wrong conversion between counter
      cycles and timestamps.
      
      This patch extends the TIME_CONV event to record the flag
      'cap_user_time_zero', which indicates whether the counter parameters are
      valid; if they are not, the timestamp calculation directly returns 0.
      It also records the flag 'cap_user_time_short' and its related fields
      'time_cycles' and 'time_mask' for cycle calibration.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kemeng Shi <shikemeng@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nick Gasson <nick.gasson@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Remi Bernon <rbernon@codeweavers.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Steve Maclean <steve.maclean@microsoft.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zou Wei <zou_wei@huawei.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20200914115311.2201-5-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • L
      perf tsc: Calculate timestamp with cap_user_time_short · 78a93d4c
      Leo Yan committed
      The perf mmap'ed buffer contains the flag 'cap_user_time_short' and two
      extra fields, 'time_cycles' and 'time_mask'; the perf tool needs them to
      handle the counter wrapping case.
      
      This patch reads the relevant parameters from the head of the first
      mmap'ed page and stores them in the structure 'perf_tsc_conversion'. If
      the flag 'cap_user_time_short' is set, the cycle value is calibrated
      first, before the timestamp is calculated.
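
To illustrate the calibration step, here is a hedged Python sketch of a cycles-to-time conversion that applies 'time_cycles' and 'time_mask' when 'cap_user_time_short' is set. The field names come from the text above; the `TscConversion` class, the function signature, and the exact arithmetic (modeled on the usual shift/mult/zero scheme for the perf mmap page) are assumptions for illustration, not a copy of the patch.

```python
from dataclasses import dataclass

U64_MASK = (1 << 64) - 1  # emulate unsigned 64-bit wraparound


@dataclass
class TscConversion:
    """Python stand-in for the conversion parameters (hypothetical type)."""
    time_shift: int
    time_mult: int
    time_zero: int
    time_cycles: int = 0
    time_mask: int = 0
    cap_user_time_short: bool = False


def tsc_to_perf_time(cyc: int, tc: TscConversion) -> int:
    """Convert a raw counter value to a perf timestamp in nanoseconds."""
    if tc.cap_user_time_short:
        # The counter is narrower than 64 bits: rebuild the full value from
        # the last known full reading (time_cycles) plus the masked delta.
        delta = ((cyc - tc.time_cycles) & U64_MASK) & tc.time_mask
        cyc = (tc.time_cycles + delta) & U64_MASK

    # Split the multiplication to avoid overflow in fixed-width arithmetic.
    quot = cyc >> tc.time_shift
    rem = cyc & ((1 << tc.time_shift) - 1)
    ns = tc.time_zero + quot * tc.time_mult + ((rem * tc.time_mult) >> tc.time_shift)
    return ns & U64_MASK
```

For example, with an 8-bit mask (`time_mask=0xFF`) and `time_cycles=1000`, a raw reading of 5 is calibrated to 1029, the nearest value at or after 1000 whose low 8 bits equal 5.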
      
      Committer testing:
      
      Before/after:
      
        # perf test tsc
        70: Convert perf time to TSC                                        : Ok
        #
        # perf test -v tsc
        70: Convert perf time to TSC                                        :
        --- start ---
        test child forked, pid 11059
        mmap size 528384B
        1st event perf time 996384576521 tsc 3850532906613
        rdtsc          time 996384578455 tsc 3850532913950
        2nd event perf time 996384578845 tsc 3850532915428
        test child finished with 0
        ---- end ----
        Convert perf time to TSC: Ok
        #
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kemeng Shi <shikemeng@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nick Gasson <nick.gasson@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Remi Bernon <rbernon@codeweavers.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Steve Maclean <steve.maclean@microsoft.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zou Wei <zou_wei@huawei.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20200914115311.2201-4-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • L
      perf tsc: Add rdtsc() for Arm64 · 4979e861
      Leo Yan committed
      The system register CNTVCT_EL0 can be used to retrieve the counter from
      user space.  Add rdtsc() for Arm64.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kemeng Shi <shikemeng@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nick Gasson <nick.gasson@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Remi Bernon <rbernon@codeweavers.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Steve Maclean <steve.maclean@microsoft.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zou Wei <zou_wei@huawei.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20200914115311.2201-3-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>