1. 09 Apr, 2021 6 commits
  2. 29 Jan, 2021 1 commit
  3. 28 Jan, 2021 1 commit
  4. 12 Jan, 2021 2 commits
  5. 28 Nov, 2020 5 commits
    • M
      perf probe: Change function definition check due to broken DWARF · a9ffd048
      Masami Hiramatsu committed
      Some gcc versions generate broken DWARF that lacks the DW_AT_declaration
      attribute on the subprogram DIE of a function prototype.
      (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97060)
      
      So, in addition to the DW_AT_declaration check, also check that the
      subprogram DIE has DW_AT_inline or an actual entry pc.
      
      Committer testing:
      
        # cat /etc/fedora-release
        Fedora release 33 (Thirty Three)
        #
      
      Before:
      
        # perf test vfs_getname
        78: Use vfs_getname probe to get syscall args filenames             : FAILED!
        79: Check open filename arg using perf trace + vfs_getname          : FAILED!
        81: Add vfs_getname probe to get syscall args filenames             : FAILED!
        #
      
      After:
      
        # perf test vfs_getname
        78: Use vfs_getname probe to get syscall args filenames             : Ok
        79: Check open filename arg using perf trace + vfs_getname          : Ok
        81: Add vfs_getname probe to get syscall args filenames             : Ok
        #
      Reported-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/160645613571.2824037.7441351537890235895.stgit@devnote2
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a9ffd048
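The check described in this commit can be sketched roughly as follows. This is a minimal model, not the perf implementation: the DIE is represented as a plain dict, and the `entry_pc` key and the function name `is_function_definition` are assumptions made for illustration.

```python
# Hypothetical model: a DIE is a dict mapping attribute names to values.
def is_function_definition(die):
    """Return True if a subprogram DIE looks like a real function definition."""
    if die.get("tag") != "DW_TAG_subprogram":
        return False
    # Usual case: a prototype is marked with DW_AT_declaration.
    if die.get("DW_AT_declaration"):
        return False
    # Extra check for DWARF where DW_AT_declaration is missing:
    # a definition must be inlined or have an actual entry address.
    return "DW_AT_inline" in die or die.get("entry_pc") is not None


# A prototype emitted without DW_AT_declaration is still rejected.
broken_prototype = {"tag": "DW_TAG_subprogram"}
real_function = {"tag": "DW_TAG_subprogram", "entry_pc": 0x401000}
print(is_function_definition(broken_prototype), is_function_definition(real_function))
```

The point of the second test is that a missing DW_AT_declaration alone is no longer treated as proof of a definition.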
    • M
      perf probe: Fix to die_entrypc() returns error correctly · ab4200c1
      Masami Hiramatsu committed
      Fix die_entrypc() to return error correctly if the DIE has no
      DW_AT_ranges attribute. Since dwarf_ranges() will treat the case as an
      empty ranges and return 0, we have to check it by ourselves.
      
      Fixes: 91e2f539 ("perf probe: Fix to show function entry line as probe-able")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/160645612634.2824037.5284932731175079426.stgit@devnote2
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ab4200c1
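The pitfall this commit describes, a lookup that reports "no attribute" and "empty ranges" identically, can be illustrated with a small model. Everything here is an assumption for illustration: the dict-based DIE, the stand-in `ranges_lookup`, the `(status, address)` return shape, and the choice of `-ENOENT` as the error code.

```python
import errno


def ranges_lookup(die):
    """Stand-in for a ranges query that treats a missing DW_AT_ranges
    attribute like an empty range list: status 0 and no ranges."""
    return 0, list(die.get("DW_AT_ranges", []))


def die_entrypc(die):
    """Return (status, address); status is 0 on success, negative on error."""
    # The lookup cannot signal a missing attribute, so check for it explicitly.
    if "DW_AT_ranges" not in die:
        return -errno.ENOENT, None
    status, ranges = ranges_lookup(die)
    if status < 0 or not ranges:
        return -errno.ENOENT, None
    # Use the start of the first range as the entry address.
    return 0, ranges[0][0]
```

Without the explicit attribute check, a caller would see status 0 from the lookup and go on to use an address that was never filled in.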
    • N
      perf stat: Use proper cpu for shadow stats · c0ee1d5a
      Namhyung Kim committed
      Currently perf stat shows some metrics (like IPC) for defined events.
      But when no aggregation mode is used (-A option), it shows incorrect
      values since it used a value from a different cpu.
      
      Before:
      
        $ perf stat -aA -e cycles,instructions sleep 1
      
         Performance counter stats for 'system wide':
      
        CPU0      116,057,380      cycles
        CPU1       86,084,722      cycles
        CPU2       99,423,125      cycles
        CPU3       98,272,994      cycles
        CPU0       53,369,217      instructions      #    0.46  insn per cycle
        CPU1       33,378,058      instructions      #    0.29  insn per cycle
        CPU2       58,150,086      instructions      #    0.50  insn per cycle
        CPU3       40,029,703      instructions      #    0.34  insn per cycle
      
             1.001816971 seconds time elapsed
      
      So the IPC for CPU1 should be about 0.39 (= 33,378,058 / 86,084,722)
      but it was 0.29 (= 33,378,058 / 116,057,380), and so on.
      
      After:
      
        $ perf stat -aA -e cycles,instructions sleep 1
      
         Performance counter stats for 'system wide':
      
        CPU0      109,621,384      cycles
        CPU1      159,026,454      cycles
        CPU2       99,460,366      cycles
        CPU3      124,144,142      cycles
        CPU0       44,396,706      instructions      #    0.41  insn per cycle
        CPU1      120,195,425      instructions      #    0.76  insn per cycle
        CPU2       44,763,978      instructions      #    0.45  insn per cycle
        CPU3       69,049,079      instructions      #    0.56  insn per cycle
      
             1.001910444 seconds time elapsed
      
      Fixes: 44d49a60 ("perf stat: Support metrics in --per-core/socket mode")
      Reported-by: Sam Xi <xyzsam@google.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20201127041404.390276-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c0ee1d5a
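The arithmetic behind the "Before" output can be reproduced with a short sketch. The counter values are copied from the "Before" listing in this commit; the function names and the assumption that the faulty path always read CPU0's cycle count are illustrative, inferred from the printed ratios rather than from the perf source.

```python
# Counter values from the "Before" output, keyed by CPU number.
cycles = {0: 116_057_380, 1: 86_084_722, 2: 99_423_125, 3: 98_272_994}
instructions = {0: 53_369_217, 1: 33_378_058, 2: 58_150_086, 3: 40_029_703}


def ipc_per_cpu(instructions, cycles):
    """IPC computed with each CPU's own cycle count (the intended behavior)."""
    return {cpu: instructions[cpu] / cycles[cpu] for cpu in instructions}


def ipc_wrong_cpu(instructions, cycles, ref_cpu=0):
    """IPC computed against one fixed CPU's cycles (models the reported bug)."""
    return {cpu: instructions[cpu] / cycles[ref_cpu] for cpu in instructions}


for cpu in sorted(cycles):
    right = ipc_per_cpu(instructions, cycles)[cpu]
    wrong = ipc_wrong_cpu(instructions, cycles)[cpu]
    print(f"CPU{cpu}: per-cpu {right:.2f}, fixed-cpu {wrong:.2f}")
```

Dividing every CPU's instruction count by CPU0's cycles yields exactly the four ratios printed in the "Before" output, which is consistent with the explanation given in the commit message.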
    • N
      perf record: Synthesize cgroup events only if needed · aa50d953
      Namhyung Kim committed
      perf record didn't check the tool->cgroup_events bit, which is set when
      the --all-cgroups option is given.  Without that option, samples carry no
      cgroup info, so there is no reason to synthesize cgroup events.
      
      We can check the PERF_RECORD_CGROUP records after running perf record
      *WITHOUT* the --all-cgroups option:
      
      Before:
      
        $ perf report -D | grep CGROUP
        0 0 0x8430 [0x38]: PERF_RECORD_CGROUP cgroup: 1 /
                CGROUP events:          1
                CGROUP events:          0
                CGROUP events:          0
      
      After:
      
        $ perf report -D | grep CGROUP
                CGROUP events:          0
                CGROUP events:          0
                CGROUP events:          0
      
      Committer testing:
      
      Before:
      
        # perf record -a sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 2.208 MB perf.data (10003 samples) ]
        # perf report -D | grep "CGROUP events"
                  CGROUP events:        146
                  CGROUP events:          0
                  CGROUP events:          0
        #
      
      After:
      
        # perf record -a sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 2.208 MB perf.data (10448 samples) ]
        # perf report -D | grep "CGROUP events"
                  CGROUP events:          0
                  CGROUP events:          0
                  CGROUP events:          0
        #
      
      With all-cgroups:
      
        # perf record --all-cgroups -a sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 2.374 MB perf.data (11526 samples) ]
        # perf report -D | grep "CGROUP events"
                  CGROUP events:        146
                  CGROUP events:          0
                  CGROUP events:          0
        #
      
      Fixes: 8fb4b679 ("perf record: Add --all-cgroups option")
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20201127054356.405481-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      aa50d953
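The gating this commit adds amounts to a flag check before synthesis. A minimal sketch, assuming a hypothetical `Tool` class with a `cgroup_events` field and a hypothetical `synthesize_cgroups` helper that returns the records it would emit:

```python
from dataclasses import dataclass


@dataclass
class Tool:
    # Models the bit that is set when --all-cgroups is given.
    cgroup_events: bool = False


def synthesize_cgroups(tool, cgroup_paths):
    """Return the cgroup records to emit, or nothing if they are not needed."""
    if not tool.cgroup_events:
        # Samples carry no cgroup info, so the records would go unused.
        return []
    return [("PERF_RECORD_CGROUP", path) for path in cgroup_paths]


print(len(synthesize_cgroups(Tool(), ["/", "/user.slice"])))
print(len(synthesize_cgroups(Tool(cgroup_events=True), ["/", "/user.slice"])))
```

This mirrors the outputs shown in the commit: zero CGROUP events without the option, and a non-zero count with it.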
    • A
      perf tools: Update copy of libbpf's hashmap.c · 3b13eaf0
      Arnaldo Carvalho de Melo committed
      To pick the changes in:
      
        7a078d2d ("libbpf, hashmap: Fix undefined behavior in hash_bits")
      
      That doesn't entail any changes in tools/perf.
      
      This addresses this perf build warning:
      
        Warning: Kernel ABI header at 'tools/perf/util/hashmap.h' differs from latest version at 'tools/lib/bpf/hashmap.h'
        diff -u tools/perf/util/hashmap.h tools/lib/bpf/hashmap.h
      
      This is not a kernel ABI; it's just that this uses the mechanism in place
      for checking kernel ABI file drift.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3b13eaf0
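The drift warning quoted in this commit comes from comparing a copied file against its original. The idea can be sketched as below; this is not the actual perf build check, and the function name `check_copy` is hypothetical. Only the warning wording is taken from the commit message.

```python
from pathlib import Path


def check_copy(copy_path, upstream_path):
    """Return a warning string if the copy has drifted from upstream, else None."""
    # Compare file contents byte for byte.
    if Path(copy_path).read_bytes() == Path(upstream_path).read_bytes():
        return None
    return (
        f"Warning: Kernel ABI header at '{copy_path}' "
        f"differs from latest version at '{upstream_path}'"
    )
```

A caller would run it on pairs such as `tools/perf/util/hashmap.h` and `tools/lib/bpf/hashmap.h` and print any non-None result, followed by a `diff -u` hint as in the quoted output.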
  6. 13 Nov, 2020 1 commit
    • A
      tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy' · db1a8b97
      Arnaldo Carvalho de Melo committed
      To bring in the changes made in these csets:
      
        4d6ffa27 ("x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S")
        6dcc5627 ("x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_*")
      
      I needed to define SYM_FUNC_START_LOCAL() as SYM_L_GLOBAL as
      mem{cpy,set}_{orig,erms} are used by 'perf bench'.
      
      This silences these perf tools build warnings:
      
        Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S'
        diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S
        Warning: Kernel ABI header at 'tools/arch/x86/lib/memset_64.S' differs from latest version at 'arch/x86/lib/memset_64.S'
        diff -u tools/arch/x86/lib/memset_64.S arch/x86/lib/memset_64.S
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Fangrui Song <maskray@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Slaby <jirislaby@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      db1a8b97
  7. 03 Nov, 2020 6 commits
  8. 15 Oct, 2020 6 commits
  9. 14 Oct, 2020 10 commits
  10. 13 Oct, 2020 2 commits
    • N
      perf inject: Do not load map/dso when injecting build-id · e7b60c5a
      Namhyung Kim committed
      No need to load symbols in a DSO when injecting build-id.  I guess the
      reason was to check whether the DSO is a special file, like an anon
      file.  Use some helper functions in map.c to check that before reading
      the build-id.  Also pass the sample event's cpumode to the new build-id
      event.
      
      It brought a speedup in the benchmark of 25 -> 21 msec on my laptop.
      Also the memory usage (Max RSS) went down by ~200 KB.
      
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 21.389 msec (+- 0.138 msec)
          Average time per event: 2.097 usec (+- 0.014 usec)
          Average memory usage: 8225 KB (+- 0 KB)
      
      Committer notes:
      
      Before:
      
        $ perf stat -r5 perf bench internals inject-build-id > /dev/null
      
         Performance counter stats for 'perf bench internals inject-build-id' (5 runs):
      
                  4,020.56 msec task-clock:u              #    1.271 CPUs utilized            ( +-  0.74% )
                         0      context-switches:u        #    0.000 K/sec
                         0      cpu-migrations:u          #    0.000 K/sec
                   123,354      page-faults:u             #    0.031 M/sec                    ( +-  0.81% )
             7,119,951,568      cycles:u                  #    1.771 GHz                      ( +-  1.74% )  (83.27%)
               230,086,969      stalled-cycles-frontend:u #    3.23% frontend cycles idle     ( +-  1.97% )  (83.41%)
             1,168,298,765      stalled-cycles-backend:u  #   16.41% backend cycles idle      ( +-  1.13% )  (83.44%)
            11,173,083,669      instructions:u            #    1.57  insn per cycle
                                                          #    0.10  stalled cycles per insn  ( +-  1.58% )  (83.31%)
             2,413,908,936      branches:u                #  600.392 M/sec                    ( +-  1.69% )  (83.26%)
                46,576,289      branch-misses:u           #    1.93% of all branches          ( +-  2.20% )  (83.31%)
      
                    3.1638 +- 0.0309 seconds time elapsed  ( +-  0.98% )
      
        $
      
      After:
      
        $ perf stat -r5 perf bench internals inject-build-id > /dev/null
      
         Performance counter stats for 'perf bench internals inject-build-id' (5 runs):
      
                  2,379.94 msec task-clock:u              #    1.473 CPUs utilized            ( +-  0.18% )
                         0      context-switches:u        #    0.000 K/sec
                         0      cpu-migrations:u          #    0.000 K/sec
                    62,584      page-faults:u             #    0.026 M/sec                    ( +-  0.07% )
             2,372,389,668      cycles:u                  #    0.997 GHz                      ( +-  0.29% )  (83.14%)
               106,937,862      stalled-cycles-frontend:u #    4.51% frontend cycles idle     ( +-  4.89% )  (83.20%)
               581,697,915      stalled-cycles-backend:u  #   24.52% backend cycles idle      ( +-  0.71% )  (83.47%)
             3,659,692,199      instructions:u            #    1.54  insn per cycle
                                                          #    0.16  stalled cycles per insn  ( +-  0.10% )  (83.63%)
               791,372,961      branches:u                #  332.518 M/sec                    ( +-  0.27% )  (83.39%)
                10,648,083      branch-misses:u           #    1.35% of all branches          ( +-  0.22% )  (83.16%)
      
                   1.61570 +- 0.00172 seconds time elapsed  ( +-  0.11% )
      
        $
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Original-patch-by: Stephane Eranian <eranian@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201012070214.2074921-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e7b60c5a
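The size of the improvement in the committer's `perf stat` runs can be worked out directly from the printed figures. The helper names below are illustrative; the inputs are the elapsed times, instruction counts, and page-fault counts from the "Before" and "After" outputs of this commit.

```python
def speedup(before, after):
    """How many times faster the 'after' run is."""
    return before / after


def relative_reduction(before, after):
    """Fraction by which a quantity shrank from 'before' to 'after'."""
    return (before - after) / before


# Elapsed seconds, instructions:u and page-faults:u from the two runs.
elapsed = (3.1638, 1.61570)
instr = (11_173_083_669, 3_659_692_199)
faults = (123_354, 62_584)

print(f"elapsed speedup: {speedup(*elapsed):.2f}x")
print(f"instructions reduced by {relative_reduction(*instr):.1%}")
print(f"page faults reduced by {relative_reduction(*faults):.1%}")
```

Looking at instruction and page-fault counts alongside wall-clock time helps confirm that the gain comes from doing less work, which matches the stated goal of not loading symbols.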
    • N
      perf bench: Add build-id injection benchmark · 0bf02a0d
      Namhyung Kim committed
      Sometimes I can see that 'perf record' piped with 'perf inject' takes a
      long time processing build-ids.
      
      So introduce an inject-build-id benchmark to the internals benchmark
      suite to measure its overhead regularly.
      
      It runs the 'perf inject' command internally and feeds the given number
      of synthesized events (MMAP2 + SAMPLE basically).
      
        Usage: perf bench internals inject-build-id <options>
      
          -i, --iterations <n>  Number of iterations used to compute average (default: 100)
          -m, --nr-mmaps <n>    Number of mmap events for each iteration (default: 100)
          -n, --nr-samples <n>  Number of sample events per mmap event (default: 100)
          -v, --verbose         be more verbose (show iteration count, DSO name, etc)
      
      By default, it measures average processing time of 100 MMAP2 events
      and 10000 SAMPLE events.  Below is a result on my laptop.
      
        $ perf bench internals inject-build-id
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 25.789 msec (+- 0.202 msec)
          Average time per event: 2.528 usec (+- 0.020 usec)
          Average memory usage: 8411 KB (+- 7 KB)
      
      Committer testing:
      
        $ perf bench
        Usage:
        	perf bench [<common options>] <collection> <benchmark> [<options>]
      
                # List of all available benchmark collections:
      
                 sched: Scheduler and IPC benchmarks
               syscall: System call benchmarks
                   mem: Memory access benchmarks
                  numa: NUMA scheduling and MM benchmarks
                 futex: Futex stressing benchmarks
                 epoll: Epoll stressing benchmarks
             internals: Perf-internals benchmarks
                   all: All benchmarks
      
        $ perf bench internals
      
                # List of available benchmarks for collection 'internals':
      
            synthesize: Benchmark perf event synthesis
        kallsyms-parse: Benchmark kallsyms parsing
        inject-build-id: Benchmark build-id injection
      
        $ perf bench internals inject-build-id
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.202 msec (+- 0.059 msec)
          Average time per event: 1.392 usec (+- 0.006 usec)
          Average memory usage: 12650 KB (+- 10 KB)
          Average build-id-all injection took: 12.831 msec (+- 0.071 msec)
          Average time per event: 1.258 usec (+- 0.007 usec)
          Average memory usage: 11895 KB (+- 10 KB)
        $
      
        $ perf stat -r5 perf bench internals inject-build-id
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.380 msec (+- 0.056 msec)
          Average time per event: 1.410 usec (+- 0.006 usec)
          Average memory usage: 12608 KB (+- 11 KB)
          Average build-id-all injection took: 11.889 msec (+- 0.064 msec)
          Average time per event: 1.166 usec (+- 0.006 usec)
          Average memory usage: 11838 KB (+- 10 KB)
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.246 msec (+- 0.065 msec)
          Average time per event: 1.397 usec (+- 0.006 usec)
          Average memory usage: 12744 KB (+- 10 KB)
          Average build-id-all injection took: 12.019 msec (+- 0.066 msec)
          Average time per event: 1.178 usec (+- 0.006 usec)
          Average memory usage: 11963 KB (+- 10 KB)
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.321 msec (+- 0.067 msec)
          Average time per event: 1.404 usec (+- 0.007 usec)
          Average memory usage: 12690 KB (+- 10 KB)
          Average build-id-all injection took: 11.909 msec (+- 0.041 msec)
          Average time per event: 1.168 usec (+- 0.004 usec)
          Average memory usage: 11938 KB (+- 10 KB)
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.287 msec (+- 0.059 msec)
          Average time per event: 1.401 usec (+- 0.006 usec)
          Average memory usage: 12864 KB (+- 10 KB)
          Average build-id-all injection took: 11.862 msec (+- 0.058 msec)
          Average time per event: 1.163 usec (+- 0.006 usec)
          Average memory usage: 12103 KB (+- 10 KB)
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.402 msec (+- 0.053 msec)
          Average time per event: 1.412 usec (+- 0.005 usec)
          Average memory usage: 12876 KB (+- 10 KB)
          Average build-id-all injection took: 11.826 msec (+- 0.061 msec)
          Average time per event: 1.159 usec (+- 0.006 usec)
          Average memory usage: 12111 KB (+- 10 KB)
      
         Performance counter stats for 'perf bench internals inject-build-id' (5 runs):
      
                  4,267.48 msec task-clock:u              #    1.502 CPUs utilized            ( +-  0.14% )
                         0      context-switches:u        #    0.000 K/sec
                         0      cpu-migrations:u          #    0.000 K/sec
                   102,092      page-faults:u             #    0.024 M/sec                    ( +-  0.08% )
             3,894,589,578      cycles:u                  #    0.913 GHz                      ( +-  0.19% )  (83.49%)
               140,078,421      stalled-cycles-frontend:u #    3.60% frontend cycles idle     ( +-  0.77% )  (83.34%)
               948,581,189      stalled-cycles-backend:u  #   24.36% backend cycles idle      ( +-  0.46% )  (83.25%)
             5,835,587,719      instructions:u            #    1.50  insn per cycle
                                                          #    0.16  stalled cycles per insn  ( +-  0.21% )  (83.24%)
             1,267,423,636      branches:u                #  296.996 M/sec                    ( +-  0.22% )  (83.12%)
                17,484,290      branch-misses:u           #    1.38% of all branches          ( +-  0.12% )  (83.55%)
      
                   2.84176 +- 0.00222 seconds time elapsed  ( +-  0.08% )
      
        $
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20201012070214.2074921-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      0bf02a0d