1. 07 Nov, 2019: 3 commits
    • perf report: Sort by sampled cycles percent per block for stdio · 6f7164fa
      Committed by Jin Yao
      It is useful to be able to sort all blocks by their percentage of the
      sampled cycles, so that one can concentrate on the globally hottest
      blocks.
      
      This patch implements a new option, "--total-cycles", which sorts all
      blocks by 'Sampled Cycles%', defined as the percentage:
      
       percent = block sampled cycles aggregation / total sampled cycles
      
      Note that this patch only supports "--stdio" mode.
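
      A minimal C sketch of the ordering described above, assuming each block
      already carries its aggregated sampled cycles. The `struct block` type and
      the helper names are hypothetical illustrations, not perf's block-info code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-block record (not perf's own block_info type). */
struct block {
	const char *range;       /* e.g. "[div.c:42 -> div.c:39]" */
	uint64_t sampled_cycles; /* cycles aggregated over all samples of this block */
};

/* percent = block sampled cycles aggregation / total sampled cycles */
static double block_sampled_cycles_percent(const struct block *b,
					   uint64_t total_cycles)
{
	if (total_cycles == 0)
		return 0.0;
	return 100.0 * (double)b->sampled_cycles / (double)total_cycles;
}

/* qsort() comparator: the block with more sampled cycles sorts first. */
static int cmp_sampled_cycles_desc(const void *pa, const void *pb)
{
	const struct block *a = pa;
	const struct block *b = pb;

	if (a->sampled_cycles < b->sampled_cycles)
		return 1;
	if (a->sampled_cycles > b->sampled_cycles)
		return -1;
	return 0;
}

/* Sort all blocks, across all symbols, by 'Sampled Cycles%'. */
static void sort_blocks_by_total_cycles(struct block *blocks, size_t n)
{
	qsort(blocks, n, sizeof(blocks[0]), cmp_sampled_cycles_desc);
}
```

      Because every block is divided by the same total, sorting on the raw
      aggregation gives the same order as sorting on 'Sampled Cycles%', and the
      comparator avoids floating-point comparisons.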
      
      For example,
      
        # perf record -b ./div
        # perf report --total-cycles --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 2M of event 'cycles'
        # Event count (approx.): 2753248
        #
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                             [Program Block Range]      Shared Object
        # ...............  ..............  ...........  ..........  ................................................  .................
        #
                   26.04%            2.8M        0.40%          18                            [div.c:42 -> div.c:39]                div
                   15.17%            1.2M        0.16%           7                [random_r.c:357 -> random_r.c:380]       libc-2.27.so
                    5.11%          402.0K        0.04%           2                            [div.c:27 -> div.c:28]                div
                    4.87%          381.6K        0.04%           2                    [random.c:288 -> random.c:291]       libc-2.27.so
                    4.53%          381.0K        0.04%           2                            [div.c:40 -> div.c:40]                div
                    3.85%          300.9K        0.02%           1                            [div.c:22 -> div.c:25]                div
                    3.08%          241.1K        0.02%           1                          [rand.c:26 -> rand.c:27]       libc-2.27.so
                    3.06%          240.0K        0.02%           1                    [random.c:291 -> random.c:291]       libc-2.27.so
                    2.78%          215.7K        0.02%           1                    [random.c:298 -> random.c:298]       libc-2.27.so
                    2.52%          198.3K        0.02%           1                    [random.c:293 -> random.c:293]       libc-2.27.so
                    2.36%          184.8K        0.02%           1                          [rand.c:28 -> rand.c:28]       libc-2.27.so
                    2.33%          180.5K        0.02%           1                    [random.c:295 -> random.c:295]       libc-2.27.so
                    2.28%          176.7K        0.02%           1                    [random.c:295 -> random.c:295]       libc-2.27.so
                    2.20%          168.8K        0.02%           1                        [rand@plt+0 -> rand@plt+0]                div
                    1.98%          158.2K        0.02%           1                [random_r.c:388 -> random_r.c:388]       libc-2.27.so
                    1.57%          123.3K        0.02%           1                            [div.c:42 -> div.c:44]                div
                    1.44%          116.0K        0.42%          19                [random_r.c:357 -> random_r.c:394]       libc-2.27.so
                    0.25%          182.5K        0.02%           1                [random_r.c:388 -> random_r.c:391]       libc-2.27.so
                    0.00%              48        1.07%          48        [x86_pmu_enable+284 -> x86_pmu_enable+298]  [kernel.kallsyms]
                    0.00%              74        1.64%          74             [vm_mmap_pgoff+0 -> vm_mmap_pgoff+92]  [kernel.kallsyms]
                    0.00%              73        1.62%          73                         [vm_mmap+0 -> vm_mmap+48]  [kernel.kallsyms]
                    0.00%              63        0.69%          31                       [up_write+0 -> up_write+34]  [kernel.kallsyms]
                    0.00%              13        0.29%          13      [setup_arg_pages+396 -> setup_arg_pages+413]  [kernel.kallsyms]
                    0.00%               3        0.07%           3      [setup_arg_pages+418 -> setup_arg_pages+450]  [kernel.kallsyms]
                    0.00%             616        6.84%         308   [security_mmap_file+0 -> security_mmap_file+72]  [kernel.kallsyms]
                    0.00%              23        0.51%          23  [security_mmap_file+77 -> security_mmap_file+87]  [kernel.kallsyms]
                    0.00%               4        0.02%           1                  [sched_clock+0 -> sched_clock+4]  [kernel.kallsyms]
                    0.00%               4        0.02%           1                 [sched_clock+9 -> sched_clock+12]  [kernel.kallsyms]
                    0.00%               1        0.02%           1                [rcu_nmi_exit+0 -> rcu_nmi_exit+9]  [kernel.kallsyms]
      
      Committer testing:
      
      This should provide material for hours of endless joy, both from looking
      for suspicious things in the implementation of this patch, such as the
      top one:
      
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                          [Program Block Range]     Shared Object
                    2.17%            1.7M        0.08%         607   [compiler.h:199 -> common.c:221]              [kernel.vmlinux]
      
      And from things that look legit:
      
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                          [Program Block Range]     Shared Object
                    0.16%          123.0K        0.60%        4.7K   [nospec-branch.h:265 -> nospec-branch.h:278]  [kernel.vmlinux]
      
      :-)
      
      A very short system-wide taken-branches session:
      
        # perf record -h -b
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -b, --branch-any      sample any taken branches
      
        #
        # perf record -b
        ^C[ perf record: Woken up 595 times to write data ]
        [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ]
      
        #
        # perf evlist -v
        cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
        #
        # perf report --total-cycles --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 6M of event 'cycles'
        # Event count (approx.): 6299936
        #
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
        # ...............  ..............  ...........  ..........  ......................................................................  ....................
        #
                    2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
                    1.75%            1.3M        8.34%       65.5K    [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151]          libc-2.29.so
                    0.72%          544.5K        0.03%         230                                      [entry_64.S:657 -> entry_64.S:662]      [kernel.vmlinux]
                    0.56%          541.8K        0.09%         672                                        [compiler.h:199 -> common.c:300]      [kernel.vmlinux]
                    0.39%          293.2K        0.01%         104                                    [list_debug.c:43 -> list_debug.c:61]      [kernel.vmlinux]
                    0.36%          278.6K        0.03%         272                                    [entry_64.S:1289 -> entry_64.S:1308]      [kernel.vmlinux]
                    0.30%          260.8K        0.07%         564                              [clear_page_64.S:47 -> clear_page_64.S:50]      [kernel.vmlinux]
                    0.28%          215.3K        0.05%         369                                            [traps.c:623 -> traps.c:628]      [kernel.vmlinux]
                    0.23%          178.1K        0.04%         278                                      [entry_64.S:271 -> entry_64.S:275]      [kernel.vmlinux]
                    0.20%          152.6K        0.09%         706                                      [paravirt.c:177 -> paravirt.c:179]      [kernel.vmlinux]
                    0.20%          155.8K        0.05%         373                                      [entry_64.S:153 -> entry_64.S:175]      [kernel.vmlinux]
                    0.18%          136.6K        0.03%         222                                                [msr.h:105 -> msr.h:166]      [kernel.vmlinux]
                    0.16%          123.0K        0.60%        4.7K                            [nospec-branch.h:265 -> nospec-branch.h:278]      [kernel.vmlinux]
                    0.16%          118.3K        0.01%          44                                      [entry_64.S:632 -> entry_64.S:657]      [kernel.vmlinux]
                    0.14%          104.5K        0.00%          28                                          [rwsem.c:1541 -> rwsem.c:1544]      [kernel.vmlinux]
                    0.13%           99.2K        0.01%          53                                      [spinlock.c:150 -> spinlock.c:152]      [kernel.vmlinux]
                    0.13%           95.5K        0.00%          35                                              [swap.c:456 -> swap.c:471]      [kernel.vmlinux]
                    0.12%           96.2K        0.05%         407                              [copy_user_64.S:175 -> copy_user_64.S:209]      [kernel.vmlinux]
                    0.11%           85.9K        0.00%          31                                        [swap.c:400 -> page-flags.h:188]      [kernel.vmlinux]
                    0.10%           73.0K        0.01%          52                                          [paravirt.h:763 -> list.h:131]      [kernel.vmlinux]
                    0.07%           56.2K        0.03%         214                                      [filemap.c:1524 -> filemap.c:1557]      [kernel.vmlinux]
                    0.07%           54.2K        0.02%         145                                        [memory.c:1032 -> memory.c:1049]      [kernel.vmlinux]
                    0.07%           50.3K        0.00%          39                                            [mmzone.c:49 -> mmzone.c:69]      [kernel.vmlinux]
                    0.06%           48.3K        0.01%          40                                   [paravirt.h:768 -> page_alloc.c:3304]      [kernel.vmlinux]
                    0.06%           46.7K        0.02%         155                                        [memory.c:1032 -> memory.c:1056]      [kernel.vmlinux]
                    0.06%           46.9K        0.01%         103                                              [swap.c:867 -> swap.c:902]      [kernel.vmlinux]
                    0.06%           47.8K        0.00%          34                                    [entry_64.S:1201 -> entry_64.S:1202]      [kernel.vmlinux]
      
       -----------------------------------------------------------
      
       v7:
       ---
       Use use_browser in report__browse_block_hists for supporting
       stdio and potential tui mode.
      
       v6:
       ---
       Create report__browse_block_hists in block-info.c (the code is
       moved from builtin-report.c). It's called from
       perf_evlist__tty_browse_hists.
      
       v5:
       ---
       1. Move all block functions to block-info.c
      
       2. Move the code of setting ms in block hist_entry to
          another patch.
      
       v4:
       ---
       1. Use new option '--total-cycles' to replace
          '-s total_cycles' in v3.
      
       2. Move block info collection out of block info
          printing.
      
       v3:
       ---
       1. Use common function block_info__process_sym to
          process the blocks per symbol.
      
       2. Remove the nasty hack for skipping calculation
          of column length
      
       3. Some minor cleanup

      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-6-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf hist: Count the total cycles of all samples · 7841f40a
      Committed by Jin Yao
      hist__account_cycles() gives us the cycles of each sample. It is also
      useful to know the total cycles of all samples, so that the cycles
      coverage of a single program block can be computed later. For example:
      
        coverage = per block sampled cycles / total sampled cycles
      
      This patch adds a new argument, 'total_cycles', to hist__account_cycles(),
      into which the cycles of each sample are accumulated.
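
      A minimal sketch of the accumulation, assuming each sample carries
      LBR-style branch entries with a per-entry cycle count. The
      `account_cycles()` and `block_coverage()` names and `struct branch_entry`
      are hypothetical; they only mirror the idea of an optional `total_cycles`
      out-parameter and are not the real hist__account_cycles() signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical branch entry: cycles elapsed since the previous taken branch. */
struct branch_entry {
	uint64_t from;
	uint64_t to;
	uint16_t cycles;
};

/*
 * Account the cycles of one sample. Per-block accounting is elided in this
 * sketch; the sample's cycles are added to *total_cycles when it is non-NULL.
 */
static void account_cycles(const struct branch_entry *entries, size_t nr,
			   uint64_t *total_cycles)
{
	for (size_t i = 0; i < nr; i++) {
		/* ... per-block accounting would go here ... */
		if (total_cycles != NULL)
			*total_cycles += entries[i].cycles;
	}
}

/* coverage = per block sampled cycles / total sampled cycles */
static double block_coverage(uint64_t block_cycles, uint64_t total_cycles)
{
	return total_cycles ? (double)block_cycles / (double)total_cycles : 0.0;
}
```

      In this sketch, passing NULL lets a caller that does not need the total
      skip the accumulation.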

      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-4-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf maps: Add for_each_entry()/_safe() iterators · 8efc4f05
      Committed by Arnaldo Carvalho de Melo
      To reduce boilerplate, provide a more compact form, using an idiom
      already used for other data structures.
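
      The idiom can be sketched as a pair of `for`-loop macros. This toy
      version, with hypothetical `toy_maps` types, walks a singly linked list
      rather than perf's rb-tree; it only illustrates why the `_safe` variant
      exists: it fetches the successor before the loop body runs, so the body
      may unlink and free the current entry.

```c
#include <stdlib.h>

/* Hypothetical, simplified types: a list stands in for perf's rb-tree. */
struct toy_map {
	unsigned long start, end;
	struct toy_map *next;
};

struct toy_maps {
	struct toy_map *first;
};

/* Visit every entry; the body must not free or unlink 'pos'. */
#define toy_maps__for_each_entry(maps, pos) \
	for ((pos) = (maps)->first; (pos) != NULL; (pos) = (pos)->next)

/* Same, but the successor is saved in 'n' so the body may free 'pos'. */
#define toy_maps__for_each_entry_safe(maps, pos, n)			\
	for ((pos) = (maps)->first, (n) = (pos) ? (pos)->next : NULL;	\
	     (pos) != NULL;						\
	     (pos) = (n), (n) = (pos) ? (pos)->next : NULL)

static size_t toy_maps__count(struct toy_maps *maps)
{
	struct toy_map *pos;
	size_t count = 0;

	toy_maps__for_each_entry(maps, pos)
		count++;
	return count;
}

/* Example user of the _safe form: drop every map ending at or below 'limit'. */
static void toy_maps__remove_below(struct toy_maps *maps, unsigned long limit)
{
	struct toy_map *pos, *n, **link = &maps->first;

	toy_maps__for_each_entry_safe(maps, pos, n) {
		if (pos->end <= limit) {
			*link = n;	/* unlink pos, then free it */
			free(pos);
		} else {
			link = &pos->next;
		}
	}
}
```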
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-59gmq4kg1r68ou1wknyjl78x@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 15 Oct, 2019: 1 commit
  3. 21 Sep, 2019: 1 commit
  4. 20 Sep, 2019: 1 commit
  5. 01 Sep, 2019: 4 commits
  6. 29 Aug, 2019: 2 commits
  7. 26 Aug, 2019: 1 commit
  8. 20 Aug, 2019: 1 commit
    • perf report: Prefer DWARF callstacks to LBR ones when captured both · 10ccbc1c
      Committed by Alexey Budankov
      Display DWARF-based callchains when the perf.data file contains raw thread
      stack data as well as LBR callstack data.
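
      The selection rule can be sketched as a small decision function. This is
      a simplified illustration, not the builtin-report.c logic: the flag names
      are hypothetical stand-ins for the sample_type bits PERF_SAMPLE_REGS_USER,
      PERF_SAMPLE_STACK_USER and PERF_SAMPLE_BRANCH_STACK, and the override
      models the explicit -b/--branch-stack selection shown in the testing
      notes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags standing in for perf_event sample_type bits. */
enum {
	HAS_REGS_USER    = 1u << 0,	/* user registers captured */
	HAS_STACK_USER   = 1u << 1,	/* raw user stack captured */
	HAS_BRANCH_STACK = 1u << 2,	/* LBR branch stack captured */
};

enum report_view { VIEW_PLAIN, VIEW_DWARF_CALLCHAIN, VIEW_BRANCH_STACK };

/*
 * An explicit branch-stack request wins; otherwise raw stack data usable
 * for DWARF unwinding is preferred over LBR data when both were captured.
 */
static enum report_view pick_report_view(uint32_t flags, bool branch_requested)
{
	bool have_dwarf = (flags & HAS_REGS_USER) && (flags & HAS_STACK_USER);
	bool have_lbr = flags & HAS_BRANCH_STACK;

	if (branch_requested && have_lbr)
		return VIEW_BRANCH_STACK;
	if (have_dwarf)
		return VIEW_DWARF_CALLCHAIN;
	if (have_lbr)
		return VIEW_BRANCH_STACK;
	return VIEW_PLAIN;
}
```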
      
      Committer testing:
      
      For the same file as in the previous csets, this changes the output from
      the branch-stack-based one, i.e. without this patch:
      
        # perf report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 13  of event 'cycles'
        # Event count (approx.): 13
        #
        # Overhead  Command  Source Shared Object  Source Symbol                Target Symbol                              Basic Block Cycles
        # ........  .......  ....................  ...........................  .........................................  ..................
        #
             7.69%  ls       libpthread-2.29.so    [.] _init                    [.] __pthread_initialize_minimal_internal  6827
             7.69%  ls       ld-2.29.so            [k] _start                   [k] _dl_start                              -
             7.69%  ls       ld-2.29.so            [.] _dl_start_user           [.] _dl_init                               -24790
             7.69%  ls       ld-2.29.so            [k] _dl_start                [k] _dl_sysdep_start                       278
             7.69%  ls       ld-2.29.so            [k] dl_main                  [k] _dl_map_object_deps                    15581
             7.69%  ls       ld-2.29.so            [k] open_verify.constprop.0  [k] lseek64                                4228
             7.69%  ls       ld-2.29.so            [k] _dl_map_object           [k] open_verify.constprop.0                55
             7.69%  ls       ld-2.29.so            [k] openaux                  [k] _dl_map_object                         67
             7.69%  ls       ld-2.29.so            [k] _dl_map_object_deps      [k] 0x00007f441b57c090                     112
             7.69%  ls       ld-2.29.so            [.] call_init.part.0         [.] _init                                  334
             7.69%  ls       ld-2.29.so            [.] _dl_init                 [.] call_init.part.0                       383
             7.69%  ls       ld-2.29.so            [k] _dl_sysdep_start         [k] dl_main                                45
             7.69%  ls       ld-2.29.so            [k] _dl_catch_exception      [k] openaux                                116
      
        #
        # (Tip: For memory address profiling, try: perf mem record / perf mem report)
        #
      
      To the one that shows call chains:
      
        # perf report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 10  of event 'cycles'
        # Event count (approx.): 3204047
        #
        # Children      Self  Command  Shared Object       Symbol
        # ........  ........  .......  ..................  .........................................
        #
            55.01%     0.00%  ls       [kernel.vmlinux]    [k] entry_SYSCALL_64_after_hwframe
                    |
                    ---entry_SYSCALL_64_after_hwframe
                       do_syscall_64
                       |
                        --16.01%--__x64_sys_execve
                                  __do_execve_file.isra.0
                                  search_binary_handler
                                  load_elf_binary
                                  elf_map
                                  vm_mmap_pgoff
                                  do_mmap
                                  mmap_region
                                  perf_event_mmap
                                  perf_iterate_sb
                                  perf_iterate_ctx
                                  perf_event_mmap_output
                                  perf_output_copy
                                  memcpy_erms
      
            55.01%    39.00%  ls       [kernel.vmlinux]    [k] do_syscall_64
                    |
                    |--39.00%--0xffffffffffffffff
                    |          _dl_map_object
                    |          open_verify.constprop.0
                    |          __lseek64 (inlined)
                    |          entry_SYSCALL_64_after_hwframe
                    |          do_syscall_64
                    |
                     --16.01%--do_syscall_64
                               __x64_sys_execve
                               __do_execve_file.isra.0
                               search_binary_handler
                               load_elf_binary
                               elf_map
                               vm_mmap_pgoff
                               do_mmap
                               mmap_region
                               perf_event_mmap
                               perf_iterate_sb
                               perf_iterate_ctx
                               perf_event_mmap_output
                               perf_output_copy
                               memcpy_erms
      
            42.95%    42.95%  ls       libpthread-2.29.so  [.] __pthread_initialize_minimal_internal
                    |
                    ---_init
                       __pthread_initialize_minimal_internal
      
            42.95%     0.00%  ls       libpthread-2.29.so  [.] _init
                    |
                    ---_init
                       __pthread_initialize_minimal_internal
      
        <SNIP>
      
        #
        # (Tip: Profiling branch (mis)predictions with: perf record -b / perf report)
        #
        #
      
      The branch stack view can be explicitly selected using:
      
        # perf report -h branch-stack
      
         Usage: perf report [<options>]
      
            -b, --branch-stack    use branch records for per branch histogram filling
      
        #
      
      I.e. after this patch:
      
        # perf report -b --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 13  of event 'cycles'
        # Event count (approx.): 13
        #
        # Overhead  Command  Source Shared Object  Source Symbol                Target Symbol                              Basic Block Cycles
        # ........  .......  ....................  ...........................  .........................................  ..................
        #
             7.69%  ls       libpthread-2.29.so    [.] _init                    [.] __pthread_initialize_minimal_internal  6827
             7.69%  ls       ld-2.29.so            [k] _start                   [k] _dl_start                              -
             7.69%  ls       ld-2.29.so            [.] _dl_start_user           [.] _dl_init                               -24790
             7.69%  ls       ld-2.29.so            [k] _dl_start                [k] _dl_sysdep_start                       278
             7.69%  ls       ld-2.29.so            [k] dl_main                  [k] _dl_map_object_deps                    15581
             7.69%  ls       ld-2.29.so            [k] open_verify.constprop.0  [k] lseek64                                4228
             7.69%  ls       ld-2.29.so            [k] _dl_map_object           [k] open_verify.constprop.0                55
             7.69%  ls       ld-2.29.so            [k] openaux                  [k] _dl_map_object                         67
             7.69%  ls       ld-2.29.so            [k] _dl_map_object_deps      [k] 0x00007f441b57c090                     112
             7.69%  ls       ld-2.29.so            [.] call_init.part.0         [.] _init                                  334
             7.69%  ls       ld-2.29.so            [.] _dl_init                 [.] call_init.part.0                       383
             7.69%  ls       ld-2.29.so            [k] _dl_sysdep_start         [k] dl_main                                45
             7.69%  ls       ld-2.29.so            [k] _dl_catch_exception      [k] openaux                                116
      
        #
        # (Tip: Show current config key-value pairs: perf config --list)
        #
        #

      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/ccbd9583-82f4-dec5-7e84-64bf56e351fb@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  9. 16 Aug, 2019: 1 commit
    • perf report: Add --switch-on/--switch-off events · ef4b1a53
      Committed by Arnaldo Carvalho de Melo
      Since 'perf top' shares the histogram browser with 'perf report', the
      same explanation as in the previous cset applies.
      
      An additional example uses a pair of the SDT events provided by systemtap:
      
        # perf probe --exec=/usr/bin/stap '%*:*'
        Added new events:
          sdt_stap:benchmark__thread__start (on %* in /usr/bin/stap)
          sdt_stap:benchmark   (on %* in /usr/bin/stap)
          sdt_stap:benchmark__thread__end (on %* in /usr/bin/stap)
          sdt_stap:pass6__start (on %* in /usr/bin/stap)
          sdt_stap:pass6__end  (on %* in /usr/bin/stap)
          sdt_stap:pass5__start (on %* in /usr/bin/stap)
          sdt_stap:pass5__end  (on %* in /usr/bin/stap)
          sdt_stap:pass0__start (on %* in /usr/bin/stap)
          sdt_stap:pass0__end  (on %* in /usr/bin/stap)
          sdt_stap:pass1a__start (on %* in /usr/bin/stap)
          sdt_stap:pass1b__start (on %* in /usr/bin/stap)
          sdt_stap:pass1__end  (on %* in /usr/bin/stap)
          sdt_stap:pass2__start (on %* in /usr/bin/stap)
          sdt_stap:pass2__end  (on %* in /usr/bin/stap)
          sdt_stap:pass3__start (on %* in /usr/bin/stap)
          sdt_stap:pass3__end  (on %* in /usr/bin/stap)
          sdt_stap:pass4__start (on %* in /usr/bin/stap)
          sdt_stap:pass4__end  (on %* in /usr/bin/stap)
          sdt_stap:benchmark__start (on %* in /usr/bin/stap)
          sdt_stap:benchmark__end (on %* in /usr/bin/stap)
          sdt_stap:cache__get  (on %* in /usr/bin/stap)
          sdt_stap:cache__clean (on %* in /usr/bin/stap)
          sdt_stap:cache__add__module (on %* in /usr/bin/stap)
          sdt_stap:cache__add__source (on %* in /usr/bin/stap)
          sdt_stap:stap_system__complete (on %* in /usr/bin/stap)
          sdt_stap:stap_system__start (on %* in /usr/bin/stap)
          sdt_stap:stap_system__spawn (on %* in /usr/bin/stap)
          sdt_stap:stap_system__fork (on %* in /usr/bin/stap)
          sdt_stap:intern_string (on %* in /usr/bin/stap)
          sdt_stap:client__start (on %* in /usr/bin/stap)
          sdt_stap:client__end (on %* in /usr/bin/stap)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e sdt_stap:client__end -aR sleep 1
      
        #
      
      Of these, we'll use the two below while running systemtap's test suite:
      
        # perf record -e sdt_stap:pass2__*,cycles:P make installcheck > /dev/null
        ^C[ perf record: Woken up 8 times to write data ]
        [ perf record: Captured and wrote 2.691 MB perf.data (39638 samples) ]
        Terminated
        # perf script | grep sdt_stap
                    stap 28979 [000] 19424.302660: sdt_stap:pass2__start: (561b9a537de3) arg1=140730364262544
                    stap 28979 [000] 19424.333083:   sdt_stap:pass2__end: (561b9a53a9e1) arg1=140730364262544
                    stap 29045 [006] 19424.933460: sdt_stap:pass2__start: (563edddcede3) arg1=140722674883152
                    stap 29045 [006] 19424.963794:   sdt_stap:pass2__end: (563edddd19e1) arg1=140722674883152
        # perf script | grep cycles |  wc -l
        39634
        #
      
      Looking at the whole perf.data file:
      
        [root@quaco testsuite]# perf report | grep cycles:P -A25
        # Samples: 39K of event 'cycles:P'
        # Event count (approx.): 34044267368
        #
        # Overhead  Command  Shared Object         Symbol
        # ........  .......  ....................  ................................
        #
             3.50%  cc1      cc1                   [.] ht_lookup_with_hash
             3.04%  cc1      cc1                   [.] _cpp_lex_token
             2.11%  cc1      cc1                   [.] ggc_internal_alloc
             1.83%  cc1      cc1                   [.] cpp_get_token_with_location
             1.68%  cc1      libc-2.29.so          [.] _int_malloc
             1.41%  cc1      cc1                   [.] linemap_position_for_column
             1.25%  cc1      cc1                   [.] ggc_internal_cleared_alloc
             1.20%  cc1      cc1                   [.] c_lex_with_flags
             1.18%  cc1      cc1                   [.] get_combined_adhoc_loc
             1.05%  cc1      libc-2.29.so          [.] malloc
             1.01%  cc1      libc-2.29.so          [.] _int_free
             0.96%  stap     stap                  [.] std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Identity, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, stringtable_hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_insert<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__detail::_AllocNode<std::allocator<std::__detail::_Hash_node<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, true> > > >
             0.78%  stap     stap                  [.] lexer::scan
             0.74%  cc1      cc1                   [.] _cpp_lex_direct
             0.70%  cc1      cc1                   [.] pop_scope
             0.70%  cc1      cc1                   [.] c_parser_declspecs
             0.69%  stap     libc-2.29.so          [.] _int_malloc
             0.68%  cc1      cc1                   [.] htab_find_slot
             0.68%  cc1      [kernel.vmlinux]      [k] prepare_exit_to_usermode
             0.64%  cc1      [kernel.vmlinux]      [k] clear_page_erms
        [root@quaco testsuite]#
      
      And now only what happens in slices demarcated by those start/end SDT
      events:
      
        [root@quaco testsuite]# perf report --switch-on=sdt_stap:pass2__start --switch-off=sdt_stap:pass2__end | grep cycles:P -A100
        # Samples: 240  of event 'cycles:P'
        # Event count (approx.): 206491934
        #
        # Overhead  Command  Shared Object        Symbol
        # ........  .......  ...................  ................................................
        #
            38.99%  stap     stap                 [.] systemtap_session::register_library_aliases
            19.47%  stap     stap                 [.] match_key::operator<
            15.01%  stap     libc-2.29.so         [.] __memcmp_avx2_movbe
             5.19%  stap     libc-2.29.so         [.] _int_malloc
             2.50%  stap     libstdc++.so.6.0.26  [.] std::_Rb_tree_insert_and_rebalance
             2.30%  stap     stap                 [.] match_node::build_no_more
             2.07%  stap     libc-2.29.so         [.] malloc
             1.66%  stap     stap                 [.] std::_Rb_tree<match_key, std::pair<match_key const, match_node*>, std::_Select1st<std::pair<match_key const, match_node*> >, std::less<match_key>, std::allocator<std::pair<match_key const, match_node*> > >::find
             1.66%  stap     stap                 [.] match_node::bind
             1.58%  stap     [kernel.vmlinux]     [k] prepare_exit_to_usermode
             1.17%  stap     [kernel.vmlinux]     [k] native_irq_return_iret
             0.87%  stap     stap                 [.] 0x0000000000032ec4
             0.77%  stap     libstdc++.so.6.0.26  [.] std::_Rb_tree_increment
             0.47%  stap     stap                 [.] std::vector<derived_probe_builder*, std::allocator<derived_probe_builder*> >::_M_realloc_insert<derived_probe_builder* const&>
             0.47%  stap     [kernel.vmlinux]     [k] get_page_from_freelist
             0.47%  stap     [kernel.vmlinux]     [k] swapgs_restore_regs_and_return_to_usermode
             0.47%  stap     [kernel.vmlinux]     [k] do_user_addr_fault
             0.46%  stap     [kernel.vmlinux]     [k] __pagevec_lru_add_fn
             0.46%  stap     stap                 [.] std::_Rb_tree<match_key, std::pair<match_key const, match_node*>, std::_Select1st<std::pair<match_key const, match_node*> >, std::less<match_key>, std::allocator<std::pair<match_key const, match_node*> > >::_M_emplace_unique<std::pair<match_key, match_node*> >
             0.42%  stap     libstdc++.so.6.0.26  [.] 0x00000000000c18fa
             0.40%  stap     [kernel.vmlinux]     [k] interrupt_entry
             0.40%  stap     [kernel.vmlinux]     [k] update_load_avg
             0.40%  stap     [kernel.vmlinux]     [k] __intel_pmu_disable_all
             0.40%  stap     [kernel.vmlinux]     [k] clear_page_erms
             0.39%  stap     [kernel.vmlinux]     [k] __mod_node_page_state
             0.39%  stap     [kernel.vmlinux]     [k] error_entry
             0.39%  stap     [kernel.vmlinux]     [k] sync_regs
             0.38%  stap     [kernel.vmlinux]     [k] __handle_mm_fault
             0.38%  stap     stap                 [.] derive_probes
      
        #
        # (Tip: System-wide collection from all CPUs: perf record -a)
        #
        [root@quaco testsuite]#
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: William Cohen <wcohen@redhat.com>
      Link: https://lkml.kernel.org/n/tip-408hvumcnyn93a0auihnawew@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  10. 30 Jul, 2019: 3 commits
  11. 09 Jul, 2019: 2 commits
  12. 26 Jun, 2019: 2 commits
  13. 11 Jun, 2019: 1 commit
  14. 16 May, 2019: 2 commits
    • perf report: Implement perf.data record decompression · cb62c6f1
      Committed by Alexey Budankov
      zstd_init(, comp_level = 0) initializes only the decompression part of
      the API, which now consists of the zstd_decompress_stream() function.
      
      The perf.data PERF_RECORD_COMPRESSED records are decompressed using the
      zstd_decompress_stream() function into a linked list of mmap'ed memory
      regions, each mmap_comp_len bytes in size (struct decomp).
      
      After one COMPRESSED record is decompressed, its contents are iterated
      and fetched for the usual processing. The mmap'ed memory regions holding
      decompressed events are kept in the linked list until the tool process
      terminates.
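      The list handling described above can be sketched in C as follows. This
      is a minimal illustration under stated assumptions, not the perf code:
      struct decomp_region, decomp_list_add() and decomp_list_release() are
      hypothetical names, region_len stands in for mmap_comp_len, and a
      memcpy() of already-decompressed bytes stands in for the
      zstd_decompress_stream() call.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical node: one mmap'ed region per decompressed record. */
struct decomp_region {
	struct decomp_region *next;
	size_t size;           /* bytes of decompressed payload stored */
	char data[];           /* decompressed events follow the header */
};

struct decomp_list {
	struct decomp_region *head;
	struct decomp_region *tail;
};

/*
 * Map a region of 'region_len' bytes, copy 'len' bytes of payload into it
 * and append it to the list. A real tool would decompress into r->data
 * instead of copying.
 */
static struct decomp_region *decomp_list_add(struct decomp_list *l,
					     size_t region_len,
					     const void *payload, size_t len)
{
	struct decomp_region *r;

	if (region_len < sizeof(*r) || len > region_len - sizeof(*r))
		return NULL;
	r = mmap(NULL, region_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (r == MAP_FAILED)
		return NULL;
	r->next = NULL;
	r->size = len;
	memcpy(r->data, payload, len);
	if (l->tail)
		l->tail->next = r;
	else
		l->head = r;
	l->tail = r;
	return r;
}

/* Regions stay mapped until the tool shuts down, then all are unmapped. */
static void decomp_list_release(struct decomp_list *l, size_t region_len)
{
	struct decomp_region *r = l->head;

	while (r) {
		struct decomp_region *next = r->next;

		munmap(r, region_len);
		r = next;
	}
	l->head = l->tail = NULL;
}
```

      Keeping every region alive until exit trades memory for simplicity:
      events handed out for processing never point into freed memory.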
      
      When dumping raw records (e.g., perf report -D --header), file offsets
      of events from compressed records are printed as zero.
      
      Committer notes:
      
      Now that PERF_RECORD_COMPRESSED processing is supported, we no longer
      see those records in raw form, as we did in the previous patch's
      committer notes: they are decompressed into the usual
      PERF_RECORD_{FORK,MMAP,COMM,etc} records. We only see the stats for the
      PERF_RECORD_COMPRESSED events, and since I used the file generated in
      the committer notes for the previous patch, there they are, 2
      compressed records:
      
        $ perf report --header-only | grep cmdline
        # cmdline : /home/acme/bin/perf record -z2 sleep 1
        $ perf report -D | grep COMPRESS
              COMPRESSED events:          2
              COMPRESSED events:          0
        $ perf report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 15  of event 'cycles:u'
        # Event count (approx.): 962227
        #
        # Overhead  Command  Shared Object     Symbol
        # ........  .......  ................  ...........................
        #
            46.99%  sleep    libc-2.28.so      [.] _dl_addr
            29.24%  sleep    [unknown]         [k] 0xffffffffaea00a67
            16.45%  sleep    libc-2.28.so      [.] __GI__IO_un_link.part.1
             5.92%  sleep    ld-2.28.so        [.] _dl_setup_hash
             1.40%  sleep    libc-2.28.so      [.] __nanosleep
             0.00%  sleep    [unknown]         [k] 0xffffffffaea00163
      
        #
        # (Tip: To see callchains in a more compact form: perf report -g folded)
        #
        $
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/304b0a59-942c-3fe1-da02-aa749f87108b@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Remove hist__account_cycles() from callback · bdd1666b
      Committed by Jin Yao
      The hist__account_cycles() function is executed whenever
      hist_iter__branch_callback() is called.
      
      But that looks unnecessary: hist__account_cycles() already walks all
      the branch entries.
      
      This patch moves hist__account_cycles() out of the callback, which
      makes data processing much faster than before.
      
      The previous code had an issue: ch[offset].num++ (in
      __symbol__account_cycles) was executed repeatedly, because
      hist__account_cycles was called in each hist_iter__branch_callback, so
      ch[offset].num was overcounted (too big).
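      The overcounting can be modelled with a toy example. The sketch below
      is hypothetical (struct cycles_counts, account_sample() and the two
      process_sample_*() functions are invented for illustration): it only
      shows that running a walk over all N branch entries from each of the N
      per-entry callbacks bumps every counter N times, while running it once
      per sample bumps each counter once.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES 16

/* Hypothetical stand-in for the per-offset counters (ch[offset].num). */
struct cycles_counts {
	unsigned long num[MAX_ENTRIES];
};

/* One accounting pass: visit every branch entry of the sample once. */
static void account_sample(struct cycles_counts *ch, size_t nr_entries)
{
	assert(nr_entries <= MAX_ENTRIES);
	for (size_t i = 0; i < nr_entries; i++)
		ch->num[i]++;
}

/* Old flow: the full pass runs from every per-entry callback. */
static void process_sample_old(struct cycles_counts *ch, size_t nr_entries)
{
	for (size_t cb = 0; cb < nr_entries; cb++)
		account_sample(ch, nr_entries);
}

/* New flow: the pass runs once per sample, outside the callback. */
static void process_sample_new(struct cycles_counts *ch, size_t nr_entries)
{
	account_sample(ch, nr_entries);
}
```

      With 4 branch entries, the old flow leaves every counter at 4 and the
      new flow leaves it at 1; the old flow is also quadratic in the number
      of entries, which matches the speedup reported above.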
      
      This patch fixes the issue, and we no longer need the
      "ch->reset >= ch->num / 2" check for too many overlaps (in
      annotation__count_and_fill); keeping it would hide some data.
      
      Now, we can try, for example:
      
        perf record -b ...
        perf annotate or perf report -s symbol
      
      The output should not change between before and after.
      
       v3:
       ---
       Fix the crash in stdio mode.
       As in the previous code, ui__has_annotation() must be checked
       before calling hist__account_cycles().
      
       v2:
       ---
       1. Cover the similar code in perf report
       2. Remove the checking code "ch->reset >= ch->num / 2"
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1552684577-29041-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  15. 20 Mar, 2019: 1 commit
  16. 12 Mar, 2019: 1 commit
    • perf report: Implement browsing of individual samples · 4968ac8f
      Committed by Andi Kleen
      Now 'perf report' can show whole time periods with 'perf script', but
      the user still has to find individual samples of interest manually.
      
      It would be expensive and complicated to search for the right samples in
      the whole perf file. Typically users only need to look at a small number
      of samples for useful analysis.
      
      Also, full 'perf script' output tends to show samples from all CPUs and
      all threads mixed together, which can be very confusing on larger systems.
      
      Add a new --samples option to save a small, randomly selected set of
      samples per hist entry.
      
      Use a reservoir sampling technique to select a representative number of
      samples.
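      Reservoir sampling keeps a uniform random subset of at most k items from
      a stream whose length is not known in advance. The sketch below is the
      textbook "Algorithm R" variant, not the code added to perf; struct
      reservoir and reservoir_offer() are hypothetical names, and rand() is
      used only for brevity (it has modulo bias and may have a small range).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reservoir holding up to 'cap' sample ids. */
struct reservoir {
	unsigned long *slots;  /* caller-provided storage of 'cap' entries */
	size_t cap;            /* maximum number of samples kept */
	size_t seen;           /* number of items offered so far */
};

/* Offer one item; afterwards each seen item is kept with probability cap/seen. */
static void reservoir_offer(struct reservoir *r, unsigned long item)
{
	assert(r->cap > 0);
	if (r->seen < r->cap) {
		/* Fill phase: the first 'cap' items are always kept. */
		r->slots[r->seen] = item;
	} else {
		/* Pick j in [0, seen]; replace slot j only if it exists. */
		size_t j = (size_t)rand() % (r->seen + 1);

		if (j < r->cap)
			r->slots[j] = item;
	}
	r->seen++;
}
```

      Memory stays bounded by cap per hist entry regardless of how many
      samples the entry accumulates, which fits the goal of saving only a
      small number of samples.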
      
      Then allow browsing the samples using 'perf script' as part of the hist
      entry context menu. This automatically adds the right filters, so only
      the thread or CPU of the sample is displayed. Then we use less's search
      functionality to jump directly to the timestamp of the selected
      sample.
      
      It uses different menus for assembler and source display. The assembler
      view needs xed installed and the source view needs debuginfo.
      
      Currently it only supports as many samples as fit on the screen due to
      some limitations in the slang ui code.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20190311174605.GA29294@tassilo.jf.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  17. 11 Mar, 2019: 2 commits
  18. 01 Mar, 2019: 1 commit
    • perf time-utils: Refactor time range parsing code · 284c4e18
      Committed by Jin Yao
      Jiri points out that we don't need any time checking and time string
      parsing if the --time option is not set. That makes sense.
      
      This patch refactors the time range parsing code: it moves the
      duplicated code from perf report and perf script into time_utils and
      checks whether the --time option is set before parsing the time string.
      No logic change is expected, so --time is used the same way as before.
      
      For example:
      
      Select the first and second 10% time slices:
        perf report --time 10%/1,10%/2
        perf script --time 10%/1,10%/2
      
      Select the slices from 0% to 10% and from 30% to 40%:
        perf report --time 0%-10%,30%-40%
        perf script --time 0%-10%,30%-40%
      
      Select the time slices from timestamp 3971 to 3973:
        perf report --time 3971,3973
        perf script --time 3971,3973
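      The percent forms can be made concrete with a small helper. This sketch
      assumes the semantics suggested by the examples: "P%/N" selects the
      N-th P-percent slice of the recorded time span, and "A%-B%" selects the
      window between A and B percent of it. percent_slice(),
      percent_range() and struct time_window are hypothetical, not the
      time_utils API, and timestamps are plain integers (nanoseconds assumed).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical [start, end] window in nanoseconds. */
struct time_window {
	uint64_t start;
	uint64_t end;
};

/* "P%/N": split [first, last] into P-percent slices and pick slice N (1-based). */
static struct time_window percent_slice(uint64_t first, uint64_t last,
					double pct, unsigned int n)
{
	double span = (double)(last - first);
	struct time_window w;

	assert(n >= 1 && pct > 0.0 && pct * n <= 100.0);
	w.start = first + (uint64_t)(span * pct / 100.0 * (n - 1));
	w.end = first + (uint64_t)(span * pct / 100.0 * n);
	return w;
}

/* "A%-B%": pick the window from A percent to B percent of the span. */
static struct time_window percent_range(uint64_t first, uint64_t last,
					double from_pct, double to_pct)
{
	double span = (double)(last - first);
	struct time_window w;

	assert(from_pct >= 0.0 && from_pct <= to_pct && to_pct <= 100.0);
	w.start = first + (uint64_t)(span * from_pct / 100.0);
	w.end = first + (uint64_t)(span * to_pct / 100.0);
	return w;
}
```

      For a span from 0 to 1000, "10%/2" maps to [100, 200]; a
      comma-separated list such as "10%/1,10%/2" would simply produce one
      window per element.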
      
      Committer testing:
      
      Using the above examples, check before and after to see if it remains
      the same:
      
        $ perf record -F 10000 -- find . -name "*.[ch]" -exec cat {} + > /dev/null
        [ perf record: Woken up 3 times to write data ]
        [ perf record: Captured and wrote 1.626 MB perf.data (42392 samples) ]
        $
        $ perf report --time 10%/1,10%/2 > /tmp/report.before.1
        $ perf script --time 10%/1,10%/2 > /tmp/script.before.1
        $ perf report --time 0%-10%,30%-40% > /tmp/report.before.2
        $ perf script --time 0%-10%,30%-40% > /tmp/script.before.2
        $ perf report --time 180457.375844,180457.377717 > /tmp/report.before.3
        $ perf script --time 180457.375844,180457.377717 > /tmp/script.before.3
      
      For example, the 3rd test produces this slice:
      
        $ cat /tmp/script.before.3
              cat  3147 180457.375844:   2143 cycles:uppp:      7f79362590d9 cfree@GLIBC_2.2.5+0x9 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.375986:   2245 cycles:uppp:      558b70f3d86e [unknown] (/usr/bin/cat)
              cat  3147 180457.376012:   2164 cycles:uppp:      7f7936257430 _int_malloc+0x8c0 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.376140:   2921 cycles:uppp:      558b70f3a554 [unknown] (/usr/bin/cat)
              cat  3147 180457.376296:   2844 cycles:uppp:      7f7936258abe malloc+0x4e (/usr/lib64/libc-2.28.so)
              cat  3147 180457.376431:   2717 cycles:uppp:      558b70f3b0ca [unknown] (/usr/bin/cat)
              cat  3147 180457.376667:   2630 cycles:uppp:      558b70f3d86e [unknown] (/usr/bin/cat)
              cat  3147 180457.376795:   2442 cycles:uppp:      7f79362bff55 read+0x15 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.376927:   2376 cycles:uppp:  ffffffff9aa00163 [unknown] ([unknown])
              cat  3147 180457.376954:   2307 cycles:uppp:      7f7936257438 _int_malloc+0x8c8 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.377116:   3091 cycles:uppp:      7f7936258a70 malloc+0x0 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.377362:   2945 cycles:uppp:      558b70f3a3b0 [unknown] (/usr/bin/cat)
              cat  3147 180457.377517:   2727 cycles:uppp:      558b70f3a9aa [unknown] (/usr/bin/cat)
        $
      
      (Install 'coreutils-debuginfo' to see cat's guts, i.e. its symbols.) The
      above chunk translates into this 'perf report' output:
      
        $ cat /tmp/report.before.3
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 13  of event 'cycles:uppp' (time slices: 180457.375844,180457.377717)
        # Event count (approx.): 33552
        #
        # Overhead  Command  Shared Object     Symbol
        # ........  .......  ................  ......................
        #
            17.69%  cat      libc-2.28.so      [.] malloc
            14.53%  cat      cat               [.] 0x000000000000586e
            13.33%  cat      libc-2.28.so      [.] _int_malloc
             8.78%  cat      cat               [.] 0x00000000000023b0
             8.71%  cat      cat               [.] 0x0000000000002554
             8.13%  cat      cat               [.] 0x00000000000029aa
             8.10%  cat      cat               [.] 0x00000000000030ca
             7.28%  cat      libc-2.28.so      [.] read
             7.08%  cat      [unknown]         [k] 0xffffffff9aa00163
             6.39%  cat      libc-2.28.so      [.] cfree@GLIBC_2.2.5
      
        #
        # (Tip: Order by the overhead of source file name and line number: perf report -s srcline)
        #
        $
      
      Now let's check after applying this patch; nothing should change:
      
        $ perf report --time 10%/1,10%/2 > /tmp/report.after.1
        $ perf script --time 10%/1,10%/2 > /tmp/script.after.1
        $ perf report --time 0%-10%,30%-40% > /tmp/report.after.2
        $ perf script --time 0%-10%,30%-40% > /tmp/script.after.2
        $ perf report --time 180457.375844,180457.377717 > /tmp/report.after.3
        $ perf script --time 180457.375844,180457.377717 > /tmp/script.after.3
        $ diff -u /tmp/report.before.1 /tmp/report.after.1
        $ diff -u /tmp/script.before.1 /tmp/script.after.1
        $ diff -u /tmp/report.before.2 /tmp/report.after.2
        --- /tmp/report.before.2	2019-03-01 11:01:53.526094883 -0300
        +++ /tmp/report.after.2	2019-03-01 11:09:18.231770467 -0300
        @@ -352,5 +352,5 @@
      
         #
        -# (Tip: Generate a script for your data: perf script -g <lang>)
        +# (Tip: Treat branches as callchains: perf report --branch-history)
         #
        $ diff -u /tmp/script.before.2 /tmp/script.after.2
        $ diff -u /tmp/report.before.3 /tmp/report.after.3
        --- /tmp/report.before.3	2019-03-01 11:03:08.890045588 -0300
        +++ /tmp/report.after.3	2019-03-01 11:09:40.660224002 -0300
        @@ -22,5 +22,5 @@
      
         #
        -# (Tip: Order by the overhead of source file name and line number: perf report -s srcline)
        +# (Tip: List events using substring match: perf list <keyword>)
         #
        $ diff -u /tmp/script.before.3 /tmp/script.after.3
        $
      
      Cool, only the 'perf report' tips changed, QED.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1551435186-6008-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  19. 23 Feb, 2019: 1 commit
    • perf data: Add global path holder · 2d4f2799
      Committed by Jiri Olsa
      Add a 'path' member to 'struct perf_data'. It will keep the configured
      path for the data (const char *). The path in struct perf_data_file is
      now dynamically allocated (duped) from it.
      
      This scheme is used in the following patches, where struct
      perf_data::path holds the configured directory path and struct
      perf_data_file::path holds the allocated path for specific files.
      
      It also makes the code a little simpler.
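      The ownership split between the two path members can be sketched as
      below. The structs are simplified, hypothetical stand-ins (named
      data_model and data_file_model so they are not mistaken for the real
      perf definitions, which have more fields), and the helper names are
      invented: the global holder keeps the configured const char * as
      given, while the per-file holder owns a duplicated copy that must be
      freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct perf_data_file: owns its own copy of the path. */
struct data_file_model {
	char *path;
};

/* Stand-in for struct perf_data: keeps the configured path as given. */
struct data_model {
	const char *path;
	struct data_file_model file;
};

/* Duplicate the configured path into the per-file holder. */
static int data_model_dup_file_path(struct data_model *data)
{
	size_t len;

	assert(data->path != NULL);
	len = strlen(data->path) + 1;
	data->file.path = malloc(len);
	if (data->file.path == NULL)
		return -1;
	memcpy(data->file.path, data->path, len);
	return 0;
}

/* Only the duplicated copy is freed; the configured path is not owned. */
static void data_model_free_file_path(struct data_model *data)
{
	free(data->file.path);
	data->file.path = NULL;
}
```

      Separating the two lets a directory-style layout later give each file
      its own allocated path while the configured path stays unchanged.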
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20190221094145.9151-3-jolsa@kernel.org
      [ Fixup data-convert-bt.c missing conversion ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  20. 06 Feb, 2019: 2 commits
  21. 25 Jan, 2019: 1 commit
  22. 22 Jan, 2019: 1 commit
    • perf tools: Replace automatic const char[] variables by statics · 49b8e2be
      Committed by Rasmus Villemoes
      An automatic const char[] variable gets initialized at runtime, just
      like any other automatic variable. For long strings, that uses a lot of
      stack and wastes time building the string; e.g. for the "No %s
      allocation events..." case one has:
      
        444516:       48 b8 4e 6f 20 25 73 20 61 6c   movabs $0x6c61207325206f4e,%rax # "No %s al"
        ...
        444674:       48 89 45 80                     mov    %rax,-0x80(%rbp)
        444678:       48 b8 6c 6f 63 61 74 69 6f 6e   movabs $0x6e6f697461636f6c,%rax # "location"
        444682:       48 89 45 88                     mov    %rax,-0x78(%rbp)
        444686:       48 b8 20 65 76 65 6e 74 73 20   movabs $0x2073746e65766520,%rax # " events "
        444690:       66 44 89 55 c4                  mov    %r10w,-0x3c(%rbp)
        444695:       48 89 45 90                     mov    %rax,-0x70(%rbp)
        444699:       48 b8 66 6f 75 6e 64 2e 20 20   movabs $0x20202e646e756f66,%rax
      
      Make them all static so that the compiler just references objects in .rodata.
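      A minimal before/after illustration of the change, using a shortened,
      hypothetical message (the real perf string continues past "found." and
      is not reproduced here); the function names are invented. Both
      functions print the same text; the difference is only where the format
      array lives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Before: automatic array, typically rebuilt on the stack on every call. */
static int no_events_msg_before(char *buf, size_t len, const char *kind)
{
	const char fmt[] = "No %s allocation events found.";

	assert(kind != NULL);
	return snprintf(buf, len, fmt, kind);
}

/* After: static array, emitted once into .rodata and only referenced. */
static int no_events_msg_after(char *buf, size_t len, const char *kind)
{
	static const char fmt[] = "No %s allocation events found.";

	assert(kind != NULL);
	return snprintf(buf, len, fmt, kind);
}
```

      Compiling both with -O2 and comparing the disassembly should show the
      movabs/mov sequence only in the first function, which is the kind of
      saving the codiff numbers below measure.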
      
      Committer testing:
      
      Ok, using dwarves's codiff tool:
      
          $ codiff --functions /tmp/perf.before ~/bin/perf
        builtin-sched.c:
          cmd_sched                 |  -48
         1 function changed, 48 bytes removed, diff: -48
      
        builtin-report.c:
          cmd_report                |  -32
         1 function changed, 32 bytes removed, diff: -32
      
        builtin-kmem.c:
          cmd_kmem                  |  -64
          build_alloc_func_list     |  -50
         2 functions changed, 114 bytes removed, diff: -114
      
        builtin-c2c.c:
          perf_c2c__report          | -390
         1 function changed, 390 bytes removed, diff: -390
      
        ui/browsers/header.c:
          tui__header_window        | -104
         1 function changed, 104 bytes removed, diff: -104
      
        /home/acme/bin/perf:
         9 functions changed, 688 bytes removed, diff: -688
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20181102230624.20064-1-linux@rasmusvillemoes.dk
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  23. 18 Dec, 2018: 1 commit
    • perf report: Display average IPC and IPC coverage per symbol · ec6ae74f
      Committed by Jin Yao
      Support displaying the average IPC and IPC coverage per symbol in 'perf
      report' --tui and --stdio modes.
      
      For example,
      
       $ perf record -b ...
       $ perf report -s symbol
      
       Overhead  Symbol                           IPC   [IPC Coverage]
         39.60%  [.] __random                     2.30  [ 54.8%]
         18.02%  [.] main                         0.43  [ 54.3%]
         14.21%  [.] compute_flag                 2.29  [100.0%]
         14.16%  [.] rand                         0.36  [100.0%]
          7.06%  [.] __random_r                   2.57  [ 70.5%]
          6.85%  [.] rand@plt                     0.00  [  0.0%]
      
      Jiri Olsa <jolsa@redhat.com> provided the patch to support the --stdio
      mode. I merged Jiri's code into this patch.
      
        $ perf report -s symbol --stdio
      
          # Overhead  Symbol                       IPC   [IPC Coverage]
          # ........  ...........................  ....................
          #
            39.60%  [.] __random                   2.30  [ 54.8%]
            18.02%  [.] main                       0.43  [ 54.3%]
            14.21%  [.] compute_flag               2.29  [100.0%]
            14.16%  [.] rand                       0.36  [100.0%]
             7.06%  [.] __random_r                 2.57  [ 70.5%]
             6.85%  [.] rand@plt                   0.00  [  0.0%]
             0.02%  [k] run_timer_softirq          1.60  [ 57.2%]
      
      The columns "IPC" and "[IPC Coverage]" are automatically enabled when
      the sort-key "symbol" is specified. If the perf.data file doesn't
      contain timed LBR information, the columns are filled with "-".
      
      For example,
      
        # Overhead  Symbol                       IPC   [IPC Coverage]
        # ........  ...........................  ....................
        #
            46.57%  [.] main                     -      -
            17.60%  [.] rand                     -      -
            15.84%  [.] __random_r               -      -
            11.90%  [.] __random                 -      -
             6.50%  [.] compute_flag             -      -
             1.59%  [.] rand@plt                 -      -
             0.00%  [.] _dl_relocate_object      -      -
             0.00%  [k] tlb_flush_mmu            -      -
             0.00%  [k] perf_event_mmap          -      -
             0.00%  [k] native_sched_clock       -      -
             0.00%  [k] intel_pmu_handle_irq_v4  -      -
             0.00%  [k] native_write_msr         -      -
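      The two columns can be read as simple ratios. The sketch below assumes
      definitions the message does not spell out: average IPC as instructions
      divided by cycles over the blocks that have cycle information, and IPC
      coverage as the percentage of the symbol's instructions covered by such
      blocks. struct ipc_counters and its field names are hypothetical, and a
      negative return value stands for the "-" shown when there is no timed
      LBR data.

```c
#include <assert.h>

/* Hypothetical per-symbol counters; names are illustrative only. */
struct ipc_counters {
	unsigned long hit_insn;    /* instructions in blocks with cycle info */
	unsigned long hit_cycles;  /* cycles attributed to those blocks */
	unsigned long cover_insn;  /* instructions covered by such blocks */
	unsigned long total_insn;  /* all instructions in the symbol */
};

/* Average IPC, or -1.0 when there is no cycle data (printed as "-"). */
static double avg_ipc(const struct ipc_counters *c)
{
	if (c->hit_cycles == 0)
		return -1.0;
	return (double)c->hit_insn / (double)c->hit_cycles;
}

/* IPC coverage in percent, or -1.0 when the symbol has no instructions. */
static double ipc_coverage(const struct ipc_counters *c)
{
	if (c->total_insn == 0)
		return -1.0;
	return 100.0 * (double)c->cover_insn / (double)c->total_insn;
}
```

      Under these assumptions, a symbol with 230 covered instructions over
      100 cycles, covering half of its instructions, would show about 2.30
      and [ 50.0%].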
      
       v3:
       ---
       Removed the sort key 'ipc' from the command line. The columns "IPC"
       and "[IPC Coverage]" are automatically enabled when "symbol"
       is specified.
      
       v2:
       ---
       Merge in Jiri's patch to support stdio mode
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1543586097-27632-4-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  24. 16 Oct, 2018: 1 commit
    • perf evsel: Store ids for events with their own cpus perf_event__synthesize_event_update_cpus · 4ab8455f
      Committed by Jiri Olsa
      John reported a crash when recording an event under a PMU with a cpumask defined:
      
        root@localhost:~# ./perf_debug_ record -e armv8_pmuv3_0/br_mis_pred/ sleep 1
        perf: Segmentation fault
        Obtained 9 stack frames.
        ./perf_debug_() [0x4c5ef8]
        [0xffff82ba267c]
        ./perf_debug_() [0x4bc5a8]
        ./perf_debug_() [0x419550]
        ./perf_debug_() [0x41a928]
        ./perf_debug_() [0x472f58]
        ./perf_debug_() [0x473210]
        ./perf_debug_() [0x4070f4]
        /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe0) [0xffff8294c8a0]
        Segmentation fault (core dumped)
      
      We synthesize an update event that needs to touch the evsel id array,
      which has not been allocated at that point. Fix this by forcing the id
      allocation for events with their own cpus.
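      The shape of the fix, allocating the id array up front for events that
      carry their own cpus so that a later step never writes through a NULL
      array, can be sketched as follows. struct event_model,
      event_model_prepare() and event_model_store_id() are hypothetical
      simplifications, not perf's struct perf_evsel or its helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified event; not perf's struct perf_evsel. */
struct event_model {
	bool own_cpus;        /* the event's PMU defines its own cpumask */
	size_t nr_ids;        /* number of ids the event needs */
	unsigned long *ids;   /* id array, NULL until allocated */
};

/* Force the id allocation for events with their own cpus. */
static int event_model_prepare(struct event_model *ev)
{
	if (ev->own_cpus && ev->ids == NULL) {
		ev->ids = calloc(ev->nr_ids ? ev->nr_ids : 1, sizeof(*ev->ids));
		if (ev->ids == NULL)
			return -1;
	}
	return 0;
}

/* A later step that touches the id array; it must already exist. */
static void event_model_store_id(struct event_model *ev, size_t idx,
				 unsigned long id)
{
	assert(ev->ids != NULL && idx < ev->nr_ids);
	ev->ids[idx] = id;
}
```

      Without the prepare step, the store would dereference a NULL pointer,
      which corresponds to the segmentation fault in the report above.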
      Reported-by: John Garry <john.garry@huawei.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: John Garry <john.garry@huawei.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linuxarm@huawei.com
      Fixes: bfd8f72c ("perf record: Synthesize unit/scale/... in event update")
      Link: http://lkml.kernel.org/r/20181003212052.GA32371@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  25. 20 Sep, 2018: 1 commit
  26. 19 Sep, 2018: 1 commit
  27. 14 Aug, 2018: 1 commit