1. 27 Feb, 2020 1 commit
    • R
      perf annotate: Make perf config effective · 7384083b
      Ravi Bangoria committed
      The perf default config set by the user in the [annotate] section is
      totally ignored by the annotate code. Fix it.
      
      Before:
      
        $ ./perf config
        annotate.hide_src_code=true
        annotate.show_nr_jumps=true
        annotate.show_nr_samples=true
      
        $ ./perf annotate shash
               │    unsigned h = 0;
               │      movl   $0x0,-0xc(%rbp)
               │    while (*s)
               │    ↓ jmp    44
               │    h = 65599 * h + *s++;
         11.33 │24:   mov    -0xc(%rbp),%eax
         43.50 │      imul   $0x1003f,%eax,%ecx
               │      mov    -0x18(%rbp),%rax
      
      After:
      
               │        movl   $0x0,-0xc(%rbp)
               │      ↓ jmp    44
             1 │1 24:   mov    -0xc(%rbp),%eax
             4 │        imul   $0x1003f,%eax,%ecx
               │        mov    -0x18(%rbp),%rax
      
      Note that we have removed show_nr_samples and show_total_period from
      annotation_options because they are not used. Instead of them we use
      symbol_conf.show_nr_samples and symbol_conf.show_total_period.
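
A minimal sketch of the idea behind this fix, written in Python for illustration only: key/value lines in the form printed by 'perf config' are applied to an options object before rendering. The `AnnotateOptions` class, its defaults, and `apply_annotate_config` are hypothetical names for this sketch, not perf's actual C code.

```python
from dataclasses import dataclass


@dataclass
class AnnotateOptions:
    # Hypothetical stand-in for perf's annotation options. Field names mirror
    # the config keys shown above; the defaults are assumptions.
    hide_src_code: bool = False
    show_nr_jumps: bool = False
    show_nr_samples: bool = False


def apply_annotate_config(lines, opts):
    """Apply 'annotate.<key>=<bool>' lines to opts, ignoring everything else."""
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        section, _, key = name.partition(".")
        if section != "annotate" or not hasattr(opts, key):
            continue  # other sections and unknown keys are skipped
        setattr(opts, key, value.strip().lower() == "true")
    return opts


config_lines = [
    "annotate.hide_src_code=true",
    "annotate.show_nr_jumps=true",
    "annotate.show_nr_samples=true",
    "report.percent-limit=1",  # not an [annotate] key, so it is skipped here
]
opts = apply_annotate_config(config_lines, AnnotateOptions())
print(opts)
```

With all three keys set to true, a renderer consulting `opts` would hide source lines and show jump and sample counts, which is the difference between the "Before" and "After" listings.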
      
      Committer testing:
      
      Using 'perf annotate --stdio2' to use the TUI rendering but emitting the output to stdio:
      
        # perf config
        #
        # perf config annotate.hide_src_code=true
        # perf config
        annotate.hide_src_code=true
        #
        # perf config annotate.show_nr_jumps=true
        # perf config annotate.show_nr_samples=true
        # perf config
        annotate.hide_src_code=true
        annotate.show_nr_jumps=true
        annotate.show_nr_samples=true
        #
        #
      
      Before:
      
        # perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized
        Samples: 1  of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
        ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
        Percent
                    00000000000609f0 <ObjectInstance::weak_pointer_was_finalized()@@base>:
                      endbr64
                      cmpq    $0x0,0x20(%rdi)
                    ↓ je      10
                      xor     %eax,%eax
                    ← retq
                      xchg    %ax,%ax
        100.00  10:   push    %rbp
                      cmpq    $0x0,0x18(%rdi)
                      mov     %rdi,%rbp
                    ↓ jne     20
                1b:   xor     %eax,%eax
                      pop     %rbp
                    ← retq
                      nop
                20:   lea     0x18(%rdi),%rdi
                    → callq   JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject*
                      cmpq    $0x0,0x18(%rbp)
                    ↑ jne     1b
                      mov     %rbp,%rdi
                    → callq   ObjectBase::jsobj_addr() const@plt
                      mov     $0x1,%eax
                      pop     %rbp
                    ← retq
        #
      
      After:
      
        # perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized 2> /dev/null
        Samples: 1  of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
        ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
        Samples       endbr64
                      cmpq    $0x0,0x20(%rdi)
                    ↓ je      10
                      xor     %eax,%eax
                    ← retq
                      xchg    %ax,%ax
           1  1 10:   push    %rbp
                      cmpq    $0x0,0x18(%rdi)
                      mov     %rdi,%rbp
                    ↓ jne     20
              1 1b:   xor     %eax,%eax
                      pop     %rbp
                    ← retq
                      nop
              1 20:   lea     0x18(%rdi),%rdi
                    → callq   JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject*
                      cmpq    $0x0,0x18(%rbp)
                    ↑ jne     1b
                      mov     %rbp,%rdi
                    → callq   ObjectBase::jsobj_addr() const@plt
                      mov     $0x1,%eax
                      pop     %rbp
                    ← retq
        #
        # perf config annotate.show_nr_jumps
        annotate.show_nr_jumps=true
        # perf config annotate.show_nr_jumps=false
        # perf config annotate.show_nr_jumps
        annotate.show_nr_jumps=false
        #
        # perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized 2> /dev/null
        Samples: 1  of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
        ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
        Samples       endbr64
                      cmpq    $0x0,0x20(%rdi)
                    ↓ je      10
                      xor     %eax,%eax
                    ← retq
                      xchg    %ax,%ax
             1  10:   push    %rbp
                      cmpq    $0x0,0x18(%rdi)
                      mov     %rdi,%rbp
                    ↓ jne     20
                1b:   xor     %eax,%eax
                      pop     %rbp
                    ← retq
                      nop
                20:   lea     0x18(%rdi),%rdi
                    → callq   JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject*
                      cmpq    $0x0,0x18(%rbp)
                    ↑ jne     1b
                      mov     %rbp,%rdi
                    → callq   ObjectBase::jsobj_addr() const@plt
                      mov     $0x1,%eax
                      pop     %rbp
                    ← retq
        #
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Changbin Du <changbin.du@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Yisheng Xie <xieyisheng1@huawei.com>
      Link: http://lore.kernel.org/lkml/20200213064306.160480-6-ravi.bangoria@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 14 Jan, 2020 3 commits
  3. 21 Dec, 2019 1 commit
  4. 04 Dec, 2019 1 commit
  5. 26 Nov, 2019 2 commits
  6. 20 Nov, 2019 3 commits
    • J
      perf report: Jump to symbol source view from total cycles view · 848a5e50
      Jin Yao committed
      This patch supports jumping from the tui total cycles view to the
      symbol source view.
      
      For example,
      
        perf record -b ./div
        perf report --total-cycles
      
      In the total cycles view, we can select one entry and press 'a' or the
      ENTER key to jump to the symbol source view.
      
      This patch also sets sort_order to NULL in cmd_report(), so the default
      branch sort order is used. The percent value in the new annotate view
      is then consistent with the percent in the annotate view reached from
      perf report (with the previous patches we observed a gap between the two).
      
       v2:
       ---
       Fix the 'make NO_SLANG=1' error (mark annotation_opts as
       __maybe_unused in block_hists_tui_browse()).
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191118140849.20714-2-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf dso: Move dso_id from 'struct map' to 'struct dso' · 0e3149f8
      Arnaldo Carvalho de Melo committed
      And take it into account when looking up DSOs when we have the dso_id
      fields obtained from somewhere, like from PERF_RECORD_MMAP2 records.
      
      Instances of struct map pointing to the same DSO pathname but with
      anything in dso_id different are in fact different DSOs, so better have
      different 'struct dso' instances to reflect that. At some point we may
      want to get copies of the contents of the different objects if we want
      to do correct annotation or other analysis.
      
      With this we get 'struct map' 24 bytes leaner:
      
        $ pahole -C map ~/bin/perf
        struct map {
        	union {
        		struct rb_node     rb_node __attribute__((__aligned__(8))); /*     0    24 */
        		struct list_head   node;                 /*     0    16 */
        	} __attribute__((__aligned__(8)));               /*     0    24 */
        	u64                        start;                /*    24     8 */
        	u64                        end;                  /*    32     8 */
        	_Bool                      erange_warned:1;      /*    40: 0  1 */
        	_Bool                      priv:1;               /*    40: 1  1 */
      
        	/* XXX 6 bits hole, try to pack */
        	/* XXX 3 bytes hole, try to pack */
      
        	u32                        prot;                 /*    44     4 */
        	u64                        pgoff;                /*    48     8 */
        	u64                        reloc;                /*    56     8 */
        	/* --- cacheline 1 boundary (64 bytes) --- */
        	u64                        (*map_ip)(struct map *, u64); /*    64     8 */
        	u64                        (*unmap_ip)(struct map *, u64); /*    72     8 */
        	struct dso *               dso;                  /*    80     8 */
        	refcount_t                 refcnt;               /*    88     4 */
        	u32                        flags;                /*    92     4 */
      
        	/* size: 96, cachelines: 2, members: 13 */
        	/* sum members: 92, holes: 1, sum holes: 3 */
        	/* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
        	/* forced alignments: 1 */
        	/* last cacheline: 32 bytes */
        } __attribute__((__aligned__(8)));
        $
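
The size and hole figures in the pahole output above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the offsets and sizes are copied from the listing, the two 1-bit flags are modelled as the single byte they share at offset 40, and `layout_summary` is a hypothetical helper, not part of pahole.

```python
# (name, byte offset, byte size), copied from the pahole listing above.
members = [
    ("rb_node/node union", 0, 24),
    ("start", 24, 8),
    ("end", 32, 8),
    ("erange_warned + priv bitfield byte", 40, 1),
    ("prot", 44, 4),
    ("pgoff", 48, 8),
    ("reloc", 56, 8),
    ("map_ip", 64, 8),
    ("unmap_ip", 72, 8),
    ("dso", 80, 8),
    ("refcnt", 88, 4),
    ("flags", 92, 4),
]


def layout_summary(members):
    """Return (end offset of the last member, list of (hole offset, hole size))."""
    holes = []
    cursor = 0
    for _name, offset, size in members:
        if offset > cursor:
            holes.append((cursor, offset - cursor))
        cursor = offset + size
    return cursor, holes


total, holes = layout_summary(members)
# pahole's "sum members" excludes the bitfield byte, so subtract it here.
sum_members = sum(size for name, _off, size in members if "bitfield" not in name)

print(total)        # 96
print(holes)        # [(41, 3)]
print(sum_members)  # 92
```

The 92 bytes of regular members, the one byte holding the two bitfield flags, and the 3-byte hole before `prot` add up to the reported size of 96.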
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-g4hxxmraplo7wfjmk384mfsb@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf map: Move maj/min/ino/ino_generation to separate struct · 99459a84
      Arnaldo Carvalho de Melo committed
      And this patch highlights where these fields are being used: in the sort
      order where it uses it to compare maps and classify samples taking into
      account not just the DSO, but those DSO id fields.
      
      I think these should be used to differentiate DSOs with the same name
      but different 'struct dso_id' fields, i.e. these fields should move to
      'struct dso' and then be used as part of the key when doing lookups for
      DSOs, in addition to the DSO name.
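
The lookup idea described above can be sketched as a table keyed by both the DSO name and its id fields. This Python sketch is illustrative only: `DsoId`, `Dso`, and `DsoTable` are hypothetical stand-ins, and the real perf code is C and may treat special cases (such as an empty id) differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DsoId:
    # The four fields named in the commit title.
    maj: int = 0
    min: int = 0
    ino: int = 0
    ino_generation: int = 0


@dataclass
class Dso:
    name: str
    id: DsoId


class DsoTable:
    """Find-or-create table keyed by (name, DsoId) rather than name alone."""

    def __init__(self):
        self._by_key = {}

    def find_or_create(self, name, dso_id):
        key = (name, dso_id)
        dso = self._by_key.get(key)
        if dso is None:
            dso = Dso(name, dso_id)
            self._by_key[key] = dso
        return dso

    def __len__(self):
        return len(self._by_key)


table = DsoTable()
old = table.find_or_create("/usr/lib64/libfoo.so", DsoId(8, 1, 1000, 0))
new = table.find_or_create("/usr/lib64/libfoo.so", DsoId(8, 1, 2000, 0))
again = table.find_or_create("/usr/lib64/libfoo.so", DsoId(8, 1, 1000, 0))

print(old is again)  # True: same name and same id fields
print(old is new)    # False: same name, different inode
```

The path and the id values in the usage lines are made up for the example.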
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-8v5isitqy0dup47nnwkpc80f@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 12 Nov, 2019 1 commit
  8. 07 Nov, 2019 5 commits
    • J
      perf report: Sort by sampled cycles percent per block for tui · 7fa46cbf
      Jin Yao committed
      The previous patch implemented a new option, "--total-cycles", but it
      only supports stdio mode.
      
      This patch adds support for the tui mode and for '--percent-limit'.
      
      For example,
      
       perf record -b ./div
       perf report --total-cycles --percent-limit 1
      
       # Samples: 2753248 of event 'cycles'
       Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                              [Program Block Range]         Shared Object
                26.04%            2.8M        0.40%          18                                             [div.c:42 -> div.c:39]                   div
                15.17%            1.2M        0.16%           7                                 [random_r.c:357 -> random_r.c:380]          libc-2.27.so
                 5.11%          402.0K        0.04%           2                                             [div.c:27 -> div.c:28]                   div
                 4.87%          381.6K        0.04%           2                                     [random.c:288 -> random.c:291]          libc-2.27.so
                 4.53%          381.0K        0.04%           2                                             [div.c:40 -> div.c:40]                   div
                 3.85%          300.9K        0.02%           1                                             [div.c:22 -> div.c:25]                   div
                 3.08%          241.1K        0.02%           1                                           [rand.c:26 -> rand.c:27]          libc-2.27.so
                 3.06%          240.0K        0.02%           1                                     [random.c:291 -> random.c:291]          libc-2.27.so
                 2.78%          215.7K        0.02%           1                                     [random.c:298 -> random.c:298]          libc-2.27.so
                 2.52%          198.3K        0.02%           1                                     [random.c:293 -> random.c:293]          libc-2.27.so
                 2.36%          184.8K        0.02%           1                                           [rand.c:28 -> rand.c:28]          libc-2.27.so
                 2.33%          180.5K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
                 2.28%          176.7K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
                 2.20%          168.8K        0.02%           1                                         [rand@plt+0 -> rand@plt+0]                   div
                 1.98%          158.2K        0.02%           1                                 [random_r.c:388 -> random_r.c:388]          libc-2.27.so
                 1.57%          123.3K        0.02%           1                                             [div.c:42 -> div.c:44]                   div
                 1.44%          116.0K        0.42%          19                                 [random_r.c:357 -> random_r.c:394]          libc-2.27.so
      
      --------------------------------------------------
      
       v7:
       ---
       1. Since we have used use_browser in report__browse_block_hists
          to support stdio mode, now we also add support for tui.
      
       2. Move block tui browser code from ui/browsers/hists.c
          to block-info.c.
      
       v6:
       ---
       Create report__tui_browse_block_hists in block-info.c
       (codes are moved from builtin-report.c).
      
       v5:
       ---
       Fix a crash issue when running perf report without
       '--total-cycles'. The issue is because the internal flag
       is renamed from 'total_cycles' to 'total_cycles_mode' in
       previous patch but this patch still uses 'total_cycles'
       to check if the '--total-cycles' option is enabled, which
       causes the code to be inconsistent.
      
       v4:
       ---
       Since the block collection was moved out of printing in the
       previous patch, this patch is updated accordingly to
       support tui.
      
       v3:
       ---
       Minor change since the function name is changed:
       block_total_cycles_percent -> block_info__total_cycles_percent
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-8-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf report: Support --percent-limit for --total-cycles · 0b49f836
      Jin Yao committed
      The previous patch added the '--total-cycles' option. It is also
      useful to show only the entries above a threshold percent.
      
      This patch enables '--percent-limit' to hide entries under that
      percent.
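
As a rough illustration of the threshold behaviour, the sketch below drops rows whose 'Sampled Cycles%' is under the limit. It is Python written for this page, not perf's C code; `filter_by_percent_limit` is a hypothetical helper, and the sample rows reuse percentages that appear in the outputs quoted in these commit messages.

```python
def filter_by_percent_limit(entries, percent_limit):
    """Keep (percent, block_range) rows whose percent is not under the limit."""
    return [row for row in entries if row[0] >= percent_limit]


# (Sampled Cycles%, program block range)
rows = [
    (2.17, "compiler.h:199 -> common.c:221"),
    (1.75, "memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151"),
    (0.72, "entry_64.S:657 -> entry_64.S:662"),
    (0.56, "compiler.h:199 -> common.c:300"),
]

for limit in (2, 1, 0.7):
    kept = filter_by_percent_limit(rows, limit)
    print(limit, [percent for percent, _block in kept])
```

For limits of 2, 1 and 0.7 this keeps one, two and three rows respectively.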
      
      For example:
      
       perf report --total-cycles --stdio --percent-limit 1
      
       # To display the perf.data header info, please use --header/--header-only options.
       #
       #
       # Total Lost Samples: 0
       #
       # Samples: 2M of event 'cycles'
       # Event count (approx.): 2753248
       #
       # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                              [Program Block Range]         Shared Object
       # ...............  ..............  ...........  ..........  .................................................................  ....................
       #
                  26.04%            2.8M        0.40%          18                                             [div.c:42 -> div.c:39]                   div
                  15.17%            1.2M        0.16%           7                                 [random_r.c:357 -> random_r.c:380]          libc-2.27.so
                   5.11%          402.0K        0.04%           2                                             [div.c:27 -> div.c:28]                   div
                   4.87%          381.6K        0.04%           2                                     [random.c:288 -> random.c:291]          libc-2.27.so
                   4.53%          381.0K        0.04%           2                                             [div.c:40 -> div.c:40]                   div
                   3.85%          300.9K        0.02%           1                                             [div.c:22 -> div.c:25]                   div
                   3.08%          241.1K        0.02%           1                                           [rand.c:26 -> rand.c:27]          libc-2.27.so
                   3.06%          240.0K        0.02%           1                                     [random.c:291 -> random.c:291]          libc-2.27.so
                   2.78%          215.7K        0.02%           1                                     [random.c:298 -> random.c:298]          libc-2.27.so
                   2.52%          198.3K        0.02%           1                                     [random.c:293 -> random.c:293]          libc-2.27.so
                   2.36%          184.8K        0.02%           1                                           [rand.c:28 -> rand.c:28]          libc-2.27.so
                   2.33%          180.5K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
                   2.28%          176.7K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
                   2.20%          168.8K        0.02%           1                                         [rand@plt+0 -> rand@plt+0]                   div
                   1.98%          158.2K        0.02%           1                                 [random_r.c:388 -> random_r.c:388]          libc-2.27.so
                   1.57%          123.3K        0.02%           1                                             [div.c:42 -> div.c:44]                   div
                   1.44%          116.0K        0.42%          19                                 [random_r.c:357 -> random_r.c:394]          libc-2.27.so
      
      Committer testing:
      
      From the second example onwards, slightly edited for brevity:
      
        # perf report --total-cycles --percent-limit 2 --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 6M of event 'cycles'
        # Event count (approx.): 6299936
        #
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
        # ...............  ..............  ...........  ..........  ......................................................................  ....................
        #
                    2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
        #
        # (Tip: Create an archive with symtabs to analyse on other machine: perf archive)
        #
        # perf report --total-cycles --percent-limit 1 --stdio
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
                    2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
                    1.75%            1.3M        8.34%       65.5K    [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151]          libc-2.29.so
        #
        # perf report --total-cycles --percent-limit 0.7 --stdio
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
                    2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
                    1.75%            1.3M        8.34%       65.5K    [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151]          libc-2.29.so
                    0.72%          544.5K        0.03%         230                                      [entry_64.S:657 -> entry_64.S:662]      [kernel.vmlinux]
        #
      
      -------------------------------------------
      
      Only entries whose 'Sampled Cycles%' is above 1% are shown.
      
       v7:
       ---
       No functional change. Only fix the conflict issue because
       previous patches are changed.
      
       v6:
       ---
       No functional change. Only fix the conflict issue because
       previous patches are changed.
      
       v5:
       ---
       No functional change. Only fix the conflict issue because
       previous patches are changed.
      
       v4:
       ---
       No functional change. Only fix the build issue because
       previous patches are changed.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-7-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf report: Sort by sampled cycles percent per block for stdio · 6f7164fa
      Jin Yao committed
      It would be useful to support sorting all blocks by the sampled
      cycles percent per block, which helps to concentrate on the globally
      hottest blocks.
      
      This patch implements a new option "--total-cycles" which sorts all
      blocks by 'Sampled Cycles%'. The 'Sampled Cycles%' is the percent:
      
       percent = block sampled cycles aggregation / total sampled cycles
      
      Note that this patch only supports "--stdio" mode.
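
The percent formula and the sort order can be sketched in a few lines. This is an illustrative Python example, not the perf implementation: `rank_blocks` is a hypothetical helper and the cycle counts are made-up round numbers chosen so the arithmetic is easy to follow.

```python
def rank_blocks(block_cycles):
    """Return (percent, cycles, block) rows sorted by percent, highest first.

    percent = block sampled cycles aggregation / total sampled cycles
    """
    total = sum(block_cycles.values())
    if total == 0:
        return []
    rows = [
        (100.0 * cycles / total, cycles, block)
        for block, cycles in block_cycles.items()
    ]
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


# Hypothetical aggregated cycles per block (not taken from a real perf.data).
sampled = {
    "div.c:27 -> div.c:28": 100,
    "div.c:42 -> div.c:39": 600,
    "random_r.c:357 -> random_r.c:380": 300,
}

for percent, cycles, block in rank_blocks(sampled):
    print(f"{percent:6.2f}%  {cycles:6d}  [{block}]")
```

Sorting on the percent column puts the globally hottest block first, regardless of which shared object it belongs to.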
      
      For example,
      
        # perf record -b ./div
        # perf report --total-cycles --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 2M of event 'cycles'
        # Event count (approx.): 2753248
        #
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                             [Program Block Range]      Shared Object
        # ...............  ..............  ...........  ..........  ................................................  .................
        #
                   26.04%            2.8M        0.40%          18                            [div.c:42 -> div.c:39]                div
                   15.17%            1.2M        0.16%           7                [random_r.c:357 -> random_r.c:380]       libc-2.27.so
                    5.11%          402.0K        0.04%           2                            [div.c:27 -> div.c:28]                div
                    4.87%          381.6K        0.04%           2                    [random.c:288 -> random.c:291]       libc-2.27.so
                    4.53%          381.0K        0.04%           2                            [div.c:40 -> div.c:40]                div
                    3.85%          300.9K        0.02%           1                            [div.c:22 -> div.c:25]                div
                    3.08%          241.1K        0.02%           1                          [rand.c:26 -> rand.c:27]       libc-2.27.so
                    3.06%          240.0K        0.02%           1                    [random.c:291 -> random.c:291]       libc-2.27.so
                    2.78%          215.7K        0.02%           1                    [random.c:298 -> random.c:298]       libc-2.27.so
                    2.52%          198.3K        0.02%           1                    [random.c:293 -> random.c:293]       libc-2.27.so
                    2.36%          184.8K        0.02%           1                          [rand.c:28 -> rand.c:28]       libc-2.27.so
                    2.33%          180.5K        0.02%           1                    [random.c:295 -> random.c:295]       libc-2.27.so
                    2.28%          176.7K        0.02%           1                    [random.c:295 -> random.c:295]       libc-2.27.so
                    2.20%          168.8K        0.02%           1                        [rand@plt+0 -> rand@plt+0]                div
                    1.98%          158.2K        0.02%           1                [random_r.c:388 -> random_r.c:388]       libc-2.27.so
                    1.57%          123.3K        0.02%           1                            [div.c:42 -> div.c:44]                div
                    1.44%          116.0K        0.42%          19                [random_r.c:357 -> random_r.c:394]       libc-2.27.so
                    0.25%          182.5K        0.02%           1                [random_r.c:388 -> random_r.c:391]       libc-2.27.so
                    0.00%              48        1.07%          48        [x86_pmu_enable+284 -> x86_pmu_enable+298]  [kernel.kallsyms]
                    0.00%              74        1.64%          74             [vm_mmap_pgoff+0 -> vm_mmap_pgoff+92]  [kernel.kallsyms]
                    0.00%              73        1.62%          73                         [vm_mmap+0 -> vm_mmap+48]  [kernel.kallsyms]
                    0.00%              63        0.69%          31                       [up_write+0 -> up_write+34]  [kernel.kallsyms]
                    0.00%              13        0.29%          13      [setup_arg_pages+396 -> setup_arg_pages+413]  [kernel.kallsyms]
                    0.00%               3        0.07%           3      [setup_arg_pages+418 -> setup_arg_pages+450]  [kernel.kallsyms]
                    0.00%             616        6.84%         308   [security_mmap_file+0 -> security_mmap_file+72]  [kernel.kallsyms]
                    0.00%              23        0.51%          23  [security_mmap_file+77 -> security_mmap_file+87]  [kernel.kallsyms]
                    0.00%               4        0.02%           1                  [sched_clock+0 -> sched_clock+4]  [kernel.kallsyms]
                    0.00%               4        0.02%           1                 [sched_clock+9 -> sched_clock+12]  [kernel.kallsyms]
                    0.00%               1        0.02%           1                [rcu_nmi_exit+0 -> rcu_nmi_exit+9]  [kernel.kallsyms]
      
      Committer testing:
      
      This should provide material for hours of endless joy, both from looking
      for suspicious things in the implementation of this patch, such as the
      top one:
      
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                          [Program Block Range]     Shared Object
                    2.17%            1.7M        0.08%         607   [compiler.h:199 -> common.c:221]              [kernel.vmlinux]
      
      As well as from things that look legit:
      
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                          [Program Block Range]     Shared Object
                    0.16%          123.0K        0.60%        4.7K   [nospec-branch.h:265 -> nospec-branch.h:278]  [kernel.vmlinux]
      
      :-)
      
      Very short system wide taken branches session:
      
        # perf record -h -b
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -b, --branch-any      sample any taken branches
      
        #
        # perf record -b
        ^C[ perf record: Woken up 595 times to write data ]
        [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ]
      
        #
        # perf evlist -v
        cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
        #
        # perf report --total-cycles --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 6M of event 'cycles'
        # Event count (approx.): 6299936
        #
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
        # ...............  ..............  ...........  ..........  ......................................................................  ....................
        #
                    2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
                    1.75%            1.3M        8.34%       65.5K    [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151]          libc-2.29.so
                    0.72%          544.5K        0.03%         230                                      [entry_64.S:657 -> entry_64.S:662]      [kernel.vmlinux]
                    0.56%          541.8K        0.09%         672                                        [compiler.h:199 -> common.c:300]      [kernel.vmlinux]
                    0.39%          293.2K        0.01%         104                                    [list_debug.c:43 -> list_debug.c:61]      [kernel.vmlinux]
                    0.36%          278.6K        0.03%         272                                    [entry_64.S:1289 -> entry_64.S:1308]      [kernel.vmlinux]
                    0.30%          260.8K        0.07%         564                              [clear_page_64.S:47 -> clear_page_64.S:50]      [kernel.vmlinux]
                    0.28%          215.3K        0.05%         369                                            [traps.c:623 -> traps.c:628]      [kernel.vmlinux]
                    0.23%          178.1K        0.04%         278                                      [entry_64.S:271 -> entry_64.S:275]      [kernel.vmlinux]
                    0.20%          152.6K        0.09%         706                                      [paravirt.c:177 -> paravirt.c:179]      [kernel.vmlinux]
                    0.20%          155.8K        0.05%         373                                      [entry_64.S:153 -> entry_64.S:175]      [kernel.vmlinux]
                    0.18%          136.6K        0.03%         222                                                [msr.h:105 -> msr.h:166]      [kernel.vmlinux]
                    0.16%          123.0K        0.60%        4.7K                            [nospec-branch.h:265 -> nospec-branch.h:278]      [kernel.vmlinux]
                    0.16%          118.3K        0.01%          44                                      [entry_64.S:632 -> entry_64.S:657]      [kernel.vmlinux]
                    0.14%          104.5K        0.00%          28                                          [rwsem.c:1541 -> rwsem.c:1544]      [kernel.vmlinux]
                    0.13%           99.2K        0.01%          53                                      [spinlock.c:150 -> spinlock.c:152]      [kernel.vmlinux]
                    0.13%           95.5K        0.00%          35                                              [swap.c:456 -> swap.c:471]      [kernel.vmlinux]
                    0.12%           96.2K        0.05%         407                              [copy_user_64.S:175 -> copy_user_64.S:209]      [kernel.vmlinux]
                    0.11%           85.9K        0.00%          31                                        [swap.c:400 -> page-flags.h:188]      [kernel.vmlinux]
                    0.10%           73.0K        0.01%          52                                          [paravirt.h:763 -> list.h:131]      [kernel.vmlinux]
                    0.07%           56.2K        0.03%         214                                      [filemap.c:1524 -> filemap.c:1557]      [kernel.vmlinux]
                    0.07%           54.2K        0.02%         145                                        [memory.c:1032 -> memory.c:1049]      [kernel.vmlinux]
                    0.07%           50.3K        0.00%          39                                            [mmzone.c:49 -> mmzone.c:69]      [kernel.vmlinux]
                    0.06%           48.3K        0.01%          40                                   [paravirt.h:768 -> page_alloc.c:3304]      [kernel.vmlinux]
                    0.06%           46.7K        0.02%         155                                        [memory.c:1032 -> memory.c:1056]      [kernel.vmlinux]
                    0.06%           46.9K        0.01%         103                                              [swap.c:867 -> swap.c:902]      [kernel.vmlinux]
                    0.06%           47.8K        0.00%          34                                    [entry_64.S:1201 -> entry_64.S:1202]      [kernel.vmlinux]
      
       -----------------------------------------------------------
      
       v7:
       ---
       Use use_browser in report__browse_block_hists to support
       stdio and a potential TUI mode.
      
       v6:
       ---
       Create report__browse_block_hists in block-info.c (the code is
       moved from builtin-report.c). It's called from
       perf_evlist__tty_browse_hists.
      
       v5:
       ---
       1. Move all block functions to block-info.c
      
       2. Move the code that sets ms in the block hist_entry to
          another patch.
      
       v4:
       ---
       1. Use new option '--total-cycles' to replace
          '-s total_cycles' in v3.
      
       2. Move block info collection out of block info
          printing.
      
       v3:
       ---
       1. Use common function block_info__process_sym to
          process the blocks per symbol.
      
       2. Remove the nasty hack for skipping calculation
          of column length
      
       3. Some minor cleanup
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-6-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6f7164fa
    • J
      perf hist: Count the total cycles of all samples · 7841f40a
      Committed by Jin Yao
      We can get the per-sample cycles with hist__account_cycles(). It is also
      useful to know the total cycles of all samples, so that the cycles
      coverage of a single program block can be computed later. For example:
      
        coverage = per block sampled cycles / total sampled cycles
      
      This patch adds a new argument, 'total_cycles', to hist__account_cycles();
      the cycles of each sample are added to it.
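
      The accumulation and the coverage formula above can be sketched as
      follows. This is only an illustration, not the perf code: the struct
      and the function names (branch_entry_sketch, account_cycles_sketch,
      block_coverage) are hypothetical, and only the idea of an optional
      running 'total_cycles' argument is taken from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one branch entry of a sample. */
struct branch_entry_sketch {
	uint64_t from;   /* branch source address */
	uint64_t to;     /* branch target address */
	uint64_t cycles; /* cycles attributed to this entry */
};

/*
 * Walk the branch entries of one sample and, if the caller asked for it,
 * add their cycles to a running total. This mirrors the role of the
 * 'total_cycles' argument described above; it is not the perf code.
 */
static void account_cycles_sketch(const struct branch_entry_sketch *entries,
				  size_t nr, uint64_t *total_cycles)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < nr; i++)
		sum += entries[i].cycles;

	if (total_cycles)
		*total_cycles += sum;
}

/* coverage = per block sampled cycles / total sampled cycles */
static double block_coverage(uint64_t block_cycles, uint64_t total_cycles)
{
	return total_cycles ? (double)block_cycles / (double)total_cycles : 0.0;
}
```

      Calling account_cycles_sketch() once per sample with the same
      'total_cycles' pointer yields the denominator; passing NULL skips the
      accumulation for callers that do not need it.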
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-4-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7841f40a
    • A
      perf maps: Add for_each_entry()/_safe() iterators · 8efc4f05
      Committed by Arnaldo Carvalho de Melo
      To reduce boilerplate, provide a more compact form using an idiom
      present in other trees of data structures.
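
      As an illustration of the for_each_entry()/_safe() idiom, here is a
      minimal sketch over a toy singly linked list. All names (struct node,
      toy_for_each_entry, toy_for_each_entry_safe, toy_push, ...) are
      hypothetical, and the container is deliberately simpler than the one
      perf uses for maps; only the shape of the two iterators is the point.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy node; perf keeps its maps in a different structure. */
struct node {
	int value;
	struct node *next;
};

/* Plain iterator: the loop body must not free or unlink 'pos'. */
#define toy_for_each_entry(head, pos) \
	for ((pos) = (head); (pos) != NULL; (pos) = (pos)->next)

/*
 * "Safe" iterator: the successor is saved in 'tmp' before the body runs,
 * so the body may free 'pos'.
 */
#define toy_for_each_entry_safe(head, pos, tmp) \
	for ((pos) = (head), (tmp) = (pos) ? (pos)->next : NULL; \
	     (pos) != NULL; \
	     (pos) = (tmp), (tmp) = (pos) ? (pos)->next : NULL)

/* Prepend a node; returns the new head (or the old one if malloc fails). */
static struct node *toy_push(struct node *head, int value)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return head;
	n->value = value;
	n->next = head;
	return n;
}

static int toy_sum(struct node *head)
{
	struct node *pos;
	int sum = 0;

	toy_for_each_entry(head, pos)
		sum += pos->value;
	return sum;
}

/* Frees every node; returns how many were freed. */
static int toy_free_all(struct node *head)
{
	struct node *pos, *tmp;
	int freed = 0;

	toy_for_each_entry_safe(head, pos, tmp) {
		free(pos);
		freed++;
	}
	return freed;
}
```

      The _safe() variant costs one extra cursor variable, which is why a
      separate plain form is usually kept for read-only walks.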
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-59gmq4kg1r68ou1wknyjl78x@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8efc4f05
  9. 15 Oct, 2019 1 commit
  10. 21 Sep, 2019 1 commit
  11. 20 Sep, 2019 1 commit
  12. 01 Sep, 2019 4 commits
  13. 29 Aug, 2019 2 commits
  14. 26 Aug, 2019 1 commit
  15. 20 Aug, 2019 1 commit
    • A
      perf report: Prefer DWARF callstacks to LBR ones when captured both · 10ccbc1c
      Committed by Alexey Budankov
      Display DWARF based callchains when the perf.data file contains both raw
      thread stack data and LBR callstack data.
      
      Committer testing:
      
      This changes the output from the branch stack based one, i.e. without
      this patch, for the same file as in the previous csets:
      
        # perf report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 13  of event 'cycles'
        # Event count (approx.): 13
        #
        # Overhead  Command  Source Shared Object  Source Symbol                Target Symbol                              Basic Block Cycles
        # ........  .......  ....................  ...........................  .........................................  ..................
        #
             7.69%  ls       libpthread-2.29.so    [.] _init                    [.] __pthread_initialize_minimal_internal  6827
             7.69%  ls       ld-2.29.so            [k] _start                   [k] _dl_start                              -
             7.69%  ls       ld-2.29.so            [.] _dl_start_user           [.] _dl_init                               -24790
             7.69%  ls       ld-2.29.so            [k] _dl_start                [k] _dl_sysdep_start                       278
             7.69%  ls       ld-2.29.so            [k] dl_main                  [k] _dl_map_object_deps                    15581
             7.69%  ls       ld-2.29.so            [k] open_verify.constprop.0  [k] lseek64                                4228
             7.69%  ls       ld-2.29.so            [k] _dl_map_object           [k] open_verify.constprop.0                55
             7.69%  ls       ld-2.29.so            [k] openaux                  [k] _dl_map_object                         67
             7.69%  ls       ld-2.29.so            [k] _dl_map_object_deps      [k] 0x00007f441b57c090                     112
             7.69%  ls       ld-2.29.so            [.] call_init.part.0         [.] _init                                  334
             7.69%  ls       ld-2.29.so            [.] _dl_init                 [.] call_init.part.0                       383
             7.69%  ls       ld-2.29.so            [k] _dl_sysdep_start         [k] dl_main                                45
             7.69%  ls       ld-2.29.so            [k] _dl_catch_exception      [k] openaux                                116
      
        #
        # (Tip: For memory address profiling, try: perf mem record / perf mem report)
        #
      
      To the one that shows call chains:
      
        # perf report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 10  of event 'cycles'
        # Event count (approx.): 3204047
        #
        # Children      Self  Command  Shared Object       Symbol
        # ........  ........  .......  ..................  .........................................
        #
            55.01%     0.00%  ls       [kernel.vmlinux]    [k] entry_SYSCALL_64_after_hwframe
                    |
                    ---entry_SYSCALL_64_after_hwframe
                       do_syscall_64
                       |
                        --16.01%--__x64_sys_execve
                                  __do_execve_file.isra.0
                                  search_binary_handler
                                  load_elf_binary
                                  elf_map
                                  vm_mmap_pgoff
                                  do_mmap
                                  mmap_region
                                  perf_event_mmap
                                  perf_iterate_sb
                                  perf_iterate_ctx
                                  perf_event_mmap_output
                                  perf_output_copy
                                  memcpy_erms
      
            55.01%    39.00%  ls       [kernel.vmlinux]    [k] do_syscall_64
                    |
                    |--39.00%--0xffffffffffffffff
                    |          _dl_map_object
                    |          open_verify.constprop.0
                    |          __lseek64 (inlined)
                    |          entry_SYSCALL_64_after_hwframe
                    |          do_syscall_64
                    |
                     --16.01%--do_syscall_64
                               __x64_sys_execve
                               __do_execve_file.isra.0
                               search_binary_handler
                               load_elf_binary
                               elf_map
                               vm_mmap_pgoff
                               do_mmap
                               mmap_region
                               perf_event_mmap
                               perf_iterate_sb
                               perf_iterate_ctx
                               perf_event_mmap_output
                               perf_output_copy
                               memcpy_erms
      
            42.95%    42.95%  ls       libpthread-2.29.so  [.] __pthread_initialize_minimal_internal
                    |
                    ---_init
                       __pthread_initialize_minimal_internal
      
            42.95%     0.00%  ls       libpthread-2.29.so  [.] _init
                    |
                    ---_init
                       __pthread_initialize_minimal_internal
      
        <SNIP>
      
        #
        # (Tip: Profiling branch (mis)predictions with: perf record -b / perf report)
        #
        #
      
      The branch stack view can be explicitly selected using:
      
        # perf report -h branch-stack
      
         Usage: perf report [<options>]
      
            -b, --branch-stack    use branch records for per branch histogram filling
      
        #
      
      I.e. after this patch:
      
        # perf report -b --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 13  of event 'cycles'
        # Event count (approx.): 13
        #
        # Overhead  Command  Source Shared Object  Source Symbol                Target Symbol                              Basic Block Cycles
        # ........  .......  ....................  ...........................  .........................................  ..................
        #
             7.69%  ls       libpthread-2.29.so    [.] _init                    [.] __pthread_initialize_minimal_internal  6827
             7.69%  ls       ld-2.29.so            [k] _start                   [k] _dl_start                              -
             7.69%  ls       ld-2.29.so            [.] _dl_start_user           [.] _dl_init                               -24790
             7.69%  ls       ld-2.29.so            [k] _dl_start                [k] _dl_sysdep_start                       278
             7.69%  ls       ld-2.29.so            [k] dl_main                  [k] _dl_map_object_deps                    15581
             7.69%  ls       ld-2.29.so            [k] open_verify.constprop.0  [k] lseek64                                4228
             7.69%  ls       ld-2.29.so            [k] _dl_map_object           [k] open_verify.constprop.0                55
             7.69%  ls       ld-2.29.so            [k] openaux                  [k] _dl_map_object                         67
             7.69%  ls       ld-2.29.so            [k] _dl_map_object_deps      [k] 0x00007f441b57c090                     112
             7.69%  ls       ld-2.29.so            [.] call_init.part.0         [.] _init                                  334
             7.69%  ls       ld-2.29.so            [.] _dl_init                 [.] call_init.part.0                       383
             7.69%  ls       ld-2.29.so            [k] _dl_sysdep_start         [k] dl_main                                45
             7.69%  ls       ld-2.29.so            [k] _dl_catch_exception      [k] openaux                                116
      
        #
        # (Tip: Show current config key-value pairs: perf config --list)
        #
        #
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/ccbd9583-82f4-dec5-7e84-64bf56e351fb@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      10ccbc1c
  16. 16 Aug, 2019 1 commit
    • A
      perf report: Add --switch-on/--switch-off events · ef4b1a53
      Committed by Arnaldo Carvalho de Melo
      Since 'perf top' shares the histogram browser with 'perf report', the
      same explanation as in the previous cset applies.
      
      An additional example uses a pair of SDT events available for systemtap:
      
        # perf probe --exec=/usr/bin/stap '%*:*'
        Added new events:
          sdt_stap:benchmark__thread__start (on %* in /usr/bin/stap)
          sdt_stap:benchmark   (on %* in /usr/bin/stap)
          sdt_stap:benchmark__thread__end (on %* in /usr/bin/stap)
          sdt_stap:pass6__start (on %* in /usr/bin/stap)
          sdt_stap:pass6__end  (on %* in /usr/bin/stap)
          sdt_stap:pass5__start (on %* in /usr/bin/stap)
          sdt_stap:pass5__end  (on %* in /usr/bin/stap)
          sdt_stap:pass0__start (on %* in /usr/bin/stap)
          sdt_stap:pass0__end  (on %* in /usr/bin/stap)
          sdt_stap:pass1a__start (on %* in /usr/bin/stap)
          sdt_stap:pass1b__start (on %* in /usr/bin/stap)
          sdt_stap:pass1__end  (on %* in /usr/bin/stap)
          sdt_stap:pass2__start (on %* in /usr/bin/stap)
          sdt_stap:pass2__end  (on %* in /usr/bin/stap)
          sdt_stap:pass3__start (on %* in /usr/bin/stap)
          sdt_stap:pass3__end  (on %* in /usr/bin/stap)
          sdt_stap:pass4__start (on %* in /usr/bin/stap)
          sdt_stap:pass4__end  (on %* in /usr/bin/stap)
          sdt_stap:benchmark__start (on %* in /usr/bin/stap)
          sdt_stap:benchmark__end (on %* in /usr/bin/stap)
          sdt_stap:cache__get  (on %* in /usr/bin/stap)
          sdt_stap:cache__clean (on %* in /usr/bin/stap)
          sdt_stap:cache__add__module (on %* in /usr/bin/stap)
          sdt_stap:cache__add__source (on %* in /usr/bin/stap)
          sdt_stap:stap_system__complete (on %* in /usr/bin/stap)
          sdt_stap:stap_system__start (on %* in /usr/bin/stap)
          sdt_stap:stap_system__spawn (on %* in /usr/bin/stap)
          sdt_stap:stap_system__fork (on %* in /usr/bin/stap)
          sdt_stap:intern_string (on %* in /usr/bin/stap)
          sdt_stap:client__start (on %* in /usr/bin/stap)
          sdt_stap:client__end (on %* in /usr/bin/stap)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e sdt_stap:client__end -aR sleep 1
      
        #
      
      From these, we use the two below while running systemtap's test suite:
      
        # perf record -e sdt_stap:pass2__*,cycles:P make installcheck > /dev/null
        ^C[ perf record: Woken up 8 times to write data ]
        [ perf record: Captured and wrote 2.691 MB perf.data (39638 samples) ]
        Terminated
        # perf script | grep sdt_stap
                    stap 28979 [000] 19424.302660: sdt_stap:pass2__start: (561b9a537de3) arg1=140730364262544
                    stap 28979 [000] 19424.333083:   sdt_stap:pass2__end: (561b9a53a9e1) arg1=140730364262544
                    stap 29045 [006] 19424.933460: sdt_stap:pass2__start: (563edddcede3) arg1=140722674883152
                    stap 29045 [006] 19424.963794:   sdt_stap:pass2__end: (563edddd19e1) arg1=140722674883152
        # perf script | grep cycles |  wc -l
        39634
        #
      
      Looking at the whole perf.data file:
      
        [root@quaco testsuite]# perf report | grep cycles:P -A25
        # Samples: 39K of event 'cycles:P'
        # Event count (approx.): 34044267368
        #
        # Overhead  Command  Shared Object         Symbol
        # ........  .......  ....................  ................................
        #
             3.50%  cc1      cc1                   [.] ht_lookup_with_hash
             3.04%  cc1      cc1                   [.] _cpp_lex_token
             2.11%  cc1      cc1                   [.] ggc_internal_alloc
             1.83%  cc1      cc1                   [.] cpp_get_token_with_location
             1.68%  cc1      libc-2.29.so          [.] _int_malloc
             1.41%  cc1      cc1                   [.] linemap_position_for_column
             1.25%  cc1      cc1                   [.] ggc_internal_cleared_alloc
             1.20%  cc1      cc1                   [.] c_lex_with_flags
             1.18%  cc1      cc1                   [.] get_combined_adhoc_loc
             1.05%  cc1      libc-2.29.so          [.] malloc
             1.01%  cc1      libc-2.29.so          [.] _int_free
             0.96%  stap     stap                  [.] std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Identity, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, stringtable_hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_insert<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__detail::_AllocNode<std::allocator<std::__detail::_Hash_node<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, true> > > >
             0.78%  stap     stap                  [.] lexer::scan
             0.74%  cc1      cc1                   [.] _cpp_lex_direct
             0.70%  cc1      cc1                   [.] pop_scope
             0.70%  cc1      cc1                   [.] c_parser_declspecs
             0.69%  stap     libc-2.29.so          [.] _int_malloc
             0.68%  cc1      cc1                   [.] htab_find_slot
             0.68%  cc1      [kernel.vmlinux]      [k] prepare_exit_to_usermode
             0.64%  cc1      [kernel.vmlinux]      [k] clear_page_erms
        [root@quaco testsuite]#
      
      And now only what happens in slices demarcated by those start/end SDT
      events:
      
        [root@quaco testsuite]# perf report --switch-on=sdt_stap:pass2__start --switch-off=sdt_stap:pass2__end | grep cycles:P -A100
        # Samples: 240  of event 'cycles:P'
        # Event count (approx.): 206491934
        #
        # Overhead  Command  Shared Object        Symbol
        # ........  .......  ...................  ................................................
        #
            38.99%  stap     stap                 [.] systemtap_session::register_library_aliases
            19.47%  stap     stap                 [.] match_key::operator<
            15.01%  stap     libc-2.29.so         [.] __memcmp_avx2_movbe
             5.19%  stap     libc-2.29.so         [.] _int_malloc
             2.50%  stap     libstdc++.so.6.0.26  [.] std::_Rb_tree_insert_and_rebalance
             2.30%  stap     stap                 [.] match_node::build_no_more
             2.07%  stap     libc-2.29.so         [.] malloc
             1.66%  stap     stap                 [.] std::_Rb_tree<match_key, std::pair<match_key const, match_node*>, std::_Select1st<std::pair<match_key const, match_node*> >, std::less<match_key>, std::allocator<std::pair<match_key const, match_node*> > >::find
             1.66%  stap     stap                 [.] match_node::bind
             1.58%  stap     [kernel.vmlinux]     [k] prepare_exit_to_usermode
             1.17%  stap     [kernel.vmlinux]     [k] native_irq_return_iret
             0.87%  stap     stap                 [.] 0x0000000000032ec4
             0.77%  stap     libstdc++.so.6.0.26  [.] std::_Rb_tree_increment
             0.47%  stap     stap                 [.] std::vector<derived_probe_builder*, std::allocator<derived_probe_builder*> >::_M_realloc_insert<derived_probe_builder* const&>
             0.47%  stap     [kernel.vmlinux]     [k] get_page_from_freelist
             0.47%  stap     [kernel.vmlinux]     [k] swapgs_restore_regs_and_return_to_usermode
             0.47%  stap     [kernel.vmlinux]     [k] do_user_addr_fault
             0.46%  stap     [kernel.vmlinux]     [k] __pagevec_lru_add_fn
             0.46%  stap     stap                 [.] std::_Rb_tree<match_key, std::pair<match_key const, match_node*>, std::_Select1st<std::pair<match_key const, match_node*> >, std::less<match_key>, std::allocator<std::pair<match_key const, match_node*> > >::_M_emplace_unique<std::pair<match_key, match_node*> >
             0.42%  stap     libstdc++.so.6.0.26  [.] 0x00000000000c18fa
             0.40%  stap     [kernel.vmlinux]     [k] interrupt_entry
             0.40%  stap     [kernel.vmlinux]     [k] update_load_avg
             0.40%  stap     [kernel.vmlinux]     [k] __intel_pmu_disable_all
             0.40%  stap     [kernel.vmlinux]     [k] clear_page_erms
             0.39%  stap     [kernel.vmlinux]     [k] __mod_node_page_state
             0.39%  stap     [kernel.vmlinux]     [k] error_entry
             0.39%  stap     [kernel.vmlinux]     [k] sync_regs
             0.38%  stap     [kernel.vmlinux]     [k] __handle_mm_fault
             0.38%  stap     stap                 [.] derive_probes
      
        #
        # (Tip: System-wide collection from all CPUs: perf record -a)
        #
        [root@quaco testsuite]#
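
      The slicing shown above can be pictured with the following sketch. It
      is not the perf implementation: struct event_sketch and
      count_in_slices() are hypothetical names, events are reduced to a
      name, and the marker events themselves are not counted (an assumption
      of this sketch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified event record: just an event name. */
struct event_sketch {
	const char *name;
};

/*
 * Count the events that fall inside windows opened by 'on_name' and
 * closed by 'off_name'. Events seen before the first 'on_name' or after
 * an 'off_name' are dropped, as are the two marker events themselves.
 */
static size_t count_in_slices(const struct event_sketch *ev, size_t nr,
			      const char *on_name, const char *off_name)
{
	bool enabled = false;
	size_t kept = 0;

	for (size_t i = 0; i < nr; i++) {
		if (strcmp(ev[i].name, on_name) == 0) {
			enabled = true;
			continue;
		}
		if (strcmp(ev[i].name, off_name) == 0) {
			enabled = false;
			continue;
		}
		if (enabled)
			kept++;
	}
	return kept;
}
```

      With a stream such as cycles, start, cycles, cycles, end, cycles, only
      the two samples between the markers are kept, which matches the way
      the sample count drops from 39K to 240 in the output above.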
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: William Cohen <wcohen@redhat.com>
      Link: https://lkml.kernel.org/n/tip-408hvumcnyn93a0auihnawew@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ef4b1a53
  17. 30 Jul, 2019 3 commits
  18. 09 Jul, 2019 2 commits
  19. 26 Jun, 2019 2 commits
  20. 11 Jun, 2019 1 commit
  21. 16 May, 2019 2 commits
    • A
      perf report: Implement perf.data record decompression · cb62c6f1
      Committed by Alexey Budankov
      zstd_init(, comp_level = 0) initializes only the decompression part of
      the API, which now consists of the zstd_decompress_stream() function.
      
      The perf.data PERF_RECORD_COMPRESSED records are decompressed using
      zstd_decompress_stream() function into a linked list of mmaped memory
      regions of mmap_comp_len size (struct decomp).
      
      After one COMPRESSED record is decompressed, its content is iterated
      over and passed on for the usual processing. The mmaped memory regions
      holding the decompressed events are kept in the linked list until the
      tool process terminates.
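
      The bookkeeping described above, a linked list of mmaped regions that
      live until the tool exits, can be sketched as below. This is not the
      perf code: struct decomp_sketch, struct decomp_list and the two
      helpers are hypothetical, each region is sized to its payload rather
      than to mmap_comp_len, and a plain memcpy() stands in for the output
      of zstd_decompress_stream().

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical node: one mmaped region holding decompressed bytes. */
struct decomp_sketch {
	struct decomp_sketch *next;
	size_t mmap_len; /* length of the whole mapping, header included */
	size_t size;     /* number of payload bytes stored in data[] */
	char data[];
};

/* Hypothetical list with a tail pointer so regions keep arrival order. */
struct decomp_list {
	struct decomp_sketch *head, *tail;
};

/*
 * Map a fresh region, copy 'len' already-decompressed bytes into it and
 * append it to the list. Returns NULL if the mapping fails.
 */
static struct decomp_sketch *decomp_list__append(struct decomp_list *list,
						 const void *buf, size_t len)
{
	size_t mmap_len = sizeof(struct decomp_sketch) + len;
	struct decomp_sketch *d;

	d = mmap(NULL, mmap_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (d == MAP_FAILED)
		return NULL;

	d->next = NULL;
	d->mmap_len = mmap_len;
	d->size = len;
	memcpy(d->data, buf, len);

	if (list->tail)
		list->tail->next = d;
	else
		list->head = d;
	list->tail = d;
	return d;
}

/* Unmap every region; per the text above, regions otherwise stay mapped. */
static void decomp_list__release(struct decomp_list *list)
{
	struct decomp_sketch *d = list->head, *next;

	while (d) {
		next = d->next;
		munmap(d, d->mmap_len);
		d = next;
	}
	list->head = list->tail = NULL;
}
```

      Keeping the regions mapped means pointers into the decompressed events
      stay valid for the rest of the run, at the cost of holding that memory
      until exit.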
      
      When dumping raw records (e.g., perf report -D --header), the file
      offsets of events from compressed records are printed as zero.
      
      Committer notes:
      
      Since we now have support for processing PERF_RECORD_COMPRESSED, we no
      longer see those records in raw form, as we did in the previous patch's
      committer notes: they are decompressed into the usual
      PERF_RECORD_{FORK,MMAP,COMM,etc} records. We only see the stats for the
      PERF_RECORD_COMPRESSED events, and since I used the file generated in the
      committer notes for the previous patch, there they are, 2 compressed
      records:
      
        $ perf report --header-only | grep cmdline
        # cmdline : /home/acme/bin/perf record -z2 sleep 1
        $ perf report -D | grep COMPRESS
              COMPRESSED events:          2
              COMPRESSED events:          0
        $ perf report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 15  of event 'cycles:u'
        # Event count (approx.): 962227
        #
        # Overhead  Command  Shared Object     Symbol
        # ........  .......  ................  ...........................
        #
            46.99%  sleep    libc-2.28.so      [.] _dl_addr
            29.24%  sleep    [unknown]         [k] 0xffffffffaea00a67
            16.45%  sleep    libc-2.28.so      [.] __GI__IO_un_link.part.1
             5.92%  sleep    ld-2.28.so        [.] _dl_setup_hash
             1.40%  sleep    libc-2.28.so      [.] __nanosleep
             0.00%  sleep    [unknown]         [k] 0xffffffffaea00163
      
        #
        # (Tip: To see callchains in a more compact form: perf report -g folded)
        #
        $
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/304b0a59-942c-3fe1-da02-aa749f87108b@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      cb62c6f1
    • J
      perf annotate: Remove hist__account_cycles() from callback · bdd1666b
      Committed by Jin Yao
      The hist__account_cycles() function is executed when the
      hist_iter__branch_callback() is called.
      
      But this does not look necessary: hist__account_cycles() already walks
      all branch entries by itself.
      
      This patch moves hist__account_cycles() out of the callback, so the data
      processing is now much faster than before.
      
      The previous code had an issue: ch[offset].num++ (in
      __symbol__account_cycles) was executed repeatedly, since
      hist__account_cycles() was called in each hist_iter__branch_callback(),
      so the count in ch[offset].num was not correct (too big).
      
      With this patch, the issue is fixed. We also no longer need the
      "ch->reset >= ch->num / 2" check for too many overlaps (in
      annotation__count_and_fill); keeping it would hide some data.
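
      The overcounting can be reproduced with a small model. This is a
      sketch under simplifying assumptions, not the perf code: the
      per-offset counters are collapsed into a single counter, and
      account_sample(), count_in_callback() and count_once() are
      hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-sample accounting: one increment per branch entry. */
static void account_sample(unsigned int *num, size_t nr_entries)
{
	for (size_t i = 0; i < nr_entries; i++)
		(*num)++;
}

/* Old shape: accounting invoked from the per-branch-entry callback. */
static unsigned int count_in_callback(size_t nr_entries)
{
	unsigned int num = 0;

	for (size_t cb = 0; cb < nr_entries; cb++) /* one callback per entry */
		account_sample(&num, nr_entries);
	return num;
}

/* New shape: accounting invoked once per sample, outside the callback. */
static unsigned int count_once(size_t nr_entries)
{
	unsigned int num = 0;

	account_sample(&num, nr_entries);
	return num;
}
```

      For a sample with n branch entries the old shape performs n * n
      increments where n are expected, which accounts for both the inflated
      count and the extra processing time.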
      
      Now, we can try, for example:
      
        perf record -b ...
        perf annotate or perf report -s symbol
      
      The output should be the same before and after this patch.
      
       v3:
       ---
       Fix the crash in stdio mode.
       As in the previous code, ui__has_annotation() needs to be checked
       before calling hist__account_cycles().
      
       v2:
       ---
       1. Cover the similar perf report
       2. Remove the checking code "ch->reset >= ch->num / 2"
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1552684577-29041-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bdd1666b
  22. 20 Mar, 2019 1 commit