1. 24 Mar 2020, 2 commits
    • perf report: Support a new key to reload the browser · 5e3b810a
      Authored by Jin Yao
      Sometimes we need to reload the browser to refresh the output after
      some options have changed.
      
      This patch creates a new key, K_RELOAD. Once __cmd_report() returns
      K_RELOAD, the whole process is repeated: reading samples from the
      data file, sorting the data and displaying it in the browser.
      
       v5:
       ---
       1. Fix the 'make NO_SLANG=1' error. Define K_RELOAD in util/hist.h.
       2. Skip setup_sorting() in repeat path if last key is K_RELOAD.
      
       v4:
       ---
       Need to quit in perf_evsel_menu__run if key is K_RELOAD.
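The repeat-on-K_RELOAD flow described above can be sketched as follows; the K_RELOAD value, cmd_report() and run_report() here are simplified stand-ins, not the actual perf code (the real key is defined in util/hist.h):

```c
#include <assert.h>

#define K_QUIT   0
#define K_RELOAD 1  /* hypothetical value; the real key lives in util/hist.h */

/* Stand-in for __cmd_report(): ask for 'reloads' reloads, then quit. */
static int cmd_report(int iteration, int reloads)
{
	return iteration <= reloads ? K_RELOAD : K_QUIT;
}

/*
 * Repeat the whole process (read samples from the data file, sort,
 * display in the browser) as long as the last key was K_RELOAD.
 * Returns how many times the report was (re)built.
 */
static int run_report(int reloads)
{
	int iterations = 0;
	int key;

	do {
		iterations++;
		/* read samples, sort, display ... */
		key = cmd_report(iterations, reloads);
	} while (key == K_RELOAD);

	return iterations;
}
```

With no reload request the report is built once; each K_RELOAD adds one full rebuild.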
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200220013616.19916-3-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report: Allow specifying event to be used as sort key in --group output · 429a5f9d
      Authored by Jin Yao
      When performing "perf report --group", it shows the event group
      information together. By default, the output is sorted by the first
      event in the group.
      
      It would be nice for the user to select any event for sorting. This
      patch introduces a new option "--group-sort-idx" to sort the output
      by the event at index n in the event group.
      
      For example,
      
      Before:
      
        # perf report --group --stdio
      
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1,
        # Event count (approx.): 6451235635
        #
        #                         Overhead  Command    Shared Object            Symbol
        # ................................  .........  .......................  ...................................
        #
            92.19%  98.68%   0.00%  93.30%  mgen       mgen                     [.] LOOP1
             3.12%   0.29%   0.00%   0.16%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x0000000000049515
             1.56%   0.03%   0.00%   0.04%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x00000000000494b7
             1.56%   0.01%   0.00%   0.00%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x00000000000494ce
             1.56%   0.00%   0.00%   0.00%  mgen       [kernel.kallsyms]        [k] task_tick_fair
             0.00%   0.15%   0.00%   0.04%  perf       [kernel.kallsyms]        [k] smp_call_function_single
             0.00%   0.13%   0.00%   6.08%  swapper    [kernel.kallsyms]        [k] intel_idle
             0.00%   0.03%   0.00%   0.00%  gsd-color  libglib-2.0.so.0.5600.4  [.] g_main_context_check
             0.00%   0.03%   0.00%   0.00%  swapper    [kernel.kallsyms]        [k] apic_timer_interrupt
             ...
      
      After:
      
        # perf report --group --stdio --group-sort-idx 3
      
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1,
        # Event count (approx.): 6451235635
        #
        #                         Overhead  Command    Shared Object            Symbol
        # ................................  .........  .......................  ...................................
        #
            92.19%  98.68%   0.00%  93.30%  mgen       mgen                     [.] LOOP1
             0.00%   0.13%   0.00%   6.08%  swapper    [kernel.kallsyms]        [k] intel_idle
             3.12%   0.29%   0.00%   0.16%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x0000000000049515
             0.00%   0.00%   0.00%   0.06%  swapper    [kernel.kallsyms]        [k] hrtimer_start_range_ns
             1.56%   0.03%   0.00%   0.04%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x00000000000494b7
             0.00%   0.15%   0.00%   0.04%  perf       [kernel.kallsyms]        [k] smp_call_function_single
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] update_curr
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] apic_timer_interrupt
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] native_apic_msr_eoi_write
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] __update_load_avg_se
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] scheduler_tick
      
      Now the output is sorted by the fourth event in the group.
      
       v7:
       ---
       Rebase to latest perf/core, no other change.
      
       v4:
       ---
       1. Update Documentation/perf-report.txt to mention that
          '--group-sort-idx' supports multiple groups with different
          numbers of events and should be used on grouped events.
      
       2. Update __hpp__group_sort_idx() to just return when the
          idx is out of range.
      
       3. Return failure on symbol_conf.group_sort_idx && !session->evlist->nr_groups,
          so now it does not need to be used together with --group.
      
       v3:
       ---
       Refine the code in __hpp__group_sort_idx().
      
       Before:
         for (i = 1; i < nr_members; i++) {
              if (i == idx) {
                      ret = field_cmp(fields_a[i], fields_b[i]);
                      if (ret)
                              goto out;
              }
         }
      
       After:
         if (idx >= 1 && idx < nr_members) {
              ret = field_cmp(fields_a[idx], fields_b[idx]);
              if (ret)
                      goto out;
         }
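The refined comparison can be written as a self-contained sketch; field_cmp() and the fields arrays here are simplified stand-ins for the perf internals:

```c
#include <assert.h>

/* Three-way comparison of one event's aggregated value. */
static int field_cmp(unsigned long long field_a, unsigned long long field_b)
{
	if (field_a > field_b)
		return 1;
	if (field_a < field_b)
		return -1;
	return 0;
}

/*
 * Compare two hist entries by the idx-th event of the group, treating
 * an out-of-range idx as "equal", in the spirit of the "After" snippet.
 */
static int group_sort_idx_cmp(const unsigned long long *fields_a,
			      const unsigned long long *fields_b,
			      int nr_members, int idx)
{
	if (idx >= 1 && idx < nr_members)
		return field_cmp(fields_a[idx], fields_b[idx]);
	return 0;
}
```

Bounding idx avoids both the useless loop from the "Before" snippet and any out-of-range access.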
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200220013616.19916-2-yao.jin@linux.intel.com
      [ Renamed pair_fields_alloc() to hist_entry__new_pair() and combined decl + assignment of vars ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 18 Mar 2020, 1 commit
    • perf report: Fix no branch type statistics report issue · c3b10649
      Authored by Jin Yao
      Previously we could get a report of branch type statistics.
      
      For example:
      
        # perf record -j any,save_type ...
        # perf report --stdio
      
        #
        # Branch Statistics:
        #
        COND_FWD:  40.6%
        COND_BWD:   4.1%
        CROSS_4K:  24.7%
        CROSS_2M:  12.3%
            COND:  44.7%
          UNCOND:   0.0%
             IND:   6.1%
            CALL:  24.5%
             RET:  24.7%
      
      But recent perf cannot report branch type statistics.
      
      It's a regression issue caused by commit 40c39e30 ("perf report: Fix
      a no annotate browser displayed issue"), which only counts the branch
      type statistics for browser mode.
      
      This patch moves the branch_type_count() outside of ui__has_annotation()
      checking, then branch type statistics can work for stdio mode.
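The shape of the fix can be sketched as below; the enum, counters and process_branch() are simplified stand-ins, not the actual builtin-report.c code:

```c
#include <assert.h>

/* Simplified branch-type bookkeeping. */
enum branch_type { BR_COND, BR_UNCOND, BR_IND, BR_MAX };

static int branch_counts[BR_MAX];

static void branch_type_count(enum branch_type type)
{
	branch_counts[type]++;
}

/*
 * After the fix, the statistics are counted unconditionally; only the
 * annotation bookkeeping stays behind the ui__has_annotation() check,
 * so stdio mode (use_browser == 0) gets branch statistics too.
 */
static int process_branch(enum branch_type type, int use_browser)
{
	branch_type_count(type);

	if (use_browser) {
		/* additionally feed the annotation histograms (TUI) */
	}

	return branch_counts[type];
}
```

Before the fix, the branch_type_count() call sat inside the use_browser branch, so stdio runs counted nothing.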
      
      Fixes: 40c39e30 ("perf report: Fix a no annotate browser displayed issue")
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200313134607.12873-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 10 Mar 2020, 1 commit
  4. 27 Feb 2020, 1 commit
    • perf annotate: Make perf config effective · 7384083b
      Authored by Ravi Bangoria
      The perf default config set by the user in the [annotate] section is
      totally ignored by the annotate code. Fix it.
      
      Before:
      
        $ ./perf config
        annotate.hide_src_code=true
        annotate.show_nr_jumps=true
        annotate.show_nr_samples=true
      
        $ ./perf annotate shash
               │    unsigned h = 0;
               │      movl   $0x0,-0xc(%rbp)
               │    while (*s)
               │    ↓ jmp    44
               │    h = 65599 * h + *s++;
         11.33 │24:   mov    -0xc(%rbp),%eax
         43.50 │      imul   $0x1003f,%eax,%ecx
               │      mov    -0x18(%rbp),%rax
      
      After:
      
               │        movl   $0x0,-0xc(%rbp)
               │      ↓ jmp    44
             1 │1 24:   mov    -0xc(%rbp),%eax
             4 │        imul   $0x1003f,%eax,%ecx
               │        mov    -0x18(%rbp),%rax
      
      Note that we have removed show_nr_samples and show_total_period from
      annotation_options because they are not used. Instead we use
      symbol_conf.show_nr_samples and symbol_conf.show_total_period.
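How the [annotate] keys shown above get applied can be sketched as below; struct annotation_options is cut down and annotate_config() is a hypothetical helper, not the actual perf config API:

```c
#include <assert.h>
#include <string.h>

struct annotation_options {
	int hide_src_code;
	int show_nr_jumps;
};

/* Apply one "annotate.*" config variable to the options. */
static int annotate_config(const char *var, const char *value,
			   struct annotation_options *opts)
{
	int on = value && !strcmp(value, "true");

	if (!strcmp(var, "annotate.hide_src_code"))
		opts->hide_src_code = on;
	else if (!strcmp(var, "annotate.show_nr_jumps"))
		opts->show_nr_jumps = on;
	else
		return -1; /* unknown key */
	return 0;
}

/* Convenience wrapper: apply one hide_src_code value, return the result. */
static int apply_and_get_hide_src(const char *value)
{
	struct annotation_options opts = { 0 };

	annotate_config("annotate.hide_src_code", value, &opts);
	return opts.hide_src_code;
}
```

The fix amounts to making sure a walker like this actually runs over the user's config before annotation starts.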
      
      Committer testing:
      
      Using 'perf annotate --stdio2' to use the TUI rendering but emitting the output to stdio:
      
        # perf config
        #
        # perf config annotate.hide_src_code=true
        # perf config
        annotate.hide_src_code=true
        #
        # perf config annotate.show_nr_jumps=true
        # perf config annotate.show_nr_samples=true
        # perf config
        annotate.hide_src_code=true
        annotate.show_nr_jumps=true
        annotate.show_nr_samples=true
        #
        #
      
      Before:
      
        # perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized
        Samples: 1  of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
        ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
        Percent
                    00000000000609f0 <ObjectInstance::weak_pointer_was_finalized()@@base>:
                      endbr64
                      cmpq    $0x0,0x20(%rdi)
                    ↓ je      10
                      xor     %eax,%eax
                    ← retq
                      xchg    %ax,%ax
        100.00  10:   push    %rbp
                      cmpq    $0x0,0x18(%rdi)
                      mov     %rdi,%rbp
                    ↓ jne     20
                1b:   xor     %eax,%eax
                      pop     %rbp
                    ← retq
                      nop
                20:   lea     0x18(%rdi),%rdi
                    → callq   JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject*
                      cmpq    $0x0,0x18(%rbp)
                    ↑ jne     1b
                      mov     %rbp,%rdi
                    → callq   ObjectBase::jsobj_addr() const@plt
                      mov     $0x1,%eax
                      pop     %rbp
                    ← retq
        #
      
      After:
      
        # perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized 2> /dev/null
        Samples: 1  of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
        ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
        Samples       endbr64
                      cmpq    $0x0,0x20(%rdi)
                    ↓ je      10
                      xor     %eax,%eax
                    ← retq
                      xchg    %ax,%ax
           1  1 10:   push    %rbp
                      cmpq    $0x0,0x18(%rdi)
                      mov     %rdi,%rbp
                    ↓ jne     20
              1 1b:   xor     %eax,%eax
                      pop     %rbp
                    ← retq
                      nop
              1 20:   lea     0x18(%rdi),%rdi
                    → callq   JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject*
                      cmpq    $0x0,0x18(%rbp)
                    ↑ jne     1b
                      mov     %rbp,%rdi
                    → callq   ObjectBase::jsobj_addr() const@plt
                      mov     $0x1,%eax
                      pop     %rbp
                    ← retq
        #
        # perf config annotate.show_nr_jumps
        annotate.show_nr_jumps=true
        # perf config annotate.show_nr_jumps=false
        # perf config annotate.show_nr_jumps
        annotate.show_nr_jumps=false
        #
        # perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized 2> /dev/null
        Samples: 1  of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
        ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
        Samples       endbr64
                      cmpq    $0x0,0x20(%rdi)
                    ↓ je      10
                      xor     %eax,%eax
                    ← retq
                      xchg    %ax,%ax
             1  10:   push    %rbp
                      cmpq    $0x0,0x18(%rdi)
                      mov     %rdi,%rbp
                    ↓ jne     20
                1b:   xor     %eax,%eax
                      pop     %rbp
                    ← retq
                      nop
                20:   lea     0x18(%rdi),%rdi
                    → callq   JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject*
                      cmpq    $0x0,0x18(%rbp)
                    ↑ jne     1b
                      mov     %rbp,%rdi
                    → callq   ObjectBase::jsobj_addr() const@plt
                      mov     $0x1,%eax
                      pop     %rbp
                    ← retq
        #
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Changbin Du <changbin.du@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Yisheng Xie <xieyisheng1@huawei.com>
      Link: http://lore.kernel.org/lkml/20200213064306.160480-6-ravi.bangoria@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 14 Jan 2020, 3 commits
  6. 21 Dec 2019, 1 commit
  7. 04 Dec 2019, 1 commit
  8. 26 Nov 2019, 2 commits
  9. 20 Nov 2019, 3 commits
    • perf report: Jump to symbol source view from total cycles view · 848a5e50
      Authored by Jin Yao
      This patch supports jumping from the TUI total cycles view to the
      symbol source view.
      
      For example,
      
        perf record -b ./div
        perf report --total-cycles
      
      In the total cycles view, we can select an entry and press 'a' or the
      ENTER key to jump to the symbol source view.
      
      This patch also sets sort_order to NULL in cmd_report(), which makes
      it use the default branch sort order. The percent value in the new
      annotate view is then consistent with the percent in the annotate
      view reached from perf report (we observed the original percent gap
      with previous patches).
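The key handling described above can be sketched as a tiny state switch; the view enum, the ENTER stand-in and handle_key() are simplified placeholders for the TUI browser code:

```c
#include <assert.h>

enum view { VIEW_TOTAL_CYCLES, VIEW_ANNOTATE };

#define KEY_ENTER_SKETCH '\r'  /* stand-in for the browser's ENTER key code */

/* In the total cycles view, 'a' or ENTER jumps to symbol source view. */
static enum view handle_key(enum view current, int key)
{
	if (current == VIEW_TOTAL_CYCLES &&
	    (key == 'a' || key == KEY_ENTER_SKETCH))
		return VIEW_ANNOTATE;
	return current;
}
```

Any other key leaves the current view unchanged.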
      
       v2:
       ---
       Fix the 'make NO_SLANG=1' error (mark annotation_opts as
       __maybe_unused in block_hists_tui_browse()).
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191118140849.20714-2-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf dso: Move dso_id from 'struct map' to 'struct dso' · 0e3149f8
      Authored by Arnaldo Carvalho de Melo
      And take it into account when looking up DSOs when we have the dso_id
      fields obtained from somewhere, like from PERF_RECORD_MMAP2 records.
      
      Instances of struct map pointing to the same DSO pathname but with
      anything in dso_id different are in fact different DSOs, so better have
      different 'struct dso' instances to reflect that. At some point we may
      want to get copies of the contents of the different objects if we want
      to do correct annotation or other analysis.
      
      With this we get 'struct map' 24 bytes leaner:
      
        $ pahole -C map ~/bin/perf
        struct map {
        	union {
        		struct rb_node     rb_node __attribute__((__aligned__(8))); /*     0    24 */
        		struct list_head   node;                 /*     0    16 */
        	} __attribute__((__aligned__(8)));               /*     0    24 */
        	u64                        start;                /*    24     8 */
        	u64                        end;                  /*    32     8 */
        	_Bool                      erange_warned:1;      /*    40: 0  1 */
        	_Bool                      priv:1;               /*    40: 1  1 */
      
        	/* XXX 6 bits hole, try to pack */
        	/* XXX 3 bytes hole, try to pack */
      
        	u32                        prot;                 /*    44     4 */
        	u64                        pgoff;                /*    48     8 */
        	u64                        reloc;                /*    56     8 */
        	/* --- cacheline 1 boundary (64 bytes) --- */
        	u64                        (*map_ip)(struct map *, u64); /*    64     8 */
        	u64                        (*unmap_ip)(struct map *, u64); /*    72     8 */
        	struct dso *               dso;                  /*    80     8 */
        	refcount_t                 refcnt;               /*    88     4 */
        	u32                        flags;                /*    92     4 */
      
        	/* size: 96, cachelines: 2, members: 13 */
        	/* sum members: 92, holes: 1, sum holes: 3 */
        	/* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
        	/* forced alignments: 1 */
        	/* last cacheline: 32 bytes */
        } __attribute__((__aligned__(8)));
        $
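The lookup idea above can be sketched with a cut-down struct dso_id and a field-wise comparator; dso_id__cmp() and dso__same() are simplified stand-ins for the actual DSO lookup code:

```c
#include <assert.h>
#include <string.h>

struct dso_id {
	unsigned int maj, min;
	unsigned long long ino, ino_generation;
};

/* Field-wise three-way comparison, usable as part of a lookup key. */
static int dso_id__cmp(const struct dso_id *a, const struct dso_id *b)
{
	if (a->maj != b->maj)
		return a->maj < b->maj ? -1 : 1;
	if (a->min != b->min)
		return a->min < b->min ? -1 : 1;
	if (a->ino != b->ino)
		return a->ino < b->ino ? -1 : 1;
	if (a->ino_generation != b->ino_generation)
		return a->ino_generation < b->ino_generation ? -1 : 1;
	return 0;
}

/*
 * Two maps with the same DSO pathname but different dso_id fields are
 * different DSOs: the key is (name, dso_id), not the name alone.
 */
static int dso__same(const char *name_a, const struct dso_id *id_a,
		     const char *name_b, const struct dso_id *id_b)
{
	return !strcmp(name_a, name_b) && !dso_id__cmp(id_a, id_b);
}
```

The dso_id fields come from PERF_RECORD_MMAP2 records, so two same-named binaries from different inodes no longer collapse into one DSO.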
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-g4hxxmraplo7wfjmk384mfsb@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf map: Move maj/min/ino/ino_generation to separate struct · 99459a84
      Authored by Arnaldo Carvalho de Melo
      This patch highlights where these fields are being used: in the sort
      order, where they are used to compare maps and classify samples,
      taking into account not just the DSO but also those DSO id fields.
      
      I think these should be used to differentiate DSOs with the same name
      but different 'struct dso_id' fields, i.e. these fields should move to
      'struct dso' and then be used as part of the key when doing lookups for
      DSOs, in addition to the DSO name.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-8v5isitqy0dup47nnwkpc80f@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  10. 12 Nov 2019, 1 commit
  11. 07 Nov 2019, 5 commits
    • perf report: Sort by sampled cycles percent per block for tui · 7fa46cbf
      Authored by Jin Yao
      The previous patch implemented a new option, "--total-cycles", but
      only stdio mode was supported.
      
      This patch adds support for the TUI mode and for '--percent-limit'.
      
      For example,
      
       perf record -b ./div
       perf report --total-cycles --percent-limit 1
      
       # Samples: 2753248 of event 'cycles'
       Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                              [Program Block Range]         Shared Object
                26.04%            2.8M        0.40%          18                                             [div.c:42 -> div.c:39]                   div
                15.17%            1.2M        0.16%           7                                 [random_r.c:357 -> random_r.c:380]          libc-2.27.so
                 5.11%          402.0K        0.04%           2                                             [div.c:27 -> div.c:28]                   div
                 4.87%          381.6K        0.04%           2                                     [random.c:288 -> random.c:291]          libc-2.27.so
                 4.53%          381.0K        0.04%           2                                             [div.c:40 -> div.c:40]                   div
                 3.85%          300.9K        0.02%           1                                             [div.c:22 -> div.c:25]                   div
                 3.08%          241.1K        0.02%           1                                           [rand.c:26 -> rand.c:27]          libc-2.27.so
                 3.06%          240.0K        0.02%           1                                     [random.c:291 -> random.c:291]          libc-2.27.so
                 2.78%          215.7K        0.02%           1                                     [random.c:298 -> random.c:298]          libc-2.27.so
                 2.52%          198.3K        0.02%           1                                     [random.c:293 -> random.c:293]          libc-2.27.so
                 2.36%          184.8K        0.02%           1                                           [rand.c:28 -> rand.c:28]          libc-2.27.so
                 2.33%          180.5K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
                 2.28%          176.7K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
                 2.20%          168.8K        0.02%           1                                         [rand@plt+0 -> rand@plt+0]                   div
                 1.98%          158.2K        0.02%           1                                 [random_r.c:388 -> random_r.c:388]          libc-2.27.so
                 1.57%          123.3K        0.02%           1                                             [div.c:42 -> div.c:44]                   div
                 1.44%          116.0K        0.42%          19                                 [random_r.c:357 -> random_r.c:394]          libc-2.27.so
      
      --------------------------------------------------
      
       v7:
       ---
       1. Since we have used use_browser in report__browse_block_hists
          to support stdio mode, now we also add support for the TUI.
      
       2. Move block tui browser code from ui/browsers/hists.c
          to block-info.c.
      
       v6:
       ---
       Create report__tui_browse_block_hists in block-info.c
       (code moved from builtin-report.c).
      
       v5:
       ---
       Fix a crash when running perf report without '--total-cycles'.
       The issue is that the internal flag was renamed from
       'total_cycles' to 'total_cycles_mode' in a previous patch, but
       this patch still used 'total_cycles' to check whether the
       '--total-cycles' option was enabled, leaving the code
       inconsistent.
      
       v4:
       ---
       Since the block collection was moved out of printing in the
       previous patch, this patch is updated accordingly for TUI
       support.
      
       v3:
       ---
       Minor change since the function name is changed:
       block_total_cycles_percent -> block_info__total_cycles_percent
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-8-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report: Support --percent-limit for --total-cycles · 0b49f836
      Authored by Jin Yao
      We already supported the '--total-cycles' option in the previous
      patch. It's also useful to show only entries above a threshold
      percent.
      
      This patch enables '--percent-limit' so that entries under that
      percent are not shown.
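The filtering can be sketched as below; count_shown() is a hypothetical helper, not the actual hist-entry filtering code:

```c
#include <assert.h>

/* Count how many entries stay at or above the --percent-limit threshold. */
static int count_shown(const double *sampled_cycles_pct, int nr_entries,
		       double percent_limit)
{
	int shown = 0;
	int i;

	for (i = 0; i < nr_entries; i++)
		if (sampled_cycles_pct[i] >= percent_limit)
			shown++;

	return shown;
}
```

Entries whose 'Sampled Cycles%' falls below the limit are simply skipped when printing.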
      
      For example:
      
       perf report --total-cycles --stdio --percent-limit 1
      
       # To display the perf.data header info, please use --header/--header-only options.
       #
       #
       # Total Lost Samples: 0
       #
       # Samples: 2M of event 'cycles'
       # Event count (approx.): 2753248
       #
       # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                              [Program Block Range]         Shared Object
       # ...............  ..............  ...........  ..........  .................................................................  ....................
       #
                  26.04%            2.8M        0.40%          18                                             [div.c:42 -> div.c:39]                   div
                  15.17%            1.2M        0.16%           7                                 [random_r.c:357 -> random_r.c:380]          libc-2.27.so
                   5.11%          402.0K        0.04%           2                                             [div.c:27 -> div.c:28]                   div
                   4.87%          381.6K        0.04%           2                                     [random.c:288 -> random.c:291]          libc-2.27.so
                   4.53%          381.0K        0.04%           2                                             [div.c:40 -> div.c:40]                   div
                   3.85%          300.9K        0.02%           1                                             [div.c:22 -> div.c:25]                   div
                   3.08%          241.1K        0.02%           1                                           [rand.c:26 -> rand.c:27]          libc-2.27.so
                   3.06%          240.0K        0.02%           1                                     [random.c:291 -> random.c:291]          libc-2.27.so
                   2.78%          215.7K        0.02%           1                                     [random.c:298 -> random.c:298]          libc-2.27.so
                   2.52%          198.3K        0.02%           1                                     [random.c:293 -> random.c:293]          libc-2.27.so
                   2.36%          184.8K        0.02%           1                                           [rand.c:28 -> rand.c:28]          libc-2.27.so
                   2.33%          180.5K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
                   2.28%          176.7K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
                   2.20%          168.8K        0.02%           1                                         [rand@plt+0 -> rand@plt+0]                   div
                   1.98%          158.2K        0.02%           1                                 [random_r.c:388 -> random_r.c:388]          libc-2.27.so
                   1.57%          123.3K        0.02%           1                                             [div.c:42 -> div.c:44]                   div
                   1.44%          116.0K        0.42%          19                                 [random_r.c:357 -> random_r.c:394]          libc-2.27.so
      
      Committer testing:
      
      From the second example onwards, slightly edited for brevity:
      
        # perf report --total-cycles --percent-limit 2 --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 6M of event 'cycles'
        # Event count (approx.): 6299936
        #
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
        # ...............  ..............  ...........  ..........  ......................................................................  ....................
        #
                    2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
        #
        # (Tip: Create an archive with symtabs to analyse on other machine: perf archive)
        #
        # perf report --total-cycles --percent-limit 1 --stdio
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
                    2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
                    1.75%            1.3M        8.34%       65.5K    [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151]          libc-2.29.so
        #
        # perf report --total-cycles --percent-limit 0.7 --stdio
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
                    2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
                    1.75%            1.3M        8.34%       65.5K    [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151]          libc-2.29.so
                    0.72%          544.5K        0.03%         230                                      [entry_64.S:657 -> entry_64.S:662]      [kernel.vmlinux]
        #
      
      -------------------------------------------
      
      It only shows the entries whose 'Sampled Cycles%' is above 1%.
      
       v7:
       ---
       No functional change. Only fix the conflict issue because
       previous patches are changed.
      
       v6:
       ---
       No functional change. Only fix the conflict issue because
       previous patches are changed.
      
       v5:
       ---
       No functional change. Only fix the conflict issue because
       previous patches are changed.
      
       v4:
       ---
       No functional change. Only fix the build issue because
       previous patches are changed.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-7-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0b49f836
    • perf report: Sort by sampled cycles percent per block for stdio · 6f7164fa
      Jin Yao authored
      It would be useful to support sorting all blocks by the sampled
      cycles percent per block, which helps to concentrate on the globally
      hottest blocks.
      
      This patch implements a new option "--total-cycles" which sorts all
      blocks by 'Sampled Cycles%'. The 'Sampled Cycles%' is the percent:
      
       percent = block sampled cycles aggregation / total sampled cycles
      
      Note that this patch only supports "--stdio" mode.
      
      For example,
      
        # perf record -b ./div
        # perf report --total-cycles --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 2M of event 'cycles'
        # Event count (approx.): 2753248
        #
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                             [Program Block Range]      Shared Object
        # ...............  ..............  ...........  ..........  ................................................  .................
        #
                   26.04%            2.8M        0.40%          18                            [div.c:42 -> div.c:39]                div
                   15.17%            1.2M        0.16%           7                [random_r.c:357 -> random_r.c:380]       libc-2.27.so
                    5.11%          402.0K        0.04%           2                            [div.c:27 -> div.c:28]                div
                    4.87%          381.6K        0.04%           2                    [random.c:288 -> random.c:291]       libc-2.27.so
                    4.53%          381.0K        0.04%           2                            [div.c:40 -> div.c:40]                div
                    3.85%          300.9K        0.02%           1                            [div.c:22 -> div.c:25]                div
                    3.08%          241.1K        0.02%           1                          [rand.c:26 -> rand.c:27]       libc-2.27.so
                    3.06%          240.0K        0.02%           1                    [random.c:291 -> random.c:291]       libc-2.27.so
                    2.78%          215.7K        0.02%           1                    [random.c:298 -> random.c:298]       libc-2.27.so
                    2.52%          198.3K        0.02%           1                    [random.c:293 -> random.c:293]       libc-2.27.so
                    2.36%          184.8K        0.02%           1                          [rand.c:28 -> rand.c:28]       libc-2.27.so
                    2.33%          180.5K        0.02%           1                    [random.c:295 -> random.c:295]       libc-2.27.so
                    2.28%          176.7K        0.02%           1                    [random.c:295 -> random.c:295]       libc-2.27.so
                    2.20%          168.8K        0.02%           1                        [rand@plt+0 -> rand@plt+0]                div
                    1.98%          158.2K        0.02%           1                [random_r.c:388 -> random_r.c:388]       libc-2.27.so
                    1.57%          123.3K        0.02%           1                            [div.c:42 -> div.c:44]                div
                    1.44%          116.0K        0.42%          19                [random_r.c:357 -> random_r.c:394]       libc-2.27.so
                    0.25%          182.5K        0.02%           1                [random_r.c:388 -> random_r.c:391]       libc-2.27.so
                    0.00%              48        1.07%          48        [x86_pmu_enable+284 -> x86_pmu_enable+298]  [kernel.kallsyms]
                    0.00%              74        1.64%          74             [vm_mmap_pgoff+0 -> vm_mmap_pgoff+92]  [kernel.kallsyms]
                    0.00%              73        1.62%          73                         [vm_mmap+0 -> vm_mmap+48]  [kernel.kallsyms]
                    0.00%              63        0.69%          31                       [up_write+0 -> up_write+34]  [kernel.kallsyms]
                    0.00%              13        0.29%          13      [setup_arg_pages+396 -> setup_arg_pages+413]  [kernel.kallsyms]
                    0.00%               3        0.07%           3      [setup_arg_pages+418 -> setup_arg_pages+450]  [kernel.kallsyms]
                    0.00%             616        6.84%         308   [security_mmap_file+0 -> security_mmap_file+72]  [kernel.kallsyms]
                    0.00%              23        0.51%          23  [security_mmap_file+77 -> security_mmap_file+87]  [kernel.kallsyms]
                    0.00%               4        0.02%           1                  [sched_clock+0 -> sched_clock+4]  [kernel.kallsyms]
                    0.00%               4        0.02%           1                 [sched_clock+9 -> sched_clock+12]  [kernel.kallsyms]
                    0.00%               1        0.02%           1                [rcu_nmi_exit+0 -> rcu_nmi_exit+9]  [kernel.kallsyms]
      
      Committer testing:
      
      This should provide material for hours of endless joy, both from looking
      for suspicious things in the implementation of this patch, such as the
      top one:
      
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                          [Program Block Range]     Shared Object
                    2.17%            1.7M        0.08%         607   [compiler.h:199 -> common.c:221]              [kernel.vmlinux]
      
      As well as from things that look legit:
      
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                          [Program Block Range]     Shared Object
                    0.16%          123.0K        0.60%        4.7K   [nospec-branch.h:265 -> nospec-branch.h:278]  [kernel.vmlinux]
      
      :-)
      
      Very short system-wide taken branches session:
      
        # perf record -h -b
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -b, --branch-any      sample any taken branches
      
        #
        # perf record -b
        ^C[ perf record: Woken up 595 times to write data ]
        [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ]
      
        #
        # perf evlist -v
        cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
        #
        # perf report --total-cycles --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 6M of event 'cycles'
        # Event count (approx.): 6299936
        #
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
        # ...............  ..............  ...........  ..........  ......................................................................  ....................
        #
                    2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
                    1.75%            1.3M        8.34%       65.5K    [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151]          libc-2.29.so
                    0.72%          544.5K        0.03%         230                                      [entry_64.S:657 -> entry_64.S:662]      [kernel.vmlinux]
                    0.56%          541.8K        0.09%         672                                        [compiler.h:199 -> common.c:300]      [kernel.vmlinux]
                    0.39%          293.2K        0.01%         104                                    [list_debug.c:43 -> list_debug.c:61]      [kernel.vmlinux]
                    0.36%          278.6K        0.03%         272                                    [entry_64.S:1289 -> entry_64.S:1308]      [kernel.vmlinux]
                    0.30%          260.8K        0.07%         564                              [clear_page_64.S:47 -> clear_page_64.S:50]      [kernel.vmlinux]
                    0.28%          215.3K        0.05%         369                                            [traps.c:623 -> traps.c:628]      [kernel.vmlinux]
                    0.23%          178.1K        0.04%         278                                      [entry_64.S:271 -> entry_64.S:275]      [kernel.vmlinux]
                    0.20%          152.6K        0.09%         706                                      [paravirt.c:177 -> paravirt.c:179]      [kernel.vmlinux]
                    0.20%          155.8K        0.05%         373                                      [entry_64.S:153 -> entry_64.S:175]      [kernel.vmlinux]
                    0.18%          136.6K        0.03%         222                                                [msr.h:105 -> msr.h:166]      [kernel.vmlinux]
                    0.16%          123.0K        0.60%        4.7K                            [nospec-branch.h:265 -> nospec-branch.h:278]      [kernel.vmlinux]
                    0.16%          118.3K        0.01%          44                                      [entry_64.S:632 -> entry_64.S:657]      [kernel.vmlinux]
                    0.14%          104.5K        0.00%          28                                          [rwsem.c:1541 -> rwsem.c:1544]      [kernel.vmlinux]
                    0.13%           99.2K        0.01%          53                                      [spinlock.c:150 -> spinlock.c:152]      [kernel.vmlinux]
                    0.13%           95.5K        0.00%          35                                              [swap.c:456 -> swap.c:471]      [kernel.vmlinux]
                    0.12%           96.2K        0.05%         407                              [copy_user_64.S:175 -> copy_user_64.S:209]      [kernel.vmlinux]
                    0.11%           85.9K        0.00%          31                                        [swap.c:400 -> page-flags.h:188]      [kernel.vmlinux]
                    0.10%           73.0K        0.01%          52                                          [paravirt.h:763 -> list.h:131]      [kernel.vmlinux]
                    0.07%           56.2K        0.03%         214                                      [filemap.c:1524 -> filemap.c:1557]      [kernel.vmlinux]
                    0.07%           54.2K        0.02%         145                                        [memory.c:1032 -> memory.c:1049]      [kernel.vmlinux]
                    0.07%           50.3K        0.00%          39                                            [mmzone.c:49 -> mmzone.c:69]      [kernel.vmlinux]
                    0.06%           48.3K        0.01%          40                                   [paravirt.h:768 -> page_alloc.c:3304]      [kernel.vmlinux]
                    0.06%           46.7K        0.02%         155                                        [memory.c:1032 -> memory.c:1056]      [kernel.vmlinux]
                    0.06%           46.9K        0.01%         103                                              [swap.c:867 -> swap.c:902]      [kernel.vmlinux]
                    0.06%           47.8K        0.00%          34                                    [entry_64.S:1201 -> entry_64.S:1202]      [kernel.vmlinux]
      
       -----------------------------------------------------------
      
       v7:
       ---
       Use use_browser in report__browse_block_hists for supporting
       stdio and potential tui mode.
      
       v6:
       ---
       Create report__browse_block_hists in block-info.c (codes are
       moved from builtin-report.c). It's called from
       perf_evlist__tty_browse_hists.
      
       v5:
       ---
       1. Move all block functions to block-info.c
      
       2. Move the code of setting ms in block hist_entry to
          other patch.
      
       v4:
       ---
       1. Use new option '--total-cycles' to replace
          '-s total_cycles' in v3.
      
       2. Move block info collection out of block info
          printing.
      
       v3:
       ---
       1. Use common function block_info__process_sym to
          process the blocks per symbol.
      
       2. Remove the nasty hack for skipping calculation
          of column length
      
       3. Some minor cleanup
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-6-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6f7164fa
    • perf hist: Count the total cycles of all samples · 7841f40a
      Jin Yao authored
      We can get the per-sample cycles via hist__account_cycles(). It's also
      useful to know the total cycles of all samples, in order to later
      compute the cycles coverage of a single program block. For example:
      
        coverage = per block sampled cycles / total sampled cycles
      
      This patch adds a new argument 'total_cycles' to hist__account_cycles(),
      which accumulates the cycles of each sample.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-4-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7841f40a
    • perf maps: Add for_each_entry()/_safe() iterators · 8efc4f05
      Arnaldo Carvalho de Melo authored
      To reduce boilerplate, provide a more compact form using an idiom
      present in other trees of data structures.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-59gmq4kg1r68ou1wknyjl78x@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8efc4f05
  12. 15 Oct, 2019 — 1 commit
  13. 21 Sep, 2019 — 1 commit
  14. 20 Sep, 2019 — 1 commit
  15. 01 Sep, 2019 — 4 commits
  16. 29 Aug, 2019 — 2 commits
  17. 26 Aug, 2019 — 1 commit
  18. 20 Aug, 2019 — 1 commit
    • perf report: Prefer DWARF callstacks to LBR ones when captured both · 10ccbc1c
      Alexey Budankov authored
      Display DWARF-based callchains when the perf.data file contains raw thread
      stack data as well as LBR callstack data.
      
      Committer testing:
      
      This changes the output from the branch stack based one, i.e. without
      this patch, for the same file as in the previous csets:
      
        # perf report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 13  of event 'cycles'
        # Event count (approx.): 13
        #
        # Overhead  Command  Source Shared Object  Source Symbol                Target Symbol                              Basic Block Cycles
        # ........  .......  ....................  ...........................  .........................................  ..................
        #
             7.69%  ls       libpthread-2.29.so    [.] _init                    [.] __pthread_initialize_minimal_internal  6827
             7.69%  ls       ld-2.29.so            [k] _start                   [k] _dl_start                              -
             7.69%  ls       ld-2.29.so            [.] _dl_start_user           [.] _dl_init                               -24790
             7.69%  ls       ld-2.29.so            [k] _dl_start                [k] _dl_sysdep_start                       278
             7.69%  ls       ld-2.29.so            [k] dl_main                  [k] _dl_map_object_deps                    15581
             7.69%  ls       ld-2.29.so            [k] open_verify.constprop.0  [k] lseek64                                4228
             7.69%  ls       ld-2.29.so            [k] _dl_map_object           [k] open_verify.constprop.0                55
             7.69%  ls       ld-2.29.so            [k] openaux                  [k] _dl_map_object                         67
             7.69%  ls       ld-2.29.so            [k] _dl_map_object_deps      [k] 0x00007f441b57c090                     112
             7.69%  ls       ld-2.29.so            [.] call_init.part.0         [.] _init                                  334
             7.69%  ls       ld-2.29.so            [.] _dl_init                 [.] call_init.part.0                       383
             7.69%  ls       ld-2.29.so            [k] _dl_sysdep_start         [k] dl_main                                45
             7.69%  ls       ld-2.29.so            [k] _dl_catch_exception      [k] openaux                                116
      
        #
        # (Tip: For memory address profiling, try: perf mem record / perf mem report)
        #
      
      To the one that shows call chains:
      
        # perf report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 10  of event 'cycles'
        # Event count (approx.): 3204047
        #
        # Children      Self  Command  Shared Object       Symbol
        # ........  ........  .......  ..................  .........................................
        #
            55.01%     0.00%  ls       [kernel.vmlinux]    [k] entry_SYSCALL_64_after_hwframe
                    |
                    ---entry_SYSCALL_64_after_hwframe
                       do_syscall_64
                       |
                        --16.01%--__x64_sys_execve
                                  __do_execve_file.isra.0
                                  search_binary_handler
                                  load_elf_binary
                                  elf_map
                                  vm_mmap_pgoff
                                  do_mmap
                                  mmap_region
                                  perf_event_mmap
                                  perf_iterate_sb
                                  perf_iterate_ctx
                                  perf_event_mmap_output
                                  perf_output_copy
                                  memcpy_erms
      
            55.01%    39.00%  ls       [kernel.vmlinux]    [k] do_syscall_64
                    |
                    |--39.00%--0xffffffffffffffff
                    |          _dl_map_object
                    |          open_verify.constprop.0
                    |          __lseek64 (inlined)
                    |          entry_SYSCALL_64_after_hwframe
                    |          do_syscall_64
                    |
                     --16.01%--do_syscall_64
                               __x64_sys_execve
                               __do_execve_file.isra.0
                               search_binary_handler
                               load_elf_binary
                               elf_map
                               vm_mmap_pgoff
                               do_mmap
                               mmap_region
                               perf_event_mmap
                               perf_iterate_sb
                               perf_iterate_ctx
                               perf_event_mmap_output
                               perf_output_copy
                               memcpy_erms
      
            42.95%    42.95%  ls       libpthread-2.29.so  [.] __pthread_initialize_minimal_internal
                    |
                    ---_init
                       __pthread_initialize_minimal_internal
      
            42.95%     0.00%  ls       libpthread-2.29.so  [.] _init
                    |
                    ---_init
                       __pthread_initialize_minimal_internal
      
        <SNIP>
      
        #
        # (Tip: Profiling branch (mis)predictions with: perf record -b / perf report)
        #
        #
      
      The branch stack view can be explicitly selected using:
      
        # perf report -h branch-stack
      
         Usage: perf report [<options>]
      
            -b, --branch-stack    use branch records for per branch histogram filling
      
        #
      
      I.e. after this patch:
      
        # perf report -b --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 13  of event 'cycles'
        # Event count (approx.): 13
        #
        # Overhead  Command  Source Shared Object  Source Symbol                Target Symbol                              Basic Block Cycles
        # ........  .......  ....................  ...........................  .........................................  ..................
        #
             7.69%  ls       libpthread-2.29.so    [.] _init                    [.] __pthread_initialize_minimal_internal  6827
             7.69%  ls       ld-2.29.so            [k] _start                   [k] _dl_start                              -
             7.69%  ls       ld-2.29.so            [.] _dl_start_user           [.] _dl_init                               -24790
             7.69%  ls       ld-2.29.so            [k] _dl_start                [k] _dl_sysdep_start                       278
             7.69%  ls       ld-2.29.so            [k] dl_main                  [k] _dl_map_object_deps                    15581
             7.69%  ls       ld-2.29.so            [k] open_verify.constprop.0  [k] lseek64                                4228
             7.69%  ls       ld-2.29.so            [k] _dl_map_object           [k] open_verify.constprop.0                55
             7.69%  ls       ld-2.29.so            [k] openaux                  [k] _dl_map_object                         67
             7.69%  ls       ld-2.29.so            [k] _dl_map_object_deps      [k] 0x00007f441b57c090                     112
             7.69%  ls       ld-2.29.so            [.] call_init.part.0         [.] _init                                  334
             7.69%  ls       ld-2.29.so            [.] _dl_init                 [.] call_init.part.0                       383
             7.69%  ls       ld-2.29.so            [k] _dl_sysdep_start         [k] dl_main                                45
             7.69%  ls       ld-2.29.so            [k] _dl_catch_exception      [k] openaux                                116
      
        #
        # (Tip: Show current config key-value pairs: perf config --list)
        #
        #
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/ccbd9583-82f4-dec5-7e84-64bf56e351fb@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      10ccbc1c
  19. 16 8月, 2019 1 次提交
    • perf report: Add --switch-on/--switch-off events · ef4b1a53
      Arnaldo Carvalho de Melo authored
      Since 'perf top' shares the histogram browser with 'perf report', the
      same explanation as in the previous cset applies.
      
      An additional example uses a pair of SDT events available for systemtap:
      
        # perf probe --exec=/usr/bin/stap '%*:*'
        Added new events:
          sdt_stap:benchmark__thread__start (on %* in /usr/bin/stap)
          sdt_stap:benchmark   (on %* in /usr/bin/stap)
          sdt_stap:benchmark__thread__end (on %* in /usr/bin/stap)
          sdt_stap:pass6__start (on %* in /usr/bin/stap)
          sdt_stap:pass6__end  (on %* in /usr/bin/stap)
          sdt_stap:pass5__start (on %* in /usr/bin/stap)
          sdt_stap:pass5__end  (on %* in /usr/bin/stap)
          sdt_stap:pass0__start (on %* in /usr/bin/stap)
          sdt_stap:pass0__end  (on %* in /usr/bin/stap)
          sdt_stap:pass1a__start (on %* in /usr/bin/stap)
          sdt_stap:pass1b__start (on %* in /usr/bin/stap)
          sdt_stap:pass1__end  (on %* in /usr/bin/stap)
          sdt_stap:pass2__start (on %* in /usr/bin/stap)
          sdt_stap:pass2__end  (on %* in /usr/bin/stap)
          sdt_stap:pass3__start (on %* in /usr/bin/stap)
          sdt_stap:pass3__end  (on %* in /usr/bin/stap)
          sdt_stap:pass4__start (on %* in /usr/bin/stap)
          sdt_stap:pass4__end  (on %* in /usr/bin/stap)
          sdt_stap:benchmark__start (on %* in /usr/bin/stap)
          sdt_stap:benchmark__end (on %* in /usr/bin/stap)
          sdt_stap:cache__get  (on %* in /usr/bin/stap)
          sdt_stap:cache__clean (on %* in /usr/bin/stap)
          sdt_stap:cache__add__module (on %* in /usr/bin/stap)
          sdt_stap:cache__add__source (on %* in /usr/bin/stap)
          sdt_stap:stap_system__complete (on %* in /usr/bin/stap)
          sdt_stap:stap_system__start (on %* in /usr/bin/stap)
          sdt_stap:stap_system__spawn (on %* in /usr/bin/stap)
          sdt_stap:stap_system__fork (on %* in /usr/bin/stap)
          sdt_stap:intern_string (on %* in /usr/bin/stap)
          sdt_stap:client__start (on %* in /usr/bin/stap)
          sdt_stap:client__end (on %* in /usr/bin/stap)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e sdt_stap:client__end -aR sleep 1
      
        #
      
      From these, we'll use the two below to run systemtap's test suite:
      
        # perf record -e sdt_stap:pass2__*,cycles:P make installcheck > /dev/null
        ^C[ perf record: Woken up 8 times to write data ]
        [ perf record: Captured and wrote 2.691 MB perf.data (39638 samples) ]
        Terminated
        # perf script | grep sdt_stap
                    stap 28979 [000] 19424.302660: sdt_stap:pass2__start: (561b9a537de3) arg1=140730364262544
                    stap 28979 [000] 19424.333083:   sdt_stap:pass2__end: (561b9a53a9e1) arg1=140730364262544
                    stap 29045 [006] 19424.933460: sdt_stap:pass2__start: (563edddcede3) arg1=140722674883152
                    stap 29045 [006] 19424.963794:   sdt_stap:pass2__end: (563edddd19e1) arg1=140722674883152
        # perf script | grep cycles |  wc -l
        39634
        #
      
      Looking at the whole perf.data file:
      
        [root@quaco testsuite]# perf report | grep cycles:P -A25
        # Samples: 39K of event 'cycles:P'
        # Event count (approx.): 34044267368
        #
        # Overhead  Command  Shared Object         Symbol
        # ........  .......  ....................  ................................
        #
             3.50%  cc1      cc1                   [.] ht_lookup_with_hash
             3.04%  cc1      cc1                   [.] _cpp_lex_token
             2.11%  cc1      cc1                   [.] ggc_internal_alloc
             1.83%  cc1      cc1                   [.] cpp_get_token_with_location
             1.68%  cc1      libc-2.29.so          [.] _int_malloc
             1.41%  cc1      cc1                   [.] linemap_position_for_column
             1.25%  cc1      cc1                   [.] ggc_internal_cleared_alloc
             1.20%  cc1      cc1                   [.] c_lex_with_flags
             1.18%  cc1      cc1                   [.] get_combined_adhoc_loc
             1.05%  cc1      libc-2.29.so          [.] malloc
             1.01%  cc1      libc-2.29.so          [.] _int_free
             0.96%  stap     stap                  [.] std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Identity, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, stringtable_hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_insert<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__detail::_AllocNode<std::allocator<std::__detail::_Hash_node<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, true> > > >
             0.78%  stap     stap                  [.] lexer::scan
             0.74%  cc1      cc1                   [.] _cpp_lex_direct
             0.70%  cc1      cc1                   [.] pop_scope
             0.70%  cc1      cc1                   [.] c_parser_declspecs
             0.69%  stap     libc-2.29.so          [.] _int_malloc
             0.68%  cc1      cc1                   [.] htab_find_slot
             0.68%  cc1      [kernel.vmlinux]      [k] prepare_exit_to_usermode
             0.64%  cc1      [kernel.vmlinux]      [k] clear_page_erms
        [root@quaco testsuite]#
      
      And now only what happens in slices demarcated by those start/end SDT
      events:
      
        [root@quaco testsuite]# perf report --switch-on=sdt_stap:pass2__start --switch-off=sdt_stap:pass2__end | grep cycles:P -A100
        # Samples: 240  of event 'cycles:P'
        # Event count (approx.): 206491934
        #
        # Overhead  Command  Shared Object        Symbol
        # ........  .......  ...................  ................................................
        #
            38.99%  stap     stap                 [.] systemtap_session::register_library_aliases
            19.47%  stap     stap                 [.] match_key::operator<
            15.01%  stap     libc-2.29.so         [.] __memcmp_avx2_movbe
             5.19%  stap     libc-2.29.so         [.] _int_malloc
             2.50%  stap     libstdc++.so.6.0.26  [.] std::_Rb_tree_insert_and_rebalance
             2.30%  stap     stap                 [.] match_node::build_no_more
             2.07%  stap     libc-2.29.so         [.] malloc
             1.66%  stap     stap                 [.] std::_Rb_tree<match_key, std::pair<match_key const, match_node*>, std::_Select1st<std::pair<match_key const, match_node*> >, std::less<match_key>, std::allocator<std::pair<match_key const, match_node*> > >::find
             1.66%  stap     stap                 [.] match_node::bind
             1.58%  stap     [kernel.vmlinux]     [k] prepare_exit_to_usermode
             1.17%  stap     [kernel.vmlinux]     [k] native_irq_return_iret
             0.87%  stap     stap                 [.] 0x0000000000032ec4
             0.77%  stap     libstdc++.so.6.0.26  [.] std::_Rb_tree_increment
             0.47%  stap     stap                 [.] std::vector<derived_probe_builder*, std::allocator<derived_probe_builder*> >::_M_realloc_insert<derived_probe_builder* const&>
             0.47%  stap     [kernel.vmlinux]     [k] get_page_from_freelist
             0.47%  stap     [kernel.vmlinux]     [k] swapgs_restore_regs_and_return_to_usermode
             0.47%  stap     [kernel.vmlinux]     [k] do_user_addr_fault
             0.46%  stap     [kernel.vmlinux]     [k] __pagevec_lru_add_fn
             0.46%  stap     stap                 [.] std::_Rb_tree<match_key, std::pair<match_key const, match_node*>, std::_Select1st<std::pair<match_key const, match_node*> >, std::less<match_key>, std::allocator<std::pair<match_key const, match_node*> > >::_M_emplace_unique<std::pair<match_key, match_node*> >
             0.42%  stap     libstdc++.so.6.0.26  [.] 0x00000000000c18fa
             0.40%  stap     [kernel.vmlinux]     [k] interrupt_entry
             0.40%  stap     [kernel.vmlinux]     [k] update_load_avg
             0.40%  stap     [kernel.vmlinux]     [k] __intel_pmu_disable_all
             0.40%  stap     [kernel.vmlinux]     [k] clear_page_erms
             0.39%  stap     [kernel.vmlinux]     [k] __mod_node_page_state
             0.39%  stap     [kernel.vmlinux]     [k] error_entry
             0.39%  stap     [kernel.vmlinux]     [k] sync_regs
             0.38%  stap     [kernel.vmlinux]     [k] __handle_mm_fault
             0.38%  stap     stap                 [.] derive_probes
      
        #
        # (Tip: System-wide collection from all CPUs: perf record -a)
        #
        [root@quaco testsuite]#
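       The filtering shown above can be reproduced with a sequence roughly
       like the following. This is only a sketch: it assumes stap was built
       with SDT (systemtap/sdt.h) markers enabled, and the stap script name
       is a placeholder; the probe names sdt_stap:pass2__start and
       sdt_stap:pass2__end are the ones visible in the output above.
       
         # Make the SDT markers in the stap binary known to perf
         perf buildid-cache --add /usr/bin/stap
         perf probe --add sdt_stap:pass2__start
         perf probe --add sdt_stap:pass2__end
       
         # Record cycles together with the two SDT events while running the
         # workload (script name is illustrative)
         perf record -e cycles:P -e sdt_stap:pass2__start \
             -e sdt_stap:pass2__end -- stap -v some_script.stp
       
         # Report only the samples that fall between a pass2__start event
         # and the following pass2__end event
         perf report --switch-on=sdt_stap:pass2__start \
                     --switch-off=sdt_stap:pass2__end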
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: William Cohen <wcohen@redhat.com>
      Link: https://lkml.kernel.org/n/tip-408hvumcnyn93a0auihnawew@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ef4b1a53
  20. 30 Jul 2019, 3 commits
  21. 09 Jul 2019, 2 commits
  22. 26 Jun 2019, 2 commits