1. 25 Oct, 2018 5 commits
    • perf script: Support total cycles count · fe57120e
      Andi Kleen committed
      For 'perf script' brstackinsn also print a running cycles count.  This
      makes it easier to calculate cycle deltas for code sections measured
      with LBRs.
      
      % perf record -b -a sleep 1
      % perf script -F +brstackinsn
      ...
              00007f73ecc41083        insn: 74 06                     # PRED 9 cycles [17] 1.11 IPC
              00007f73ecc4108b        insn: a8 10
              00007f73ecc4108d        insn: 74 71                     # PRED 1 cycles [18] 1.00 IPC
              00007f73ecc41100        insn: 48 8b 46 10
              00007f73ecc41104        insn: 4c 8b 38
              00007f73ecc41107        insn: 4d 85 ff
              00007f73ecc4110a        insn: 0f 84 b0 00 00 00
              00007f73ecc41110        insn: 83 43 58 01
              00007f73ecc41114        insn: 48 89 df
              00007f73ecc41117        insn: e8 94 73 04 00            # PRED 6 cycles [24] 1.00 IPC
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Link: http://lkml.kernel.org/r/20180924170732.GA28040@tassilo.jf.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
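The running count in brackets can be turned into cycle deltas with a few lines of post-processing. The following is a minimal sketch, not part of perf: it assumes the annotation always has the shape `# <tag> N cycles [TOTAL]` seen in the sample output above, and the helper names `running_cycles` and `cycles_between` are hypothetical.

```python
import re

# Matches the trailing annotation on branch lines, e.g.
# "# PRED 9 cycles [17] 1.11 IPC": block cycles, then the running total in brackets.
# Assumption: a single tag word (such as PRED) precedes the cycle count.
CYCLES_RE = re.compile(r"#\s+\S+\s+(\d+) cycles \[(\d+)\]")


def running_cycles(lines):
    """Yield (address, block_cycles, running_total) for annotated branch lines."""
    for line in lines:
        m = CYCLES_RE.search(line)
        if m:
            addr = int(line.split()[0], 16)
            yield addr, int(m.group(1)), int(m.group(2))


def cycles_between(lines, start_addr, end_addr):
    """Cycle delta between the branches at start_addr and end_addr.

    If an address occurs more than once, the last occurrence wins.
    """
    totals = {addr: total for addr, _, total in running_cycles(lines)}
    return totals[end_addr] - totals[start_addr]


# Abbreviated lines from the sample output in the commit message.
sample = [
    "00007f73ecc41083        insn: 74 06                     # PRED 9 cycles [17] 1.11 IPC",
    "00007f73ecc4108b        insn: a8 10",
    "00007f73ecc4108d        insn: 74 71                     # PRED 1 cycles [18] 1.00 IPC",
    "00007f73ecc41117        insn: e8 94 73 04 00            # PRED 6 cycles [24] 1.00 IPC",
]

print(cycles_between(sample, 0x7F73ECC41083, 0x7F73ECC41117))  # 24 - 17 = 7
```

Subtracting two running totals gives the same result as summing the per-block cycle counts in between (1 + 6 here), which is the convenience the commit describes.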
    • perf script: Implement --graph-function · 99f753f0
      Andi Kleen committed
      Add an ftrace-style --graph-function argument to 'perf script' that
      prints itrace function calls only below a given function. This makes
      it easier to find the code of interest in a large trace.
      
      % perf record -e intel_pt//k -a sleep 1
      % perf script --graph-function group_sched_in --call-trace
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])          group_sched_in
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])              event_sched_in.isra.107
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_event_set_state.part.71
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_time
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_disable
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_log_itrace_start
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_userpage
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          calc_timer_values
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              sched_clock_cpu
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          arch_perf_update_userpage
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              __fentry__
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              using_native_sched_clock
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              sched_clock_stable
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_enable
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])          group_sched_in
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])              __x86_indirect_thunk_rax
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])              event_sched_in.isra.107
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  perf_event_set_state.part.71
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                      perf_event_update_time
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  perf_pmu_disable
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  perf_log_itrace_start
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  __x86_indirect_thunk_rax
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                      perf_event_update_userpage
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                          calc_timer_values
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              sched_clock_cpu
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                          __x86_indirect_thunk_rax
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                          arch_perf_update_userpage
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              __fentry__
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              using_native_sched_clock
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              sched_clock_stable
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Leo Yan <leo.yan@linaro.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Link: http://lkml.kernel.org/r/20180920180540.14039-5-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
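The effect of --graph-function can be approximated on already-printed call-trace text by filtering on indentation. This is only an illustrative sketch of the idea, not how perf implements the option: the function name `below_function` and the regular expression are assumptions, and it ignores interleaving between CPUs or threads.

```python
import re

# Captures the DSO in parentheses, the indentation after it, and the symbol.
LINE_RE = re.compile(r"\(([^)]*)\)(\s+)(\S+)\s*$")


def below_function(lines, target):
    """Yield only the lines for `target` and the calls nested below it."""
    base = None  # indentation of the currently active `target` entry, if any
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        indent, sym = len(m.group(2)), m.group(3)
        if base is not None and indent <= base:
            base = None  # back at the caller's level: the subtree has ended
        if base is None and sym == target:
            base = indent  # entering the function of interest
        if base is not None:
            yield line


trace = [
    "perf 900 [000] 194167.205652203: ([kernel.kallsyms])          event_filter_match",
    "perf 900 [000] 194167.205652203: ([kernel.kallsyms])          group_sched_in",
    "perf 900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax",
    "perf 900 [000] 194167.205652203: ([kernel.kallsyms])              event_sched_in.isra.107",
    # Illustrative sibling call appended to show where the subtree ends.
    "perf 900 [000] 194167.205652203: ([kernel.kallsyms])          perf_pmu_enable",
]

for line in below_function(trace, "group_sched_in"):
    print(line)
```

Only `group_sched_in` and the two calls indented below it are printed; the calls before and after it at the same depth are dropped.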
    • tools script: Add --call-trace and --call-ret-trace · d1b1552e
      Andi Kleen committed
      Add shortcut options to print a PT call trace (calls only) and a
      call-ret-trace (calls and returns). These roughly correspond to the
      ftrace function tracer and function graph tracer.

      This makes these common use cases easier to invoke.
      
      % perf record -a -e intel_pt// sleep 1
      % perf script --call-trace
      	    perf   900 [000] 194167.205652203: ([kernel.kallsyms])          perf_pmu_enable
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])          __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])          event_filter_match
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])          group_sched_in
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])              event_sched_in.isra.107
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_event_set_state.part.71
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_time
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_disable
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_log_itrace_start
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_userpage
      
      % perf script --call-ret-trace
      	    perf   900 [000] 194167.205652203:   tr strt     ([unknown])        pt_config
                  perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            pt_config
                  perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            pt_event_add
                  perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            perf_pmu_enable
                  perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            perf_pmu_nop_void
                  perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            event_sched_in.isra.107
                  perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            perf_pmu_nop_int
                  perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            group_sched_in
                  perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            event_filter_match
                  perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            event_filter_match
                  perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            group_sched_in
                  perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])                __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])                perf_pmu_nop_txn
                  perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])                event_sched_in.isra.107
                  perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])                    perf_event_set_state.part.71
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Leo Yan <leo.yan@linaro.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Link: http://lkml.kernel.org/r/20180920180540.14039-4-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
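For scripted post-processing, the --call-ret-trace lines shown above can be split into fields. The sketch below is an assumption-laden illustration: the `Event` tuple, `parse_event`, and the regular expression are hypothetical and only cover the line shape seen in the sample output (for instance, a comm without spaces).

```python
import re
from collections import Counter, namedtuple

Event = namedtuple("Event", "comm pid cpu time kind dso indent sym")

# comm, pid, [cpu], timestamp:, flags ("call", "return", "tr strt"), (dso), symbol
EVENT_RE = re.compile(
    r"^\s*(\S+)\s+(\d+)\s+\[(\d+)\]\s+([\d.]+):\s+(.*?)\s+\(([^)]*)\)(\s+)(\S+)\s*$"
)


def parse_event(line):
    """Parse one call-ret-trace line into an Event, or return None if it does not match."""
    m = EVENT_RE.match(line)
    if m is None:
        return None
    comm, pid, cpu, time, kind, dso, pad, sym = m.groups()
    # The timestamp stays a string so that no nanosecond digits are lost.
    return Event(comm, int(pid), int(cpu), time, kind, dso, len(pad), sym)


# Three lines taken from the sample output in the commit message.
sample = [
    "perf   900 [000] 194167.205652203:   tr strt     ([unknown])        pt_config",
    "perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            pt_config",
    "perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            perf_pmu_enable",
]

events = [parse_event(line) for line in sample]
print(Counter(e.kind for e in events))
```

The `indent` field keeps the width of the padding before the symbol, which is what encodes call depth in this output format.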
    • perf script: Make itrace script default to all calls · 4eb06815
      Andi Kleen committed
      By default 'perf script' for itrace outputs sampled instructions or
      branches. In my experience this is confusing to users because it's hard
      to correlate with real program behavior. The sampling makes sense for
      tools like 'perf report' that actually sample to reduce the run time,
      but run time is normally not a problem for 'perf script'.  It's better
      to give an accurate representation of the program flow.
      
      Default 'perf script' to output all calls for itrace, which is a much
      saner default. The old behavior can still be requested with
      'perf script --itrace=ibxwpe100000'.
      
      v2: Fix ETM build failure
      v3: Really fix ETM build failure (Kim Phillips)
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Link: http://lkml.kernel.org/r/20180920180540.14039-3-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: Add --insn-trace for instruction decoding · b585ebdb
      Andi Kleen committed
      Add an --insn-trace shorthand option for decoding and disassembling
      instruction streams for intel_pt. This automatically pipes the output
      into the xed disassembler to generate disassembled instructions, which
      makes this usage model much more convenient.
      
      Before
      
        % perf record -e intel_pt// ...
        % perf script --itrace=i0ns --ns -F +insn,-event,-period | xed -F insn: -A -64
         swapper 0 [000] 17276.429606186: ffffffff81010486 pt_config ([kernel.kallsyms])    nopl  %eax, (%rax,%rax,1)
         swapper 0 [000] 17276.429606186: ffffffff8101048b pt_config ([kernel.kallsyms])    add $0x10, %rsp
         swapper 0 [000] 17276.429606186: ffffffff8101048f pt_config ([kernel.kallsyms])    popq  %rbx
         swapper 0 [000] 17276.429606186: ffffffff81010490 pt_config ([kernel.kallsyms])    popq  %rbp
         swapper 0 [000] 17276.429606186: ffffffff81010491 pt_config ([kernel.kallsyms])    popq  %r12
         swapper 0 [000] 17276.429606186: ffffffff81010493 pt_config ([kernel.kallsyms])    popq  %r13
         swapper 0 [000] 17276.429606186: ffffffff81010495 pt_config ([kernel.kallsyms])    popq  %r14
         swapper 0 [000] 17276.429606186: ffffffff81010497 pt_config ([kernel.kallsyms])    popq  %r15
         swapper 0 [000] 17276.429606186: ffffffff81010499 pt_config ([kernel.kallsyms])    retq
         swapper 0 [000] 17276.429606186: ffffffff8101063e pt_event_add ([kernel.kallsyms])         cmpl  $0x1, 0x1b0(%rbx)
         swapper 0 [000] 17276.429606186: ffffffff81010645 pt_event_add ([kernel.kallsyms])         mov $0xffffffea, %eax
         swapper 0 [000] 17276.429606186: ffffffff8101064a pt_event_add ([kernel.kallsyms])         mov $0x0, %edx
         swapper 0 [000] 17276.429606186: ffffffff8101064f pt_event_add ([kernel.kallsyms])         popq  %rbx
         swapper 0 [000] 17276.429606186: ffffffff81010650 pt_event_add ([kernel.kallsyms])         cmovnz %edx, %eax
         swapper 0 [000] 17276.429606186: ffffffff81010653 pt_event_add ([kernel.kallsyms])         jmp 0xffffffff81010635
         swapper 0 [000] 17276.429606186: ffffffff81010635 pt_event_add ([kernel.kallsyms])         retq
         swapper 0 [000] 17276.429606186: ffffffff8115e687 event_sched_in.isra.107 ([kernel.kallsyms])       test %eax, %eax
      
      Now:
      
        % perf record -e intel_pt// ...
        % perf script --insn-trace --xed
        ... same output ...
      
      XED needs to be installed with:
      
        $ git clone https://github.com/intelxed/mbuild.git mbuild
        $ git clone https://github.com/intelxed/xed
        $ cd xed
        $ ./mfile.py
        $ ./mfile.py examples
        $ sudo ./mfile.py --prefix=/usr/local install
        $ sudo cp obj/examples/xed /usr/local/bin
        $ xed | head -3
        ERROR: required argument(s) were missing
        Copyright (C) 2017, Intel Corporation. All rights reserved.
        XED version: [v10.0-328-g7d62c8c49b7b]
        $
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20180920180540.14039-2-andi@firstfloor.org
      [ Fixed up whitespace damage, added the 'mfile.py examples + cp obj/examples/xed ... ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
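When the manual pipeline from the "Before" example has to be driven from a script (for example with a perf build that predates --insn-trace), it can be wired up roughly as follows. This is a sketch under assumptions: the argument lists are copied from the commit message, `disassemble_trace` is a hypothetical helper, and both `perf` and `xed` are assumed to be on PATH.

```python
import subprocess

# Arguments taken from the "Before" command in the commit message.
PERF_ARGS = ["perf", "script", "--itrace=i0ns", "--ns", "-F", "+insn,-event,-period"]
XED_ARGS = ["xed", "-F", "insn:", "-A", "-64"]


def disassemble_trace(perf_data="perf.data"):
    """Run 'perf script' and pipe its output through xed; return the decoded text."""
    perf = subprocess.Popen(PERF_ARGS + ["-i", perf_data], stdout=subprocess.PIPE)
    xed = subprocess.Popen(
        XED_ARGS, stdin=perf.stdout, stdout=subprocess.PIPE, text=True
    )
    perf.stdout.close()  # so perf receives SIGPIPE if xed exits early
    out, _ = xed.communicate()
    perf.wait()
    return out
```

Calling `disassemble_trace()` in a directory containing a recorded `perf.data` should return the same text that the shell pipeline prints.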
  2. 23 Oct, 2018 1 commit
    • perf script: Flush output stream after events in verbose mode · 7ee40678
      Milian Wolff committed
      When the perf script output is written to a terminal stream, the normal
      output of `perf script` would get buffered, but its debug output would
      be written directly. This made it quite hard to figure out where a given
      debug output is coming from.
      
      We can improve on this by flushing the output buffer after processing an
      event. To see the value, compare the following output for a `perf script
      -v` run:
      
      Before this patch:
      ```
      unwind: reg 16, val 7faf7dfdc000
      unwind: reg 7, val 7ffc80811e30
      unwind: find_proc_info dso /usr/lib/ld-2.28.so
      unwind: reg 6, val 0
      unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
      unwind: reg 16, val 7faf7dfdc000
      unwind: reg 7, val 7ffc80811e30
      unwind: find_proc_info dso /usr/lib/ld-2.28.so
      unwind: reg 6, val 0
      unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
      unwind: reg 16, val 7faf7dfdc000
      unwind: reg 7, val 7ffc80811e30
      unwind: find_proc_info dso /usr/lib/ld-2.28.so
      unwind: reg 6, val 0
      unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
      unwind: reg 16, val 7faf7dfdc000
      unwind: reg 7, val 7ffc80811e30
      ... lots and lots of verbose debug output
      cpp-inlining 24617 90229.122036534:          1 cycles:uppp:
                  7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
      
      cpp-inlining 24617 90229.122043974:          1 cycles:uppp:
                  7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
      ...
      ```
      
      After this patch:
      ```
      ...
      unwind: reg 16, val 7faf7dfdc000
      unwind: reg 7, val 7ffc80811e30
      unwind: find_proc_info dso /usr/lib/ld-2.28.so
      unwind: reg 6, val 0
      unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
      cpp-inlining 24617 90229.122036534:          1 cycles:uppp:
                  7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
      
      unwind: reg 16, val 7faf7dfdc000
      unwind: reg 7, val 7ffc80811e30
      unwind: find_proc_info dso /usr/lib/ld-2.28.so
      unwind: reg 6, val 0
      unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
      cpp-inlining 24617 90229.122043974:          1 cycles:uppp:
                  7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
      ...
      ```
      
      This new output format makes it much easier to use perf script output
      for debugging purposes, e.g. to investigate broken dwarf unwinding.
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20181021191424.16183-2-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
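The ordering problem described in this commit can be reproduced with a small simulation. The classes below are hypothetical stand-ins (a buffered "normal output" stream and an unbuffered "debug" stream writing to the same terminal); they model the behavior only and are not perf code.

```python
class Terminal:
    """Records text in the order it would actually appear on screen."""

    def __init__(self):
        self.lines = []


class BufferedStream:
    """Stand-in for buffered normal output: holds text until flush() is called."""

    def __init__(self, terminal):
        self.terminal = terminal
        self.pending = []

    def write(self, text):
        self.pending.append(text)

    def flush(self):
        self.terminal.lines.extend(self.pending)
        self.pending.clear()


class DirectStream:
    """Stand-in for debug output: text reaches the terminal immediately."""

    def __init__(self, terminal):
        self.terminal = terminal

    def write(self, text):
        self.terminal.lines.append(text)


def process_events(events, flush_each_event):
    term = Terminal()
    out, dbg = BufferedStream(term), DirectStream(term)
    for name in events:
        dbg.write(f"unwind: debug for {name}")  # verbose output, written directly
        out.write(f"sample: {name}")  # normal output, buffered
        if flush_each_event:
            out.flush()
    out.flush()  # whatever is still buffered is drained at exit
    return term.lines


print(process_events(["ev1", "ev2"], flush_each_event=False))
print(process_events(["ev1", "ev2"], flush_each_event=True))
```

Without the per-event flush all debug lines come first and the samples follow in a block; with it, each sample appears right after the debug lines that belong to it, matching the "After this patch" output.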
  3. 22 Oct, 2018 1 commit
  4. 20 Sep, 2018 4 commits
    • perf script: Enhance sample flags for trace begin / end · 62cb1b88
      Adrian Hunter committed
      Allow for different combinations of sample flags with "trace begin" or
      "trace end".
      
      Previously, the Intel PT decoder would indicate begin / end by a branch
      from / to zero. That hides useful information, in particular when a
      trace ends with a call. Before remedying that, prepare 'perf script' to
      display sample flags with more combinations that include trace begin /
      end. In those cases display 'tr start' and 'tr end' separately.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20180920130048.31432-2-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: Print DSO for callindent · a78cdee6
      Andi Kleen committed
      Now that we don't need to print the IP/ADDR for callindent, the DSO is
      also not printed. It's useful in some cases, so add a separate DSO
      printout for callindent for the case when IP/ADDR is not enabled.
      
      Before:
      
      % perf script --itrace=cr -F +callindent,-ip,-sym,-symoff,-addr
               swapper     0 [000]  3377.917072:          1   branches: pt_config
               swapper     0 [000]  3377.917072:          1   branches:     pt_config
               swapper     0 [000]  3377.917072:          1   branches:     pt_event_add
               swapper     0 [000]  3377.917072:          1   branches:     perf_pmu_enable
               swapper     0 [000]  3377.917072:          1   branches:     perf_pmu_nop_void
               swapper     0 [000]  3377.917072:          1   branches:     event_sched_in.isra.107
               swapper     0 [000]  3377.917072:          1   branches:     __x86_indirect_thunk_rax
               swapper     0 [000]  3377.917072:          1   branches:     perf_pmu_nop_int
               swapper     0 [000]  3377.917072:          1   branches:     group_sched_in
               swapper     0 [000]  3377.917072:          1   branches:     event_filter_match
               swapper     0 [000]  3377.917072:          1   branches:     event_filter_match
               swapper     0 [000]  3377.917072:          1   branches:     group_sched_in
      
      After:
      
               swapper     0 [000]  3377.917072:          1   branches: ([unknown])   pt_config
               swapper     0 [000]  3377.917072:          1   branches: ([kernel.kallsyms])       pt_config
               swapper     0 [000]  3377.917072:          1   branches: ([kernel.kallsyms])       pt_event_add
               swapper     0 [000]  3377.917072:          1   branches: ([kernel.kallsyms])       perf_pmu_enable
               swapper     0 [000]  3377.917072:          1   branches: ([kernel.kallsyms])       perf_pmu_nop_void
               swapper     0 [000]  3377.917072:          1   branches: ([kernel.kallsyms])       event_sched_in.isra.107
               swapper     0 [000]  3377.917072:          1   branches: ([kernel.kallsyms])       __x86_indirect_thunk_rax
               swapper     0 [000]  3377.917072:          1   branches: ([kernel.kallsyms])       perf_pmu_nop_int
               swapper     0 [000]  3377.917072:          1   branches: ([kernel.kallsyms])       group_sched_in
               swapper     0 [000]  3377.917072:          1   branches: ([kernel.kallsyms])       event_filter_match
               swapper     0 [000]  3377.917072:          1   branches: ([kernel.kallsyms])       event_filter_match
               swapper     0 [000]  3377.917072:          1   branches: ([kernel.kallsyms])       group_sched_in
               swapper     0 [000]  3377.917072:          1   branches: ([kernel.kallsyms])           __x86_indirect_thunk_rax
               swapper     0 [000]  3377.917072:          1   branches: ([kernel.kallsyms])           perf_pmu_nop_txn
               swapper     0 [000]  3377.917072:          1   branches: ([kernel.kallsyms])           event_sched_in.isra.107
      
      (in the kernel case of course it's not very useful, but it's important
      with user programs where symbols are not unique)
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Link: http://lkml.kernel.org/r/20180918123214.26728-6-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: Allow sym and dso without ip, addr · 37fed3de
      Andi Kleen committed
      Currently sym and dso require printing ip and addr because the print
      function is tied to those outputs. With callindent it makes sense to
      print the symbol or dso without the numerical IP or ADDR. So change the
      dependency check to only check the underlying attribute.

      Also, the branch target output relies on the user_set flag to determine
      if the branch target should be implicitly printed. When modifying the
      fields with + or -, also set user_set, so that ADDR can be removed. We
      also need to set wildcard_set to make the initial sanity check pass.

      This removes a lot of noise from callindent output by dropping the
      numerical addresses, which are not all that useful.
      
      Before
      
      % perf script --itrace=cr -F +callindent
               swapper     0 [000] 156546.354971:          1   branches: pt_config                                       0 [unknown] ([unknown]) => ffffffff81010486 pt_config ([kernel.kallsyms])
               swapper     0 [000] 156546.354971:          1   branches:     pt_config                    ffffffff81010499 pt_config ([kernel.kallsyms]) => ffffffff8101063e pt_event_add ([kernel.kallsyms])
               swapper     0 [000] 156546.354971:          1   branches:     pt_event_add                 ffffffff81010635 pt_event_add ([kernel.kallsyms]) => ffffffff8115e687 event_sched_in.isra.107 ([kernel.kallsyms])
               swapper     0 [000] 156546.354971:          1   branches:     perf_pmu_enable              ffffffff8115e726 event_sched_in.isra.107 ([kernel.kallsyms]) => ffffffff811579b0 perf_pmu_enable ([kernel.kallsyms])
               swapper     0 [000] 156546.354971:          1   branches:     perf_pmu_nop_void            ffffffff81151730 perf_pmu_nop_void ([kernel.kallsyms]) => ffffffff8115e72b event_sched_in.isra.107 ([kernel.kallsyms])
               swapper     0 [000] 156546.354971:          1   branches:     event_sched_in.isra.107      ffffffff8115e737 event_sched_in.isra.107 ([kernel.kallsyms]) => ffffffff8115e7a5 group_sched_in ([kernel.kallsyms])
               swapper     0 [000] 156546.354971:          1   branches:     __x86_indirect_thunk_rax     ffffffff8115e7f6 group_sched_in ([kernel.kallsyms]) => ffffffff81a03000 __x86_indirect_thunk_rax ([kernel.kallsyms])
      
      After
      
      % perf script --itrace=cr -F +callindent,-ip,-sym,-symoff
             swapper     0 [000] 156546.354971:          1   branches:  pt_config
               swapper     0 [000] 156546.354971:          1   branches:      pt_config
               swapper     0 [000] 156546.354971:          1   branches:      pt_event_add
               swapper     0 [000] 156546.354971:          1   branches:       perf_pmu_enable
               swapper     0 [000] 156546.354971:          1   branches:       perf_pmu_nop_void
               swapper     0 [000] 156546.354971:          1   branches:       event_sched_in.isra.107
               swapper     0 [000] 156546.354971:          1   branches:       __x86_indirect_thunk_rax
               swapper     0 [000] 156546.354971:          1   branches:       perf_pmu_nop_int
               swapper     0 [000] 156546.354971:          1   branches:       group_sched_in
               swapper     0 [000] 156546.354971:          1   branches:       event_filter_match
               swapper     0 [000] 156546.354971:          1   branches:       event_filter_match
               swapper     0 [000] 156546.354971:          1   branches:       group_sched_in
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Link: http://lkml.kernel.org/r/20180918123214.26728-5-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Report itrace options in help · c12e039d
      Andi Kleen committed
      I often forget all the options that --itrace accepts. Instead of burying
      them only in the man page, also report them in the normal command-line
      help to make them more accessible.
      
      v2: Align
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Link: http://lkml.kernel.org/r/20180914031038.4160-2-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 19 Sep, 2018 1 commit
  6. 31 Aug, 2018 1 commit
  7. 14 Aug, 2018 1 commit
  8. 25 Jun, 2018 3 commits
  9. 09 Jun, 2018 1 commit
    • perf script: Show hw-cache events · fad76d43
      Seeteena Thoufeek committed
      'perf script' fails to report hardware cache events (PERF_TYPE_HW_CACHE),
      whereas 'perf report' shows the samples. Fix it. For example:
      
        # perf record -e L1-dcache-loads ./a.out
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.008 MB perf.data (11 samples)]
      
      Before patch:
      
        # perf script | wc -l
        0
      
      After patch:
      
        # perf script | wc -l
        11
      
      Committer testing:
      
        [root@jouet ~]# perf script | head -30 | tail
              Timer 9803 [2] 8.963330:  1554 L1-dcache-loads: 7ffef89baae4 __vdso_clock_gettime+0xf4 ([vdso])
            swapper    0 [2] 8.963343:  5626 L1-dcache-loads: ffffffffa66f4f6b cpuidle_not_av+0xb (/lib/modules/4.17.0-rc5/build/vmlinux)
            firefox 4853 [2] 8.964070: 18935 L1-dcache-loads: 7f0b9a00dc30 xcb_poll_for_event+0x0 (/usr/lib64/libxcb.so.1.1.0)
        Softwar~cTh 4928 [2] 8.964548: 15928 L1-dcache-loads: ffffffffa60d795c update_curr+0x10c (/lib/modules/4.17.0-rc5/build/vmlinux)
            firefox 4853 [2] 8.964675: 14978 L1-dcache-loads: ffffffffa6897018 mutex_unlock+0x18 (/lib/modules/4.17.0-rc5/build/vmlinux)
        gnome-shell 2026 [3] 8.964693: 50670 L1-dcache-loads: 7fa08854de6d g_source_iter_next+0x6d (/usr/lib64/libglib-2.0.so.0.5400.3)
         Compositor 4929 [1] 8.964784: 71772 L1-dcache-loads: 7f0b936bf078 [unknown] (/usr/lib64/firefox/libxul.so)
           Xwayland 2096 [2] 8.964919: 16799 L1-dcache-loads: 7f68ce2fcb8a glXGetCurrentContext+0x1a (/usr/lib64/libGLX.so.0.0.0)
        gnome-shell 2026 [3] 8.964997: 50670 L1-dcache-loads: 7fa08854de6d g_source_iter_next+0x6d (/usr/lib64/libglib-2.0.so.0.5400.3)
        [root@jouet ~]#
      Signed-off-by: Seeteena Thoufeek <s1seetee@linux.vnet.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1528455748-20087-1-git-send-email-s1seetee@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  10. 05 Jun, 2018 2 commits
  11. 19 May, 2018 1 commit
    • perf script: Show symbol offsets by default · 7903a708
      Sandipan Das committed
      Since the ip shown for a symbol is now always a virtual address, it
      becomes difficult to correlate this with objdump output and determine
      the exact instruction address. So, we always show the offset from the
      start of the symbol.
      
      This can be verified on a powerpc64le system running Fedora 27 as
      follows:
      
        # perf probe -a sys_write
        # perf record -e probe:sys_write -g ~/test
      
      Before applying this patch:
      
        # perf script
      
        test  9710 [013] 95614.332431: probe:sys_write: (c0000000004025b0)
                c0000000004025b0 sys_write (/lib/modules/4.17.0-rc4+/build/vmlinux)
                c00000000000b9e0 system_call (/lib/modules/4.17.0-rc4+/build/vmlinux)
                    7fffb70d8234 __GI___libc_write (/usr/lib64/libc-2.26.so)
                    7fffb7052c74 _IO_file_write@@GLIBC_2.17 (/usr/lib64/libc-2.26.so)
                        5afc1818 [unknown] ([unknown])
                    7fffb7051a60 new_do_write (/usr/lib64/libc-2.26.so)
                    7fffb7054638 _IO_do_write@@GLIBC_2.17 (/usr/lib64/libc-2.26.so)
                    7fffb7054bbc _IO_file_overflow@@GLIBC_2.17 (/usr/lib64/libc-2.26.so)
                    7fffb7055a24 __overflow (/usr/lib64/libc-2.26.so)
                    7fffb7044548 _IO_puts (/usr/lib64/libc-2.26.so)
                        10000440 main (/home/sandipan/test)
                    7fffb6fe36a0 generic_start_main.isra.0 (/usr/lib64/libc-2.26.so)
                    7fffb6fe3898 __libc_start_main (/usr/lib64/libc-2.26.so)
                               0 [unknown] ([unknown])
        ...
      
      After applying this patch:
      
        # perf script
      
        test  9710 [013] 95614.332431: probe:sys_write: (c0000000004025b0)
                c0000000004025b0 sys_write+0x10 (/lib/modules/4.17.0-rc4+/build/vmlinux)
                c00000000000b9e0 system_call+0x58 (/lib/modules/4.17.0-rc4+/build/vmlinux)
                    7fffb70d8234 __GI___libc_write+0x24 (/usr/lib64/libc-2.26.so)
                    7fffb7052c74 _IO_file_write@@GLIBC_2.17+0x44 (/usr/lib64/libc-2.26.so)
                        5afc1818 [unknown] ([unknown])
                    7fffb7051a60 new_do_write+0x90 (/usr/lib64/libc-2.26.so)
                    7fffb7054638 _IO_do_write@@GLIBC_2.17+0x38 (/usr/lib64/libc-2.26.so)
                    7fffb7054bbc _IO_file_overflow@@GLIBC_2.17+0x14c (/usr/lib64/libc-2.26.so)
                    7fffb7055a24 __overflow+0x64 (/usr/lib64/libc-2.26.so)
                    7fffb7044548 _IO_puts+0x218 (/usr/lib64/libc-2.26.so)
                        10000440 main+0x20 (/home/sandipan/test)
                    7fffb6fe36a0 generic_start_main.isra.0+0x140 (/usr/lib64/libc-2.26.so)
                    7fffb6fe3898 __libc_start_main+0xb8 (/usr/lib64/libc-2.26.so)
                               0 [unknown] ([unknown])
        ...
      Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lkml.kernel.org/r/20180517063326.6319-2-sandipan@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7903a708
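The difference between the two listings above is the `+0x...` suffix after each resolved symbol. A minimal sketch of how such a suffix can be derived, assuming the symbol's start address is known; the start addresses below are hypothetical values chosen so that the offsets match the "After" output, and `format_symoff` is an illustrative name, not a perf function:

```python
def format_symoff(addr, sym_start, sym_name):
    """Render an address as 'symbol+0xoffset', in the style of the 'After' output."""
    offset = addr - sym_start
    return f"{sym_name}+{offset:#x}"


# Hypothetical symbol start addresses (not taken from the commit).
print(format_symoff(0xC0000000004025B0, 0xC0000000004025A0, "sys_write"))
print(format_symoff(0x10000440, 0x10000420, "main"))
```

Frames that could not be resolved, such as the `[unknown]` entries, have no symbol start and therefore no offset to print.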
  12. 27 Apr, 2018 4 commits
  13. 17 Apr, 2018 1 commit
    • A
      perf script: Extend misc field decoding with switch out event type · bf30cc18
      Alexey Budankov committed
      Append a 'p' sign to the 'S' tag to designate the type of context switch out
      event, so that 'Sp' means a preemption context switch. The documentation is
      extended to cover the new presentation.
      
        $ perf script --show-switch-events -F +misc -I -i perf.data:
      
                hdparm 4073 [004] U  762.198265:     380194 cycles:ppp:      7faf727f5a23 strchr (/usr/lib64/ld-2.26.so)
                hdparm 4073 [004] K  762.198366:     441572 cycles:ppp:  ffffffffb9218435 alloc_set_pte (/lib/modules/4.16.0-rc6+/build/vmlinux)
                hdparm 4073 [004] S  762.198391: PERF_RECORD_SWITCH_CPU_WIDE OUT          next pid/tid:    0/0
               swapper    0 [004]    762.198392: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid: 4073/4073
               swapper    0 [004] Sp 762.198477: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid: 4073/4073
                hdparm 4073 [004]    762.198478: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:    0/0
               swapper    0 [007] K  762.198514:    2303073 cycles:ppp:  ffffffffb98b0c66 intel_idle (/lib/modules/4.16.0-rc6+/build/vmlinux)
               swapper    0 [007] Sp 762.198561: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid: 1134/1134
        kworker/u16:18 1134 [007]    762.198562: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:    0/0
        kworker/u16:18 1134 [007] S  762.198567: PERF_RECORD_SWITCH_CPU_WIDE OUT          next pid/tid:    0/0
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/5fc65ce7-8ca5-53ae-8858-8ddd27290575@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bf30cc18
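When post-processing this kind of output, the misc tag sits between the CPU field and the timestamp. A small sketch of extracting it from a line, assuming the column layout shown in the example above (the regular expression and function names are illustrative and not part of perf):

```python
import re

# Matches "[cpu] <misc> <timestamp>:", where the misc tag may be empty.
LINE_RE = re.compile(r"\[(\d+)\]\s+([A-Za-z]*)\s*(\d+\.\d+):")


def misc_tag(line):
    """Return the misc tag of a 'perf script -F +misc' line, or None if no match."""
    m = LINE_RE.search(line)
    return m.group(2) if m else None


def is_preempt_switch_out(line):
    """True when the line is tagged 'Sp', i.e. a preemption switch out event."""
    return misc_tag(line) == "Sp"


sample = "swapper    0 [004] Sp 762.198477: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid: 4073/4073"
print(misc_tag(sample), is_preempt_switch_out(sample))
```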
  14. 12 Apr, 2018 1 commit
  15. 19 Mar, 2018 1 commit
    • J
      perf tools: Fix snprint warnings for gcc 8 · 77f18153
      Jiri Olsa committed
      With gcc 8 we get a new set of snprintf() warnings that break the
      compilation, one example:
      
        tests/mem.c: In function ‘check’:
        tests/mem.c:19:48: error: ‘%s’ directive output may be truncated writing \
              up to 99 bytes into a region of size 89 [-Werror=format-truncation=]
          snprintf(failure, sizeof failure, "unexpected %s", out);
      
      The gcc docs say:
      
       To avoid the warning either use a bigger buffer or handle the
       function's return value which indicates whether or not its output
       has been truncated.
      
      Given that all these warnings are harmless, because the code either
      properly fails due to an incomplete file path or we don't care about
      truncated output at all, I'm changing all those snprintf() calls to
      scnprintf(), which actually 'checks' the snprintf() return value, so
      gcc stays silent.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Link: http://lkml.kernel.org/r/20180319082902.4518-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      77f18153
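The practical difference between the two calls is their return value. The following is only a Python model of that contract, not the C code from the patch: C `snprintf()` reports the length the full output would have had, while the kernel-style `scnprintf()` reports the number of characters actually stored. The buffer size of 100 is an assumption inferred from the warning text (89 bytes remaining after the 11-byte "unexpected " prefix), and all function names here are illustrative:

```python
def snprintf_model(size, text):
    """Model C snprintf(): store at most size-1 chars, return the untruncated length."""
    stored = text[:max(size - 1, 0)]
    return stored, len(text)


def scnprintf_model(size, text):
    """Model scnprintf(): store at most size-1 chars, return the count actually stored."""
    stored = text[:max(size - 1, 0)]
    return stored, len(stored)


def was_truncated(size, ret):
    """The check a caller of snprintf() is expected to make on its return value."""
    return ret >= size


# Assumed buffer size of 100 and a 99-character argument, as in the warning.
message = "unexpected " + "x" * 99
_, ret_sn = snprintf_model(100, message)
_, ret_sc = scnprintf_model(100, message)
print(ret_sn, was_truncated(100, ret_sn))
print(ret_sc, was_truncated(100, ret_sc))
```

With `scnprintf()` the returned length never exceeds the buffer, so code that uses it as an index stays in bounds, at the cost of no longer being able to detect truncation from the return value alone.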
  16. 16 Feb, 2018 1 commit
  17. 25 Jan, 2018 1 commit
  18. 17 Jan, 2018 3 commits
    • J
      perf script: Remove the time slices number limitation · cc2ef584
      Jin Yao committed
      Previously at most 10 time slices could be used in 'perf
      script --time'.
      
      This patch removes this limitation.
      For example, the following command line is now OK (12 time slices):
      
      perf script --time 1%/1,1%/2,1%/3,1%/4,1%/5,1%/6,1%/7,1%/8,1%/9,1%/10,1%/11,1%/12
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1515596433-24653-9-git-send-email-yao.jin@linux.intel.com
      [ No need to check for NULL to call free, use zfree ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      cc2ef584
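A rough sketch of what a percent-style `--time` list describes, assuming that `P%/N` selects the N-th slice of width P% of the recorded time span (this reading follows the perf documentation as far as I know, not this commit message). The function name and the first/last timestamps are hypothetical; the list grows with the input, so there is no fixed cap on the number of slices:

```python
def parse_percent_slices(spec, first, last):
    """Turn 'P%/N,P%/N,...' into a list of (start, end) time ranges.

    first/last are the first and last sample times of the recording.
    """
    total = last - first
    ranges = []
    for item in spec.split(","):
        pct_str, idx_str = item.split("%/")
        width = float(pct_str) / 100.0 * total
        idx = int(idx_str)
        ranges.append((first + (idx - 1) * width, first + idx * width))
    return ranges


# The 12-slice example from the commit message, over a hypothetical 1000 s span.
spec = ",".join(f"1%/{n}" for n in range(1, 13))
for start, end in parse_percent_slices(spec, first=0.0, last=1000.0):
    print(f"{start:.1f} .. {end:.1f}")
```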
    • J
      perf script: Improve error msg when no first/last sample time found · 1e2778e9
      Jin Yao committed
      The following message is shown to the user when executing 'perf
      script --time' if the perf data file doesn't contain the first/last
      sample time.
      
      "HINT: no first/last sample time found in perf data.
       Please use latest perf binary to execute 'perf record'
       (if '--buildid-all' is enabled, needs to set '--timestamp-boundary')."
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1515596433-24653-3-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      1e2778e9
    • A
      perf unwind: Do not look just at the global callchain_param.record_mode · eabad8c6
      Arnaldo Carvalho de Melo committed
      When setting up DWARF callchains on specific events, without using
      'record' or 'trace' --call-graph, but instead doing it like:
      
      	perf trace -e cycles/call-graph=dwarf/
      
      The unwind__prepare_access() call in thread__insert_map(), made when we
      process PERF_RECORD_MMAP(2) metadata events, was not being performed,
      precluding us from using per-event DWARF callchains; they were handled
      only when we asked for all events to be DWARF, using "--call-graph dwarf".
      
      We do it in the PERF_RECORD_MMAP because we have to look at one of the
      executable maps to figure out the executable type (64-bit, 32-bit) of
      the DSO laid out in that mmap. We also have to look at the architecture
      where the perf.data file was recorded.
      
      All this probably should be deferred to when we process a sample for
      some thread that has callchains, so that we do this processing only for
      the threads with samples, not for all of them.
      
      For now, fix using DWARF on specific events.
      
      Before:
      
        # perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
        PING ::1(::1) 56 data bytes
        64 bytes from ::1: icmp_seq=1 ttl=64 time=0.048 ms
      
        --- ::1 ping statistics ---
        1 packets transmitted, 1 received, 0% packet loss, time 0ms
        rtt min/avg/max/mdev = 0.048/0.048/0.048/0.000 ms
           0.000 probe_libc:inet_pton:(7fe9597bb350))
        Problem processing probe_libc:inet_pton callchain, skipping...
        #
      
      After:
      
        # perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
        PING ::1(::1) 56 data bytes
        64 bytes from ::1: icmp_seq=1 ttl=64 time=0.060 ms
      
        --- ::1 ping statistics ---
        1 packets transmitted, 1 received, 0% packet loss, time 0ms
        rtt min/avg/max/mdev = 0.060/0.060/0.060/0.000 ms
             0.000 probe_libc:inet_pton:(7fd4aa930350))
                                               __inet_pton (inlined)
                                               gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
                                               __GI_getaddrinfo (inlined)
                                               [0xffffaa804e51af3f] (/usr/bin/ping)
                                               __libc_start_main (/usr/lib64/libc-2.26.so)
                                               [0xffffaa804e51b379] (/usr/bin/ping)
        #
        # perf trace --call-graph=dwarf --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
        PING ::1(::1) 56 data bytes
        64 bytes from ::1: icmp_seq=1 ttl=64 time=0.057 ms
      
        --- ::1 ping statistics ---
        1 packets transmitted, 1 received, 0% packet loss, time 0ms
        rtt min/avg/max/mdev = 0.057/0.057/0.057/0.000 ms
             0.000 probe_libc:inet_pton:(7f9363b9e350))
                                               __inet_pton (inlined)
                                               gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
                                               __GI_getaddrinfo (inlined)
                                               [0xffffa9e8a14e0f3f] (/usr/bin/ping)
                                               __libc_start_main (/usr/lib64/libc-2.26.so)
                                               [0xffffa9e8a14e1379] (/usr/bin/ping)
        #
        # perf trace --call-graph=fp --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
        PING ::1(::1) 56 data bytes
        64 bytes from ::1: icmp_seq=1 ttl=64 time=0.077 ms
      
        --- ::1 ping statistics ---
        1 packets transmitted, 1 received, 0% packet loss, time 0ms
        rtt min/avg/max/mdev = 0.077/0.077/0.077/0.000 ms
             0.000 probe_libc:inet_pton:(7f4947e1c350))
                                               __inet_pton (inlined)
                                               gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
                                               __GI_getaddrinfo (inlined)
                                               [0xffffaa716d88ef3f] (/usr/bin/ping)
                                               __libc_start_main (/usr/lib64/libc-2.26.so)
                                               [0xffffaa716d88f379] (/usr/bin/ping)
        #
        # perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=fp/ ping -6 -c 1 ::1
        PING ::1(::1) 56 data bytes
        64 bytes from ::1: icmp_seq=1 ttl=64 time=0.078 ms
      
        --- ::1 ping statistics ---
        1 packets transmitted, 1 received, 0% packet loss, time 0ms
        rtt min/avg/max/mdev = 0.078/0.078/0.078/0.000 ms
             0.000 probe_libc:inet_pton:(7fa157696350))
                                               __GI___inet_pton (/usr/lib64/libc-2.26.so)
                                               getaddrinfo (/usr/lib64/libc-2.26.so)
                                               [0xffffa9ba39c74f40] (/usr/bin/ping)
        #
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Hendrick Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/r/20180116182650.GE16107@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      eabad8c6
  19. 10 Jan, 2018 1 commit
  20. 08 Jan, 2018 2 commits
  21. 27 Dec, 2017 3 commits
  22. 30 Nov, 2017 1 commit
    • A
      perf script: Allow computing 'perf stat' style metrics · 4bd1bef8
      Andi Kleen committed
      Add support for computing 'perf stat' style metrics in 'perf script'.
      
      When using leader sampling we can get metrics for each sampling period
      by computing formulas over the values of the different group members.
      
      This allows things like fine grained IPC tracking through sampling, much
      more fine grained than with 'perf stat'.
      
      The metric is still averaged over the sampling period; it is not just
      for the sampling point.
      
      This patch adds a new metric output field for 'perf script' that uses
      the existing 'perf stat' metrics infrastructure to compute any metrics
      supported by 'perf stat'.
      
      For example to sample IPC:
      
        $ perf record -e '{ref-cycles,cycles,instructions}:S' -a sleep 1
        $ perf script -F metric,ip,sym,time,cpu,comm
        ...
         alsa-sink-ALC32 [000] 42815.856074:      7fd65937d6cc [unknown]
         alsa-sink-ALC32 [000] 42815.856074:      7fd65937d6cc [unknown]
         alsa-sink-ALC32 [000] 42815.856074:      7fd65937d6cc [unknown]
         alsa-sink-ALC32 [000] 42815.856074:    metric:    0.13  insn per cycle
                 swapper [000] 42815.857961:  ffffffff81655df0 __schedule
                 swapper [000] 42815.857961:  ffffffff81655df0 __schedule
                 swapper [000] 42815.857961:  ffffffff81655df0 __schedule
                 swapper [000] 42815.857961:    metric:    0.23  insn per cycle
         qemu-system-x86 [000] 42815.858130:  ffffffff8165ad0e _raw_spin_unlock_irqrestore
         qemu-system-x86 [000] 42815.858130:  ffffffff8165ad0e _raw_spin_unlock_irqrestore
         qemu-system-x86 [000] 42815.858130:  ffffffff8165ad0e _raw_spin_unlock_irqrestore
         qemu-system-x86 [000] 42815.858130:    metric:    0.46  insn per cycle
                   :4972 [000] 42815.858312:  ffffffffa080e5f2 vmx_vcpu_run
                   :4972 [000] 42815.858312:  ffffffffa080e5f2 vmx_vcpu_run
                   :4972 [000] 42815.858312:  ffffffffa080e5f2 vmx_vcpu_run
                   :4972 [000] 42815.858312:    metric:    0.45  insn per cycle
      
      TopDown:
      
      This requires disabling SMT if you have it enabled, because SMT would
      require sampling per core, which is not supported.
      
        $ perf record -e '{ref-cycles,topdown-fetch-bubbles,\
                           topdown-recovery-bubbles,\
                           topdown-slots-retired,topdown-total-slots,\
                           topdown-slots-issued}:S' -a sleep 1
        $ perf script --header -I -F cpu,ip,sym,event,metric,period
        ...
        [000]     121108               ref-cycles:  ffffffff8165222e copy_user_enhanced_fast_string
        [000]     190350    topdown-fetch-bubbles:  ffffffff8165222e copy_user_enhanced_fast_string
        [000]       2055 topdown-recovery-bubbles:  ffffffff8165222e copy_user_enhanced_fast_string
        [000]     148729    topdown-slots-retired:  ffffffff8165222e copy_user_enhanced_fast_string
        [000]     144324      topdown-total-slots:  ffffffff8165222e copy_user_enhanced_fast_string
        [000]     160852     topdown-slots-issued:  ffffffff8165222e copy_user_enhanced_fast_string
        [000]   metric:     33.0% frontend bound
        [000]   metric:      3.5% bad speculation
        [000]   metric:     25.8% retiring
        [000]   metric:     37.7% backend bound
        [000]     112112               ref-cycles:  ffffffff8165aec8 _raw_spin_lock_irqsave
        [000]     357222    topdown-fetch-bubbles:  ffffffff8165aec8 _raw_spin_lock_irqsave
        [000]       3325 topdown-recovery-bubbles:  ffffffff8165aec8 _raw_spin_lock_irqsave
        [000]     323553    topdown-slots-retired:  ffffffff8165aec8 _raw_spin_lock_irqsave
        [000]     270507      topdown-total-slots:  ffffffff8165aec8 _raw_spin_lock_irqsave
        [000]     341226     topdown-slots-issued:  ffffffff8165aec8 _raw_spin_lock_irqsave
        [000]   metric:     33.0% frontend bound
        [000]   metric:      2.9% bad speculation
        [000]   metric:     29.9% retiring
        [000]   metric:     34.2% backend bound
      ...
      
      v2:
      Use evsel->priv for new fields
      Port to new base line, support fp output.
      Handle stats in ->stats, not ->priv
      Minor cleanups
      
      Extra explanation about the use of the term 'averaging', from Andi in the
      thread in the Link: tag below:
      
      <quote Andi>
      The current sample contains the sum of event counts for a sampling period.
      
      EventA-1           EventA-2                EventA-3      EventA-4
      EventB-1     EventB-2                             EventB-3
      
                               gap with no events                overflow
      |-----------------------------------------------------------------|
      period-start                                             period-end
      ^                                                                 ^
      |                                                                 |
      previous sample                                      current sample
      
      So EventA = 4 and EventB = 3 at the sample point
      
      I generate a metric, let's say EventA / EventB. It applies to the whole period.
      
      But the metric is over a longer time which does not have the same behavior. For
      example the gap above doesn't have any events, while they are clustered at the
      beginning and end of the sample period.
      
      But we're summing everything together. The metric doesn't know that the gap is
      different than the busy period.
      
      That's what I'm trying to express with averaging.
      </quote>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20171117214300.32746-4-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      4bd1bef8
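The idea of computing a metric from the group members of one sample can be illustrated with a short sketch. It assumes each sample carries the summed counts of every event in the group for the period since the previous sample; the dictionary layout, function names, and counter values are hypothetical, not perf's internal representation:

```python
def insn_per_cycle(group_counts):
    """IPC over one sampling period, from the summed counts of the group members."""
    cycles = group_counts["cycles"]
    if cycles == 0:
        return None  # no cycles counted in this period, so the ratio is undefined
    return group_counts["instructions"] / cycles


def period_ratio(count_a, count_b):
    """Generic 'EventA / EventB' metric; applies to the whole period, gaps included."""
    return count_a / count_b if count_b else None


# Hypothetical counts for one sampling period.
sample = {"ref-cycles": 900_000, "cycles": 1_000_000, "instructions": 130_000}
print(f"metric: {insn_per_cycle(sample):.2f}  insn per cycle")

# The quoted example: EventA = 4 and EventB = 3 at the sample point.
print(period_ratio(4, 3))
```

Because only the sums are available, the result is the same whether the events were spread evenly over the period or clustered at its start and end, which is the sense of "averaging" described in the quote.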