1. 26 3月, 2021 1 次提交
    • A
      perf tools: Support pipeline stage cycles for powerpc · 06e5ca74
      Athira Rajeev 提交于
      The pipeline stage cycles details can be recorded on powerpc from the
      contents of Performance Monitor Unit (PMU) registers. On ISA v3.1
      platform, sampling registers exposes the cycles spent in different
      pipeline stages. Patch adds perf tools support to present two of the
      cycle counter information along with memory latency (weight).
      
      Re-use the field 'ins_lat' for storing the first pipeline stage cycle.
      This is stored in 'var2_w' field of 'perf_sample_weight'.
      
      Add a new field 'p_stage_cyc' to store the second pipeline stage cycle
      which is stored in 'var3_w' field of perf_sample_weight.
      
      Add new sort function 'Pipeline Stage Cycle' and include this in
      default_mem_sort_order[]. This new sort function may be used to denote
      some other pipeline stage in another architecture. So add this to list
      of sort entries that can have dynamic header string.
      Signed-off-by: NAthira Rajeev <atrajeev@linux.vnet.ibm.com>
      Reviewed-by: NMadhavan Srinivasan <maddy@linux.ibm.com>
      Acked-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: https://lore.kernel.org/r/1616425047-1666-5-git-send-email-atrajeev@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      06e5ca74
  2. 09 2月, 2021 2 次提交
    • K
      perf report: Support instruction latency · 590db42d
      Kan Liang 提交于
      The instruction latency information can be recorded on some platforms,
      e.g., the Intel Sapphire Rapids server. With both memory latency
      (weight) and the new instruction latency information, users can easily
      locate the expensive load instructions, and also understand the time
      spent in different stages. The users can optimize their applications in
      different pipeline stages.
      
      The 'weight' field is shared among different architectures. Reusing the
      'weight' field may impacts other architectures. Add a new field to store
      the instruction latency.
      
      Like the 'weight' support, introduce a 'ins_lat' for the global
      instruction latency, and a 'local_ins_lat' for the local instruction
      latency version.
      
      Add new sort functions, INSTR Latency and Local INSTR Latency,
      accordingly.
      
      Add local_ins_lat to the default_mem_sort_order[].
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/1612296553-21962-7-git-send-email-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      590db42d
    • K
      perf tools: Support data block and addr block · a054c298
      Kan Liang 提交于
      Two new data source fields, to indicate the block reasons of a load
      instruction, are introduced on the Intel Sapphire Rapids server. The
      fields can be used by the memory profiling.
      
      Add a new sort function, SORT_MEM_BLOCKED, for the two fields.
      
      For the previous platforms or the block reason is unknown, print "N/A"
      for the block reason.
      
      Add blocked as a default mem sort key for perf report and perf mem
      report.
      
      Committer testing:
      
      So in machines without this capability we get a "N/A" filling the new "Blocked"
      column:
      
        $ perf mem record ls
        arch     certs	 CREDITS  Documentation  include  ipc     Kconfig  lib       MAINTAINERS  mm   samples  security  usr    block
        COPYING	 crypto	 drivers  fs             init     Kbuild  kernel   LICENSES  Makefile     net  README   scripts   sound  tools
        virt
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.008 MB perf.data (17 samples) ]
        $
        $ perf mem report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 6  of event 'cpu/mem-loads,ldlat=30/Pu'
        # Total weight : 1381
        # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked
        #
        # Overhead  Samples  Local Weight  Memory access         Symbol                   Shared Object  Data Symbol             Data Object   Snoop  TLB access    Locked  Blocked
        # ........  .......  ............  ....................  .......................  .............  ......................  ............  .....  ............  ......  .......
        #
            32.87%        1  454           Local RAM or RAM hit  [.] _dl_relocate_object  ld-2.31.so     [.] 0x00007fe91cef3078  libc-2.31.so  Hit    L1 or L2 hit  No       N/A
            25.56%        1  353           LFB or LFB hit        [.] strcmp               ld-2.31.so     [.] 0x00005586973855ca  ls            None   L1 or L2 hit  No       N/A
            22.59%        1  312           LFB or LFB hit        [.] _dl_cache_libcmp     ld-2.31.so     [.] 0x00007fe91d0e3b18  ld.so.cache   None   L1 or L2 hit  No       N/A
             8.47%        1  117           LFB or LFB hit        [.] _dl_relocate_object  ld-2.31.so     [.] 0x00007fe91ceee570  libc-2.31.so  None   L1 or L2 hit  No       N/A
             6.88%        1  95            LFB or LFB hit        [.] _dl_relocate_object  ld-2.31.so     [.] 0x00007fe91ceed490  libc-2.31.so  None   L1 or L2 hit  No       N/A
             3.62%        1  50            LFB or LFB hit        [.] _dl_cache_libcmp     ld-2.31.so     [.] 0x00007fe91d0ebe60  ld.so.cache   None   L1 or L2 hit  No       N/A
      
        # Samples: 11  of event 'cpu/mem-stores/Pu'
        # Total weight : 11
        # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked
        #
        # Overhead  Samples  Local Weight  Memory access  Symbol                   Shared Object  Data Symbol             Data Object  Snoop  TLB access  Locked  Blocked
        # ........  .......  ............  .............  .......................  .............  ......................  ...........  .....  ..........  ......  .......
        #
             9.09%        1  0             L1 hit         [.] __strcoll_l          libc-2.31.so   [.] 0x00007fffe5648fc8  [stack]      N/A    N/A         N/A      N/A
             9.09%        1  0             L1 hit         [.] _dl_lookup_symbol_x  ld-2.31.so     [.] 0x00007fffe56490b8  [stack]      N/A    N/A         N/A      N/A
             9.09%        1  0             L1 hit         [.] _dl_name_match_p     ld-2.31.so     [.] 0x00007fffe56487d8  [stack]      N/A    N/A         N/A      N/A
             9.09%        1  0             L1 hit         [.] _dl_start            ld-2.31.so     [.] start_time+0x0      ld-2.31.so   N/A    N/A         N/A      N/A
             9.09%        1  0             L1 hit         [.] _dl_sysdep_start     ld-2.31.so     [.] 0x00007fffe56494b8  [stack]      N/A    N/A         N/A      N/A
             9.09%        1  0             L1 hit         [.] do_lookup_x          ld-2.31.so     [.] 0x00007fffe5648ff8  [stack]      N/A    N/A         N/A      N/A
             9.09%        1  0             L1 hit         [.] do_lookup_x          ld-2.31.so     [.] 0x00007fffe5649064  [stack]      N/A    N/A         N/A      N/A
             9.09%        1  0             L1 hit         [.] do_lookup_x          ld-2.31.so     [.] 0x00007fffe5649130  [stack]      N/A    N/A         N/A      N/A
             9.09%        1  0             L1 miss        [.] _dl_start            ld-2.31.so     [.] _rtld_global+0xaf8  ld-2.31.so   N/A    N/A         N/A      N/A
             9.09%        1  0             L1 miss        [.] _dl_start            ld-2.31.so     [.] _rtld_global+0xc28  ld-2.31.so   N/A    N/A         N/A      N/A
             9.09%        1  0             L1 miss        [.] _dl_start            ld-2.31.so     [.] 0x00007fffe56495b8  [stack]      N/A    N/A         N/A      N/A
      
        # (Tip: Show user configuration overrides: perf config --user --list)
        $
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/1612296553-21962-4-git-send-email-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a054c298
  3. 21 1月, 2021 1 次提交
  4. 20 12月, 2020 1 次提交
    • K
      perf sort: Add sort option for data page size · a50d03e3
      Kan Liang 提交于
      Add a new sort option "data_page_size" for --mem-mode sort.  With this
      option applied, perf can sort and report by sample's data page size.
      
      Here is an example:
      
      perf report --stdio --mem-mode
      --sort=comm,symbol,phys_daddr,data_page_size
      
       # To display the perf.data header info, please use
       # --header/--header-only options.
       #
       #
       # Total Lost Samples: 0
       #
       # Samples: 9K of event 'mem-loads:uP'
       # Total weight : 9028
       # Sort order   : comm,symbol,phys_daddr,data_page_size
       #
       # Overhead  Command  Symbol                        Data Physical
       # Address
       # Data Page Size
       # ........  .......  ............................
       # ......................  ......................
       #
          11.19%  dtlb     [.] touch_buffer              [.] 0x00000003fec82ea8  4K
           8.61%  dtlb     [.] GetTickCount              [.] 0x00000003c4f2c8a8  4K
           4.52%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f58  4K
           4.33%  dtlb     [.] __gettimeofday            [.] 0x00000003fec82f48  4K
           4.32%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f78  4K
           4.28%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f50  4K
           4.23%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f70  4K
           4.11%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f68  4K
           4.00%  dtlb     [.] Calibrate                 [.] 0x00000003fec82f98  4K
           3.91%  dtlb     [.] Calibrate                 [.] 0x00000003fec82f90  4K
           3.43%  dtlb     [.] touch_buffer              [.] 0x00000003fec82e98  4K
           3.42%  dtlb     [.] touch_buffer              [.] 0x00000003fec82e90  4K
           0.09%  dtlb     [.] DoDependentLoads          [.] 0x000000036ea084c0  2M
           0.08%  dtlb     [.] DoDependentLoads          [.] 0x000000032b010b80  2M
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@redhat.com>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Link: http://lore.kernel.org/lkml/20201216185805.9981-3-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a50d03e3
  5. 01 12月, 2020 1 次提交
  6. 28 5月, 2020 2 次提交
  7. 06 5月, 2020 3 次提交
  8. 18 4月, 2020 1 次提交
    • K
      perf hist: Add fast path for duplicate entries check · 12e89e65
      Kan Liang 提交于
      Perf checks the duplicate entries in a callchain before adding an entry.
      However the check is very slow especially with deeper call stack.
      Almost ~50% elapsed time of perf report is spent on the check when the
      call stack is always depth of 32.
      
      The hist_entry__cmp() is used to compare the new entry with the old
      entries. It will go through all the available sorts in the sort_list,
      and call the specific cmp of each sort, which is very slow.
      
      Actually, for most cases, there are no duplicate entries in callchain.
      The symbols are usually different. It's much faster to do a quick check
      for symbols first. Only do the full cmp when the symbols are exactly the
      same.
      
      The quick check is only to check symbols, not dso. Export
      _sort__sym_cmp.
      
        $ perf record --call-graph lbr ./tchain_edit_64
      
        Without the patch
        $time perf report --stdio
        real    0m21.142s
        user    0m21.110s
        sys     0m0.033s
      
        With the patch
        $time perf report --stdio
        real    0m10.977s
        user    0m10.948s
        sys     0m0.027s
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lore.kernel.org/lkml/20200319202517.23423-18-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      12e89e65
  9. 03 4月, 2020 1 次提交
    • N
      perf report: Add 'cgroup' sort key · b629f3e9
      Namhyung Kim 提交于
      The cgroup sort key is to show cgroup membership of each task.
      Currently it shows full path in the cgroupfs (not relative to the root
      of cgroup namespace) since it'd be more intuitive IMHO.  Otherwise root
      cgroup in different namespaces will all show same name - "/".
      
      The cgroup sort key should come before cgroup_id otherwise
      sort_dimension__add() will match it to cgroup_id as it only matches with
      the given substring.
      
      For example it will look like following.  Note that record patch adding
      --all-cgroups patch will come later.
      
        $ perf record -a --namespace --all-cgroups  cgtest
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.208 MB perf.data (4090 samples) ]
      
        $ perf report -s cgroup_id,cgroup,pid
        ...
        # Overhead  cgroup id (dev/inode)  Cgroup          Pid:Command
        # ........  .....................  ..........  ...............
        #
            93.96%  0/0x0                  /                 0:swapper
             1.25%  3/0xeffffffb           /               278:looper0
             0.86%  3/0xf000015f           /sub/cgrp1      280:cgtest
             0.37%  3/0xf0000160           /sub/cgrp2      281:cgtest
             0.34%  3/0xf0000163           /sub/cgrp3      282:cgtest
             0.22%  3/0xeffffffb           /sub            278:looper0
             0.20%  3/0xeffffffb           /               280:cgtest
             0.15%  3/0xf0000163           /sub/cgrp3      285:looper3
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200325124536.2800725-6-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b629f3e9
  10. 10 3月, 2020 1 次提交
    • K
      perf tools: Add hw_idx in struct branch_stack · 42bbabed
      Kan Liang 提交于
      The low level index of raw branch records for the most recent branch can
      be recorded in a sample with PERF_SAMPLE_BRANCH_HW_INDEX
      branch_sample_type. Extend struct branch_stack to support it.
      
      However, if the PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and
      entries[] will be output by kernel. The pointer of entries[] could be
      wrong, since the output format is different with new struct
      branch_stack.  Add a variable no_hw_idx in struct perf_sample to
      indicate whether the hw_idx is output.  Add get_branch_entry() to return
      corresponding pointer of entries[0].
      
      To make dummy branch sample consistent as new branch sample, add hw_idx
      in struct dummy_branch_stack for cs-etm and intel-pt.
      
      Apply the new struct branch_stack for synthetic events as well.
      
      Extend test case sample-parsing to support new struct branch_stack.
      
      Committer notes:
      
      Renamed get_branch_entries() to perf_sample__branch_entries() to have
      proper namespacing and pave the way for this to be moved to libperf,
      eventually.
      
      Add 'static' to that inline as it is in a header.
      
      Add 'hw_idx' to 'struct dummy_branch_stack' in cs-etm.c to fix the build
      on arm64.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lore.kernel.org/lkml/20200228163011.19358-2-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      42bbabed
  11. 26 11月, 2019 2 次提交
  12. 12 11月, 2019 2 次提交
  13. 07 11月, 2019 4 次提交
  14. 05 11月, 2019 1 次提交
  15. 01 9月, 2019 4 次提交
  16. 30 8月, 2019 1 次提交
  17. 29 8月, 2019 1 次提交
  18. 26 8月, 2019 1 次提交
    • A
      perf report: Fix --ns time sort key output · 3dab6ac0
      Andi Kleen 提交于
      If the user specified --ns, the column to print the sort time stamp
      wasn't wide enough to actually print the full nanoseconds.
      
      Widen the time key column width when --ns is specified.
      
      Before:
      
        % perf record -a sleep 1
        % perf report --sort time,overhead,symbol --stdio --ns
        ...
             2.39%  187851.10000  [k] smp_call_function_single   -      -
             1.53%  187851.10000  [k] intel_idle                 -      -
             0.59%  187851.10000  [.] __wcscmp_ifunc             -      -
             0.33%  187851.10000  [.] 0000000000000000           -      -
             0.28%  187851.10000  [k] cpuidle_enter_state        -      -
      
      After:
      
        % perf report --sort time,overhead,symbol --stdio --ns
        ...
             2.39%  187851.100000000  [k] smp_call_function_single   -      -
             1.53%  187851.100000000  [k] intel_idle                 -      -
             0.59%  187851.100000000  [.] __wcscmp_ifunc             -      -
             0.33%  187851.100000000  [.] 0000000000000000           -      -
             0.28%  187851.100000000  [k] cpuidle_enter_state        -      -
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20190823210338.12360-2-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3dab6ac0
  19. 13 8月, 2019 2 次提交
    • A
      perf hist: Remove dummy entries when finding real ones. · 5f8b4d5d
      Arnaldo Carvalho de Melo 提交于
      When he have an event group we have multiple struct hist instances, one
      per evsel, and in each of these hists we may have hist_entries that
      point to the same thing being observed, say a symbol, i.e. if we're
      looking at instructions and cycles, then we'll have one hist_entry in
      the "instructions" evsel and another in the "cycles" evsel.
      
      We need to link those to then show one column for each. When we're
      looking at some other pair of events, say instructions and cache misses,
      we may have just the "instructions" hist entry and not one for "cache
      misses", as instructions not necessarily generate cache misses, as the
      logic expects one hist_entry per evsel, we end up adding "dummy"
      hist_entries.
      
      This is enough for 'perf report', that does this matching operation
      (hists__match()) just once after processing all events, but for 'perf
      top', we do this at each refresh, so we may finally find events matching
      and then we need to trow away the dummies and link with the real events.
      
      So if we find a match, traverse the link of matches and trow away
      dummies for that hists.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-dwvtjqqifsbsczeb35q6mqkk@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5f8b4d5d
    • A
      perf hists: Do not link a pair if already linked · 7d1a5efa
      Arnaldo Carvalho de Melo 提交于
      When we have multiple events in a group we link hist_entries in the
      non-leader evsel hists to the one in the leader that points to the same
      sorting criteria, in hists__match().
      
      For 'perf report' we do this just once and then print the results, but
      for 'perf top' we need to look if this was already done in the previous
      refresh of the screen, so check for that and don't try to link again.
      
      This is part of having 'perf top' using the hists browser for showing
      multiple events in multiple columns.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-iwvb37rgb7upswhruwpcdnhw@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7d1a5efa
  20. 30 7月, 2019 4 次提交
  21. 09 7月, 2019 3 次提交
  22. 03 7月, 2019 1 次提交
    • J
      perf diff: Print the basic block cycles diff · b10c78c5
      Jin Yao 提交于
       $ perf record -b ./div
       $ perf record -b ./div
      
      Following is the default perf diff output
      
       $ perf diff
      
       # Event 'cycles'
       #
       # Baseline  Delta Abs  Shared Object     Symbol
       # ........  .........  ................  ..................................
       #
           48.75%     +0.33%  div               [.] main
            8.21%     -0.20%  div               [.] compute_flag
           19.02%     -0.12%  libc-2.23.so      [.] __random_r
           16.17%     -0.09%  libc-2.23.so      [.] __random
            2.27%     -0.03%  div               [.] rand@plt
                      +0.02%  [i915]            [k] gen8_irq_handler
            5.52%     +0.02%  libc-2.23.so      [.] rand
      
      This patch creates a new computation selection 'cycles'.
      
       $ perf diff -c cycles
      
       # Event 'cycles'
       #
       # Baseline       [Program Block Range] Cycles Diff Shared Object Symbol
       # ........ ....................................... .........................................
       #
           48.75%             [div.c:42 -> div.c:45]  147 div           [.] main
           48.75%             [div.c:31 -> div.c:40]    4 div           [.] main
           48.75%             [div.c:40 -> div.c:40]    0 div           [.] main
           48.75%             [div.c:42 -> div.c:42]    0 div           [.] main
           48.75%             [div.c:42 -> div.c:44]    0 div           [.] main
           19.02% [random_r.c:357 -> random_r.c:360]    0 libc-2.23.so  [.] __random_r
           19.02% [random_r.c:357 -> random_r.c:373]    0 libc-2.23.so  [.] __random_r
           19.02% [random_r.c:357 -> random_r.c:376]    0 libc-2.23.so  [.] __random_r
           19.02% [random_r.c:357 -> random_r.c:380]    0 libc-2.23.so  [.] __random_r
           19.02% [random_r.c:357 -> random_r.c:392]    0 libc-2.23.so  [.] __random_r
           16.17%     [random.c:288 -> random.c:291]    0 libc-2.23.so  [.] __random
           16.17%     [random.c:288 -> random.c:291]    0 libc-2.23.so  [.] __random
           16.17%     [random.c:288 -> random.c:295]    0 libc-2.23.so  [.] __random
           16.17%     [random.c:288 -> random.c:297]    0 libc-2.23.so  [.] __random
           16.17%     [random.c:291 -> random.c:291]    0 libc-2.23.so  [.] __random
           16.17%     [random.c:293 -> random.c:293]    0 libc-2.23.so  [.] __random
            8.21%             [div.c:22 -> div.c:22]  148 div           [.] compute_flag
            8.21%             [div.c:22 -> div.c:25]    0 div           [.] compute_flag
            8.21%             [div.c:27 -> div.c:28]    0 div           [.] compute_flag
            5.52%           [rand.c:26 -> rand.c:27]    0 libc-2.23.so  [.] rand
            5.52%           [rand.c:26 -> rand.c:28]    0 libc-2.23.so  [.] rand
            2.27%         [rand@plt+0 -> rand@plt+0]    0 div           [.] rand@plt
            0.01% [entry_64.S:694 -> entry_64.S:694]   16 [vmlinux]     [k] native_irq_return_iret
            0.00%       [fair.c:7676 -> fair.c:7665]  162 [vmlinux]     [k] update_blocked_averages
      
      "[Program Block Range]" indicates the range of program basic block
      (start -> end). If we can find the source line it prints the source line
      otherwise it prints the symbol+offset instead.
      
       v4:
       ---
       Use source lines or symbol+offset to indicate the basic block. It should
       be easier to understand.
      
       v3:
       ---
       Cast 'struct hist_entry' to 'struct block_hist' in hist_entry__block_fprintf.
       Use symbol_conf.report_block to check if executing hist_entry__block_fprintf.
      
       v2:
       ---
       Keep standard perf diff format and display the 'Baseline' and
       'Shared Object'.
      
      The output is sorted by "Baseline" and the basic blocks in the same
      function are sorted by cycles diff.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1561713784-30533-7-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b10c78c5