1. 20 Mar, 2019 1 commit
  2. 12 Mar, 2019 1 commit
    • A
      perf report: Implement browsing of individual samples · 4968ac8f
      Committed by Andi Kleen
      Now 'perf report' can show whole time periods with 'perf script', but
      the user still has to find individual samples of interest manually.
      
      It would be expensive and complicated to search for the right samples in
      the whole perf file. Typically users only need to look at a small number
      of samples for useful analysis.
      
      Also the full scripts tend to show samples of all CPUs and all threads
      mixed up, which can be very confusing on larger systems.
      
      Add a new --samples option to save a small random number of samples per
      hist entry.
      
      Use a reservoir sampling technique to select a representative number of
      samples.
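The reservoir idea can be sketched in a few lines. This is a minimal Python illustration of the classic reservoir sampling scheme, not the C code in perf; the function name `reservoir_sample` and its parameters are assumptions made for this example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of at most k items from an iterable.

    Only k items are held at any time, so the stream can be arbitrarily long.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a stored one with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Calling `reservoir_sample(samples_of_one_hist_entry, 5)` would keep five samples per entry without storing the rest, which is the property the commit relies on to avoid searching the whole perf file.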
      
      Then allow browsing the samples using 'perf script' as part of the hist
      entry context menu. This automatically adds the right filters, so only
      the thread or CPU of the sample is displayed. Then we use the search
      functionality of 'less' to jump directly to the time stamp of the
      selected sample.
      
      It uses different menus for assembler and source display.  Assembler
      needs xed installed and source needs debuginfo.
      
      Currently it only supports as many samples as fit on the screen due to
      some limitations in the slang ui code.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20190311174605.GA29294@tassilo.jf.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 11 Mar, 2019 2 commits
  4. 01 Mar, 2019 1 commit
    • J
      perf time-utils: Refactor time range parsing code · 284c4e18
      Committed by Jin Yao
      Jiri points out that we don't need any time checking and time string
      parsing if the --time option is not set. That makes sense.
      
      This patch refactors the time range parsing code: it moves the
      duplicated code from perf report and perf script into time_utils and
      checks whether the --time option is set before parsing the time string.
      No logic change is expected, so the usage of --time is the same as before.
      
      For example:
      
      Select the first and second 10% time slices:
        perf report --time 10%/1,10%/2
        perf script --time 10%/1,10%/2
      
      Select the slices from 0% to 10% and from 30% to 40%:
        perf report --time 0%-10%,30%-40%
        perf script --time 0%-10%,30%-40%
      
      Select the time slices from timestamp 3971 to 3973:
        perf report --time 3971,3973
        perf script --time 3971,3973
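The three forms of the --time argument shown above can be modelled as follows. This is a hedged Python sketch of the semantics as described in the examples, not the time_utils code; `parse_time_option`, `first` and `last` are hypothetical names, and forms not shown above (such as open-ended ranges) are not handled.

```python
def parse_time_option(spec, first, last):
    """Turn a --time style string into a list of (start, end) pairs.

    first/last are the first and last sample timestamps of the recording.
    """
    if "%" not in spec:
        # Absolute form "start,stop": a single range of timestamps.
        start, stop = spec.split(",")
        return [(float(start), float(stop))]

    total = last - first
    ranges = []
    for part in spec.split(","):
        if "/" in part:
            # "p%/n": the n-th slice of width p percent, n counting from 1.
            pct, n = part.split("/")
            width = float(pct.rstrip("%"))
            lo, hi = (int(n) - 1) * width, int(n) * width
        else:
            # "a%-b%": from a percent to b percent of the recording.
            a, b = part.split("-")
            lo, hi = float(a.rstrip("%")), float(b.rstrip("%"))
        ranges.append((first + total * lo / 100, first + total * hi / 100))
    return ranges
```

For a recording that spans timestamps 0 to 200, `parse_time_option("10%/1,10%/2", 0.0, 200.0)` yields two adjacent ranges covering the first fifth of the run.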
      
      Committer testing:
      
      Using the above examples, check before and after to see if it remains
      the same:
      
        $ perf record -F 10000 -- find . -name "*.[ch]" -exec cat {} + > /dev/null
        [ perf record: Woken up 3 times to write data ]
        [ perf record: Captured and wrote 1.626 MB perf.data (42392 samples) ]
        $
        $ perf report --time 10%/1,10%/2 > /tmp/report.before.1
        $ perf script --time 10%/1,10%/2 > /tmp/script.before.1
        $ perf report --time 0%-10%,30%-40% > /tmp/report.before.2
        $ perf script --time 0%-10%,30%-40% > /tmp/script.before.2
        $ perf report --time 180457.375844,180457.377717 > /tmp/report.before.3
        $ perf script --time 180457.375844,180457.377717 > /tmp/script.before.3
      
      For example, the 3rd test produces this slice:
      
        $ cat /tmp/script.before.3
              cat  3147 180457.375844:   2143 cycles:uppp:      7f79362590d9 cfree@GLIBC_2.2.5+0x9 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.375986:   2245 cycles:uppp:      558b70f3d86e [unknown] (/usr/bin/cat)
              cat  3147 180457.376012:   2164 cycles:uppp:      7f7936257430 _int_malloc+0x8c0 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.376140:   2921 cycles:uppp:      558b70f3a554 [unknown] (/usr/bin/cat)
              cat  3147 180457.376296:   2844 cycles:uppp:      7f7936258abe malloc+0x4e (/usr/lib64/libc-2.28.so)
              cat  3147 180457.376431:   2717 cycles:uppp:      558b70f3b0ca [unknown] (/usr/bin/cat)
              cat  3147 180457.376667:   2630 cycles:uppp:      558b70f3d86e [unknown] (/usr/bin/cat)
              cat  3147 180457.376795:   2442 cycles:uppp:      7f79362bff55 read+0x15 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.376927:   2376 cycles:uppp:  ffffffff9aa00163 [unknown] ([unknown])
              cat  3147 180457.376954:   2307 cycles:uppp:      7f7936257438 _int_malloc+0x8c8 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.377116:   3091 cycles:uppp:      7f7936258a70 malloc+0x0 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.377362:   2945 cycles:uppp:      558b70f3a3b0 [unknown] (/usr/bin/cat)
              cat  3147 180457.377517:   2727 cycles:uppp:      558b70f3a9aa [unknown] (/usr/bin/cat)
        $
      
      Install 'coreutils-debuginfo' to see cat's guts (symbols). Even without
      it, the above chunk translates into this 'perf report' output:
      
        $ cat /tmp/report.before.3
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 13  of event 'cycles:uppp' (time slices: 180457.375844,180457.377717)
        # Event count (approx.): 33552
        #
        # Overhead  Command  Shared Object     Symbol
        # ........  .......  ................  ......................
        #
            17.69%  cat      libc-2.28.so      [.] malloc
            14.53%  cat      cat               [.] 0x000000000000586e
            13.33%  cat      libc-2.28.so      [.] _int_malloc
             8.78%  cat      cat               [.] 0x00000000000023b0
             8.71%  cat      cat               [.] 0x0000000000002554
             8.13%  cat      cat               [.] 0x00000000000029aa
             8.10%  cat      cat               [.] 0x00000000000030ca
             7.28%  cat      libc-2.28.so      [.] read
             7.08%  cat      [unknown]         [k] 0xffffffff9aa00163
             6.39%  cat      libc-2.28.so      [.] cfree@GLIBC_2.2.5
      
        #
        # (Tip: Order by the overhead of source file name and line number: perf report -s srcline)
        #
        $
      
      Now let's see what happens after applying this patch; nothing should change:
      
        $ perf report --time 10%/1,10%/2 > /tmp/report.after.1
        $ perf script --time 10%/1,10%/2 > /tmp/script.after.1
        $ perf report --time 0%-10%,30%-40% > /tmp/report.after.2
        $ perf script --time 0%-10%,30%-40% > /tmp/script.after.2
        $ perf report --time 180457.375844,180457.377717 > /tmp/report.after.3
        $ perf script --time 180457.375844,180457.377717 > /tmp/script.after.3
        $ diff -u /tmp/report.before.1 /tmp/report.after.1
        $ diff -u /tmp/script.before.1 /tmp/script.after.1
        $ diff -u /tmp/report.before.2 /tmp/report.after.2
        --- /tmp/report.before.2	2019-03-01 11:01:53.526094883 -0300
        +++ /tmp/report.after.2	2019-03-01 11:09:18.231770467 -0300
        @@ -352,5 +352,5 @@
      
         #
        -# (Tip: Generate a script for your data: perf script -g <lang>)
        +# (Tip: Treat branches as callchains: perf report --branch-history)
         #
        $ diff -u /tmp/script.before.2 /tmp/script.after.2
        $ diff -u /tmp/report.before.3 /tmp/report.after.3
        --- /tmp/report.before.3	2019-03-01 11:03:08.890045588 -0300
        +++ /tmp/report.after.3	2019-03-01 11:09:40.660224002 -0300
        @@ -22,5 +22,5 @@
      
         #
        -# (Tip: Order by the overhead of source file name and line number: perf report -s srcline)
        +# (Tip: List events using substring match: perf list <keyword>)
         #
        $ diff -u /tmp/script.before.3 /tmp/script.after.3
        $
      
      Cool, just the 'perf report' tips changed, QED.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1551435186-6008-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 23 Feb, 2019 1 commit
    • J
      perf data: Add global path holder · 2d4f2799
      Committed by Jiri Olsa
      Add a 'path' member to 'struct perf_data'. It will keep the configured
      path for the data (const char *). The path in struct perf_data_file is
      now dynamically allocated (duped) from it.
      
      This scheme is used in the following patches, where struct
      perf_data::path holds the configured directory path and struct
      perf_data_file::path holds the allocated path for specific files.
      
      It also makes the code a little simpler.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20190221094145.9151-3-jolsa@kernel.org
      [ Fixup data-convert-bt.c missing conversion ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 06 Feb, 2019 2 commits
  7. 25 Jan, 2019 1 commit
  8. 22 Jan, 2019 1 commit
    • R
      perf tools: Replace automatic const char[] variables by statics · 49b8e2be
      Committed by Rasmus Villemoes
      An automatic const char[] variable gets initialized at runtime, just
      like any other automatic variable. For long strings, that uses a lot of
      stack and wastes time building the string; e.g. for the "No %s
      allocation events..." case one has:
      
        444516:       48 b8 4e 6f 20 25 73 20 61 6c   movabs $0x6c61207325206f4e,%rax # "No %s al"
        ...
        444674:       48 89 45 80                     mov    %rax,-0x80(%rbp)
        444678:       48 b8 6c 6f 63 61 74 69 6f 6e   movabs $0x6e6f697461636f6c,%rax # "location"
        444682:       48 89 45 88                     mov    %rax,-0x78(%rbp)
        444686:       48 b8 20 65 76 65 6e 74 73 20   movabs $0x2073746e65766520,%rax # " events "
        444690:       66 44 89 55 c4                  mov    %r10w,-0x3c(%rbp)
        444695:       48 89 45 90                     mov    %rax,-0x70(%rbp)
        444699:       48 b8 66 6f 75 6e 64 2e 20 20   movabs $0x20202e646e756f66,%rax
      
      Make them all static so that the compiler just references objects in .rodata.
      
      Committer testing:
      
      Ok, using dwarves's codiff tool:
      
          $ codiff --functions /tmp/perf.before ~/bin/perf
        builtin-sched.c:
          cmd_sched                 |  -48
         1 function changed, 48 bytes removed, diff: -48
      
        builtin-report.c:
          cmd_report                |  -32
         1 function changed, 32 bytes removed, diff: -32
      
        builtin-kmem.c:
          cmd_kmem                  |  -64
          build_alloc_func_list     |  -50
         2 functions changed, 114 bytes removed, diff: -114
      
        builtin-c2c.c:
          perf_c2c__report          | -390
         1 function changed, 390 bytes removed, diff: -390
      
        ui/browsers/header.c:
          tui__header_window        | -104
         1 function changed, 104 bytes removed, diff: -104
      
        /home/acme/bin/perf:
         9 functions changed, 688 bytes removed, diff: -688
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20181102230624.20064-1-linux@rasmusvillemoes.dk
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  9. 18 Dec, 2018 1 commit
    • J
      perf report: Display average IPC and IPC coverage per symbol · ec6ae74f
      Committed by Jin Yao
      Support displaying the average IPC and IPC coverage per symbol in 'perf
      report' --tui and --stdio modes.
      
      For example,
      
       $ perf record -b ...
       $ perf report -s symbol
      
       Overhead  Symbol                           IPC   [IPC Coverage]
         39.60%  [.] __random                     2.30  [ 54.8%]
         18.02%  [.] main                         0.43  [ 54.3%]
         14.21%  [.] compute_flag                 2.29  [100.0%]
         14.16%  [.] rand                         0.36  [100.0%]
          7.06%  [.] __random_r                   2.57  [ 70.5%]
          6.85%  [.] rand@plt                     0.00  [  0.0%]
      
      Jiri Olsa <jolsa@redhat.com> provided the patch to support the --stdio
      mode. I merged Jiri's code in this patch.
      
        $ perf report -s symbol --stdio
      
          # Overhead  Symbol                       IPC   [IPC Coverage]
          # ........  ...........................  ....................
          #
            39.60%  [.] __random                   2.30  [ 54.8%]
            18.02%  [.] main                       0.43  [ 54.3%]
            14.21%  [.] compute_flag               2.29  [100.0%]
            14.16%  [.] rand                       0.36  [100.0%]
             7.06%  [.] __random_r                 2.57  [ 70.5%]
             6.85%  [.] rand@plt                   0.00  [  0.0%]
             0.02%  [k] run_timer_softirq          1.60  [ 57.2%]
      
      The columns "IPC" and "[IPC Coverage]" are automatically enabled when
      the sort-key "symbol" is specified. If the perf.data file doesn't
      contain timed LBR information, columns are filled with "-".
      
      For example,
      
        # Overhead  Symbol                       IPC   [IPC Coverage]
        # ........  ...........................  ....................
        #
            46.57%  [.] main                     -      -
            17.60%  [.] rand                     -      -
            15.84%  [.] __random_r               -      -
            11.90%  [.] __random                 -      -
             6.50%  [.] compute_flag             -      -
             1.59%  [.] rand@plt                 -      -
             0.00%  [.] _dl_relocate_object      -      -
             0.00%  [k] tlb_flush_mmu            -      -
             0.00%  [k] perf_event_mmap          -      -
             0.00%  [k] native_sched_clock       -      -
             0.00%  [k] intel_pmu_handle_irq_v4  -      -
             0.00%  [k] native_write_msr         -      -
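How the two columns could be filled can be sketched as below. This is an illustrative Python model, not perf's implementation: it assumes IPC is instructions divided by cycles over the basic blocks covered by timed LBR data, and that coverage is the share of a symbol's instructions that fall in those blocks. The name `ipc_columns` and its arguments are hypothetical.

```python
def ipc_columns(blocks, total_insn):
    """Format the "IPC" and "[IPC Coverage]" cells for one symbol.

    blocks: list of (instruction_count, cycles) pairs for covered blocks.
    total_insn: number of instructions in the whole symbol.
    """
    insn = sum(n for n, _ in blocks)
    cycles = sum(c for _, c in blocks)
    if cycles == 0 or total_insn == 0:
        # No timed LBR information for this symbol: print placeholders.
        return ("-", "-")
    ipc = insn / cycles
    coverage = 100.0 * insn / total_insn
    return (f"{ipc:.2f}", f"[{coverage:5.1f}%]")
```

With no block data the function returns the "-" placeholders seen in the second table; with data it produces cells in the same layout as the first one.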
      
       v3:
       ---
       Removed the sortkey 'ipc' from command-line. The columns "IPC"
       and "[IPC Coverage]" are automatically enabled when "symbol"
       is specified.
      
       v2:
       ---
       Merge in Jiri's patch to support stdio mode
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1543586097-27632-4-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  10. 16 Oct, 2018 1 commit
    • J
      perf evsel: Store ids for events with their own cpus perf_event__synthesize_event_update_cpus · 4ab8455f
      Committed by Jiri Olsa
      John reported a crash when recording on an event under a PMU with a cpumask defined:
      
        root@localhost:~# ./perf_debug_ record -e armv8_pmuv3_0/br_mis_pred/ sleep 1
        perf: Segmentation fault
        Obtained 9 stack frames.
        ./perf_debug_() [0x4c5ef8]
        [0xffff82ba267c]
        ./perf_debug_() [0x4bc5a8]
        ./perf_debug_() [0x419550]
        ./perf_debug_() [0x41a928]
        ./perf_debug_() [0x472f58]
        ./perf_debug_() [0x473210]
        ./perf_debug_() [0x4070f4]
        /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe0) [0xffff8294c8a0]
        Segmentation fault (core dumped)
      
      We synthesize an update event that needs to touch the evsel id array, which is
      not yet allocated at that time. Fix this by forcing the id allocation for events
      with their own cpus.
      Reported-by: John Garry <john.garry@huawei.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: John Garry <john.garry@huawei.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linuxarm@huawei.com
      Fixes: bfd8f72c ("perf record: Synthesize unit/scale/... in event update")
      Link: http://lkml.kernel.org/r/20181003212052.GA32371@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  11. 20 Sep, 2018 1 commit
  12. 19 Sep, 2018 1 commit
  13. 14 Aug, 2018 1 commit
  14. 09 Aug, 2018 1 commit
  15. 25 Jul, 2018 1 commit
  16. 25 Jun, 2018 1 commit
    • R
      perf tools: Fix crash caused by accessing feat_ops[HEADER_LAST_FEATURE] · 92ead7ee
      Committed by Ravi Bangoria
      perf_event__process_feature() accesses feat_ops[HEADER_LAST_FEATURE],
      which is not defined, and thus perf crashes. HEADER_LAST_FEATURE is
      used as an end marker for perf report but is unused for perf
      script/annotate. Ignore HEADER_LAST_FEATURE for perf script/annotate,
      just like it is done in 'perf report'.
      
      Before:
        # perf record -o - ls | perf script
        <SNIP 'ls' output>
        Segmentation fault (core dumped)
        #
      
      After:
        # perf record -o - ls | perf script
        <SNIP 'ls' output>
        ls 7031 4392.099856:  250000 cpu-clock:uhH:  7f5e0ce7cd60
        ls 7031 4392.100355:  250000 cpu-clock:uhH:  7f5e0c706ef7
        #
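The shape of the fix can be sketched as a dispatch that treats the end marker specially instead of looking it up in the handler table. This is a Python illustration only; the feature ids `FEAT_EXAMPLE_A` and `FEAT_EXAMPLE_B`, their values, the value given to `HEADER_LAST_FEATURE`, and the handlers are all hypothetical and do not reflect perf's real table.

```python
FEAT_EXAMPLE_A = 1        # hypothetical feature ids, for illustration only
FEAT_EXAMPLE_B = 2
HEADER_LAST_FEATURE = 3   # end marker: it has no handler of its own

# One handler per real feature; the end marker is deliberately absent.
FEAT_OPS = {
    FEAT_EXAMPLE_A: lambda payload: f"a({payload})",
    FEAT_EXAMPLE_B: lambda payload: f"b({payload})",
}


def process_feature(feat_id, payload):
    """Dispatch one feature record, ignoring the end marker."""
    if feat_id == HEADER_LAST_FEATURE:
        # Nothing to process: this only signals that all features arrived.
        return None
    handler = FEAT_OPS.get(feat_id)
    if handler is None:
        raise ValueError(f"unknown feature id {feat_id}")
    return handler(payload)
```

A tool that cares about the marker (as 'perf report' does) can act on it before this point; tools that do not simply see it skipped.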
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Fixes: 57b5de46 ("perf report: Support forced leader feature in pipe mode")
      Link: http://lkml.kernel.org/r/20180625124220.6434-4-ravi.bangoria@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  17. 04 Jun, 2018 6 commits
  18. 22 May, 2018 1 commit
  19. 27 Apr, 2018 3 commits
  20. 21 Mar, 2018 2 commits
  21. 17 Mar, 2018 1 commit
    • J
      perf report: Support forced leader feature in pipe mode · 57b5de46
      Committed by Jiri Olsa
      Stephane reported a problem with forced leader in pipe mode, where
      report does not force the group output. The reason is that we don't
      force the leader in pipe mode.
      
      This patch adds a HEADER_LAST_FEATURE mark to have a point where all
      events and features have been received, and forces the group if requested.
      
        $ perf record --group -e '{cycles, instructions}' -o - kill | perf report -i - --group
      
        SNIP
      
        #         Overhead  Command  Shared Object     Symbol
        # ................  .......  ................  .......................
        #
            28.36%   0.00%  kill     libc-2.25.so      [.] __unregister_atfork
            26.32%   0.00%  kill     libc-2.25.so      [.] _dl_addr
            26.10%   0.00%  kill     ld-2.25.so        [.] _dl_relocate_object
            17.32%   0.00%  kill     ld-2.25.so        [.] __tunables_init
             1.70%   0.01%  kill     [unknown]         [k] 0xffffffffafa01a40
             0.20%   0.00%  kill     ld-2.25.so        [.] _start
             0.00%  48.77%  kill     ld-2.25.so        [.] do_lookup_x
             0.00%  42.97%  kill     libc-2.25.so      [.] _IO_getline
             0.00%   6.35%  kill     ld-2.25.so        [.] strcmp
             0.00%   1.71%  kill     ld-2.25.so        [.] _dl_sysdep_start
             0.00%   0.19%  kill     ld-2.25.so        [.] _dl_start
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Stephane Eranian <eranian@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180314092205.23291-2-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  22. 08 Mar, 2018 2 commits
  23. 16 Feb, 2018 2 commits
    • J
      perf report: Add support to display group output for non group events · ad52b8cb
      Committed by Jiri Olsa
      Add support to display group output if non-grouped events are
      detected and the user forces the --group option. Now for non-group events
      recorded like:
      
        $ perf record -e 'cycles,instructions' ls
      
      you can still get group output by using the --group option
      in report:
      
        $ perf report --group --stdio
        ...
        #         Overhead  Command  Shared Object     Symbol
        # ................  .......  ................  ......................
        #
            17.67%   0.00%  ls       libc-2.25.so      [.] _IO_do_write@@GLIB
            15.59%  25.94%  ls       ls                [.] calculate_columns
            15.41%  31.35%  ls       libc-2.25.so      [.] __strcoll_l
        ...
      
      Committer note:
      
      We should improve on this by making sure that the first line states that
      this is not a group, but since the user doesn't have to force group view
      when really using grouped events (e.g. '{cycles,instructions}'), the
      user better know what is being done...
      Requested-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Stephane Eranian <eranian@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180209092734.GB20449@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf report: Ask for ordered events for --tasks option · 8614ada0
      Committed by Jiri Olsa
      If the events carry timestamps, keep them in time order.
      
      Committer notes:
      
      To be more verbose, what actual effect will this have in this particular
      case?
      
      Comparing the output before and after this patch shows the artifacts:
      
        --- /tmp/before 2018-02-06 15:40:29.536411625 -0300
        +++ /tmp/after  2018-02-06 15:40:51.963403599 -0300
        @@ -5,34 +5,34 @@
               2540     2540     1818 |   gnome-terminal-
               3489     3489     2540 |    bash
              32433    32433     3489 |     perf
        -     32434    32434    32433 |      perf
        +     32434    32434    32433 |      make
              32441    32441    32434 |       make
              32514    32514    32441 |        make
                511      511    32514 |         sh
        -       512      512      511 |          sh
        +       512      512      511 |          install
      <SNIP>
      
      We don't have 'perf' calling 'perf' calling 'make', etc, the second
      'perf' actually is 'make', i.e.  there was reordering of the relevant
      PERF_RECORD_COMM and PERF_RECORD_FORK records.
      
      Ditto for sh/install later on.
      
      Look for FORK and COMM meta events, for those tids:
      
        # perf report -D | egrep 'PERF_RECORD_(FORK|COMM)' | egrep '3243[34]'
        0 14774650990679 0x1a3cd8 [0x38]: PERF_RECORD_FORK(32433:32433):(3489:3489)
        1 14774652080381 0x1d6568 [0x30]: PERF_RECORD_COMM exec: perf:32433/32433
        1 14774742473340 0x1dbb48 [0x38]: PERF_RECORD_FORK(32434:32434):(32433:32433)
        0 14774752005779 0x1a4af8 [0x30]: PERF_RECORD_COMM exec: make:32434/32434
        0 14774753997960 0x1a5578 [0x38]: PERF_RECORD_FORK(32435:32435):(32434:32434)
        0 14774756070782 0x1a5618 [0x38]: PERF_RECORD_FORK(32438:32438):(32434:32434)
        0 14774757772939 0x1a5680 [0x38]: PERF_RECORD_FORK(32440:32440):(32434:32434)
        0 14774758230600 0x1a56e8 [0x38]: PERF_RECORD_FORK(32441:32441):(32434:32434)
        #
      
      The first column is the CPU, the second is the timestamp.
      
      So they are on different CPUs, and thus in different ring buffers. When we
      don't use the ordered_events class, we end up mixing them up. Use it to take
      advantage of the PERF_RECORD_FINISHED_ROUND meta events and order the
      events using the PERF_SAMPLE_TIME present in the
      PERF_RECORD_{FORK,COMM,EXIT,SAMPLE,etc} records in the ring buffer.
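The effect of ordering can be reproduced with the first four records from the 'perf report -D' output above. This Python sketch merges two per-CPU buffers by timestamp in one pass; it is an illustration only and does not model how the ordered_events class queues records or flushes them on PERF_RECORD_FINISHED_ROUND. The helper name `in_time_order` is an assumption.

```python
import heapq

# (timestamp, record) pairs as they sit in each CPU's ring buffer,
# taken from the 'perf report -D' output above.
cpu0 = [
    (14774650990679, "FORK 32433"),
    (14774752005779, "COMM make:32434"),
]
cpu1 = [
    (14774652080381, "COMM perf:32433"),
    (14774742473340, "FORK 32434"),
]


def in_time_order(*buffers):
    """Merge per-CPU buffers, each already sorted by time, into one stream."""
    return list(heapq.merge(*buffers, key=lambda rec: rec[0]))


# Reading buffer after buffer puts make's COMM before the FORK of 32434.
unordered = cpu0 + cpu1
# Merging by timestamp restores FORK before COMM for each task.
ordered = in_time_order(cpu0, cpu1)
```

In `unordered`, the COMM record for 32434 is seen before the task is forked, which is the kind of reordering that produced the wrong comm names in the "before" output.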
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180206181813.10943-2-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  24. 15 Feb, 2018 1 commit
  25. 17 Jan, 2018 4 commits
    • J
      perf report: Remove the time slices number limitation · 0a3cc3ae
      Committed by Jin Yao
      Previously it was only allowed to use at most 10 time slices in 'perf
      report --time'.
      
      This patch removes this limitation.
      For example, the following command line is OK (12 time slices):
      
      perf report --stdio --time 1%/1,1%/2,1%/3,1%/4,1%/5,1%/6,1%/7,1%/8,1%/9,1%/10,1%/11,1%/12
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1515596433-24653-8-git-send-email-yao.jin@linux.intel.com
      [ No need to check for NULL to call free, use zfree ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf report: Add an indication of what time slices are used · 7425664b
      Committed by Jin Yao
      Add a time slices indication to the perf report header.
      
      For example,
      
        # perf report --stdio --time 10%
      
        # Total Lost Samples: 0
        #
        # Samples: 9K of event 'cycles:ppp' (time slices: 10%)
        # Event count (approx.): 8951288803
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1515596433-24653-6-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf report: Improve error msg when no first/last sample time found · eb0b419e
      Committed by Jin Yao
      The following message will be returned to the user when executing
      'perf report --time' if the perf data file doesn't contain the
      first/last sample time.
      
      "HINT: no first/last sample time found in perf data.
       Please use latest perf binary to execute 'perf record'
       (if '--buildid-all' is enabled, needs to set '--timestamp-boundary')."
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1515596433-24653-2-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf unwind: Do not look just at the global callchain_param.record_mode · eabad8c6
      Committed by Arnaldo Carvalho de Melo
      When setting up DWARF callchains on specific events, without using
      'record' or 'trace' --call-graph, but instead doing it like:
      
      	perf trace -e cycles/call-graph=dwarf/
      
      The unwind__prepare_access() call in thread__insert_map(), made when we
      process PERF_RECORD_MMAP(2) metadata events, was not being performed,
      precluding us from using per-event DWARF callchains and handling them only
      when we asked for all events to be DWARF, using "--call-graph dwarf".
      
      We do it in the PERF_RECORD_MMAP because we have to look at one of the
      executable maps to figure out the executable type (64-bit, 32-bit) of
      the DSO laid out in that mmap. We also have to look at the architecture
      on which the perf.data file was recorded.
      
      All this probably should be deferred to when we process a sample for
      some thread that has callchains, so that we do this processing only for
      the threads with samples, not for all of them.
      
      For now, fix using DWARF on specific events.
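The decision the fix changes can be modelled as a small predicate: unwinding must be prepared when either the global mode or any single event asks for DWARF. This is a Python sketch of that logic under stated assumptions, not the perf code; `needs_dwarf_unwind` and its parameters are hypothetical names.

```python
def needs_dwarf_unwind(global_mode, event_modes):
    """Return True if DWARF unwinding must be prepared for this session.

    global_mode: the session-wide call-graph mode ("fp", "dwarf" or None).
    event_modes: per-event call-graph modes, None where an event sets none.
    """
    if global_mode == "dwarf":
        return True
    # Looking only at the global mode misses events that request DWARF
    # for themselves, e.g. cycles/call-graph=dwarf/.
    return any(mode == "dwarf" for mode in event_modes)
```

Under this model, a session with no global --call-graph but one event configured with call-graph=dwarf still prepares unwinding, which corresponds to the first "After" run in this commit message.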
      
      Before:
      
        # perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
        PING ::1(::1) 56 data bytes
        64 bytes from ::1: icmp_seq=1 ttl=64 time=0.048 ms
      
        --- ::1 ping statistics ---
        1 packets transmitted, 1 received, 0% packet loss, time 0ms
        rtt min/avg/max/mdev = 0.048/0.048/0.048/0.000 ms
           0.000 probe_libc:inet_pton:(7fe9597bb350))
        Problem processing probe_libc:inet_pton callchain, skipping...
        #
      
      After:
      
        # perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
        PING ::1(::1) 56 data bytes
        64 bytes from ::1: icmp_seq=1 ttl=64 time=0.060 ms
      
        --- ::1 ping statistics ---
        1 packets transmitted, 1 received, 0% packet loss, time 0ms
        rtt min/avg/max/mdev = 0.060/0.060/0.060/0.000 ms
             0.000 probe_libc:inet_pton:(7fd4aa930350))
                                               __inet_pton (inlined)
                                               gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
                                               __GI_getaddrinfo (inlined)
                                               [0xffffaa804e51af3f] (/usr/bin/ping)
                                               __libc_start_main (/usr/lib64/libc-2.26.so)
                                               [0xffffaa804e51b379] (/usr/bin/ping)
        #
        # perf trace --call-graph=dwarf --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
        PING ::1(::1) 56 data bytes
        64 bytes from ::1: icmp_seq=1 ttl=64 time=0.057 ms
      
        --- ::1 ping statistics ---
        1 packets transmitted, 1 received, 0% packet loss, time 0ms
        rtt min/avg/max/mdev = 0.057/0.057/0.057/0.000 ms
             0.000 probe_libc:inet_pton:(7f9363b9e350))
                                               __inet_pton (inlined)
                                               gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
                                               __GI_getaddrinfo (inlined)
                                               [0xffffa9e8a14e0f3f] (/usr/bin/ping)
                                               __libc_start_main (/usr/lib64/libc-2.26.so)
                                               [0xffffa9e8a14e1379] (/usr/bin/ping)
        #
        # perf trace --call-graph=fp --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
        PING ::1(::1) 56 data bytes
        64 bytes from ::1: icmp_seq=1 ttl=64 time=0.077 ms
      
        --- ::1 ping statistics ---
        1 packets transmitted, 1 received, 0% packet loss, time 0ms
        rtt min/avg/max/mdev = 0.077/0.077/0.077/0.000 ms
             0.000 probe_libc:inet_pton:(7f4947e1c350))
                                               __inet_pton (inlined)
                                               gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
                                               __GI_getaddrinfo (inlined)
                                               [0xffffaa716d88ef3f] (/usr/bin/ping)
                                               __libc_start_main (/usr/lib64/libc-2.26.so)
                                               [0xffffaa716d88f379] (/usr/bin/ping)
        #
        # perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=fp/ ping -6 -c 1 ::1
        PING ::1(::1) 56 data bytes
        64 bytes from ::1: icmp_seq=1 ttl=64 time=0.078 ms
      
        --- ::1 ping statistics ---
        1 packets transmitted, 1 received, 0% packet loss, time 0ms
        rtt min/avg/max/mdev = 0.078/0.078/0.078/0.000 ms
             0.000 probe_libc:inet_pton:(7fa157696350))
                                               __GI___inet_pton (/usr/lib64/libc-2.26.so)
                                               getaddrinfo (/usr/lib64/libc-2.26.so)
                                               [0xffffa9ba39c74f40] (/usr/bin/ping)
        #
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Hendrick Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/r/20180116182650.GE16107@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>