1. 12 Nov, 2019 11 commits
  2. 07 Nov, 2019 29 commits
    • J
      perf report: Sort by sampled cycles percent per block for tui · 7fa46cbf
      Jin Yao committed
      The previous patch implemented the new "--total-cycles" option, but only
      for stdio mode.
      
      This patch adds support for the tui mode, including '--percent-limit'.
      
      For example,
      
       perf record -b ./div
       perf report --total-cycles --percent-limit 1
      
       # Samples: 2753248 of event 'cycles'
       Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                              [Program Block Range]         Shared Object
                26.04%            2.8M        0.40%          18                                             [div.c:42 -> div.c:39]                   div
                15.17%            1.2M        0.16%           7                                 [random_r.c:357 -> random_r.c:380]          libc-2.27.so
                 5.11%          402.0K        0.04%           2                                             [div.c:27 -> div.c:28]                   div
                 4.87%          381.6K        0.04%           2                                     [random.c:288 -> random.c:291]          libc-2.27.so
                 4.53%          381.0K        0.04%           2                                             [div.c:40 -> div.c:40]                   div
                 3.85%          300.9K        0.02%           1                                             [div.c:22 -> div.c:25]                   div
                 3.08%          241.1K        0.02%           1                                           [rand.c:26 -> rand.c:27]          libc-2.27.so
                 3.06%          240.0K        0.02%           1                                     [random.c:291 -> random.c:291]          libc-2.27.so
                 2.78%          215.7K        0.02%           1                                     [random.c:298 -> random.c:298]          libc-2.27.so
                 2.52%          198.3K        0.02%           1                                     [random.c:293 -> random.c:293]          libc-2.27.so
                 2.36%          184.8K        0.02%           1                                           [rand.c:28 -> rand.c:28]          libc-2.27.so
                 2.33%          180.5K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
                 2.28%          176.7K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
                 2.20%          168.8K        0.02%           1                                         [rand@plt+0 -> rand@plt+0]                   div
                 1.98%          158.2K        0.02%           1                                 [random_r.c:388 -> random_r.c:388]          libc-2.27.so
                 1.57%          123.3K        0.02%           1                                             [div.c:42 -> div.c:44]                   div
                 1.44%          116.0K        0.42%          19                                 [random_r.c:357 -> random_r.c:394]          libc-2.27.so
      
      --------------------------------------------------
      
       v7:
       ---
       1. Since use_browser is already used in report__browse_block_hists
          to support stdio mode, now we also add support for tui.
      
       2. Move block tui browser code from ui/browsers/hists.c
          to block-info.c.
      
       v6:
       ---
       Create report__tui_browse_block_hists in block-info.c
       (code is moved from builtin-report.c).
      
       v5:
       ---
       Fix a crash issue when running perf report without
       '--total-cycles'. The issue is because the internal flag
       is renamed from 'total_cycles' to 'total_cycles_mode' in
       previous patch but this patch still uses 'total_cycles'
       to check if the '--total-cycles' option is enabled, which
       causes the code to be inconsistent.
      
       v4:
       ---
       Since block collection was moved out of printing in the
       previous patch, this patch is updated accordingly for
       tui support.
      
       v3:
       ---
       Minor change since the function name is changed:
       block_total_cycles_percent -> block_info__total_cycles_percent
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-8-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf report: Support --percent-limit for --total-cycles · 0b49f836
      Jin Yao committed
      The previous patch added support for the '--total-cycles' option.
      It's also useful to show only entries above a threshold percent.
      
      This patch enables '--percent-limit' to hide entries
      under that percent.
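
The thresholding described above can be sketched as follows. This is a minimal illustration, not perf's implementation: the function name and the flat array of percentages are hypothetical, and it assumes an entry exactly at the limit is kept, since the message only says entries under the limit are hidden.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a perf function): compact 'percents' in place so
 * that only entries at or above 'limit' remain, preserving their order.
 * Returns the number of entries kept.
 */
size_t filter_by_percent_limit(double *percents, size_t nr, double limit)
{
	size_t kept = 0;

	assert(percents != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (percents[i] < limit)
			continue;	/* under the threshold: hide it */
		percents[kept++] = percents[i];
	}
	return kept;
}
```

With a limit of 1, an entry at 0.25% would be dropped while one at 1.44% would stay, which matches the cut-off visible in the example output.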
      
      For example:
      
       perf report --total-cycles --stdio --percent-limit 1
      
       # To display the perf.data header info, please use --header/--header-only options.
       #
       #
       # Total Lost Samples: 0
       #
       # Samples: 2M of event 'cycles'
       # Event count (approx.): 2753248
       #
       # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                              [Program Block Range]         Shared Object
       # ...............  ..............  ...........  ..........  .................................................................  ....................
       #
                  26.04%            2.8M        0.40%          18                                             [div.c:42 -> div.c:39]                   div
                  15.17%            1.2M        0.16%           7                                 [random_r.c:357 -> random_r.c:380]          libc-2.27.so
                   5.11%          402.0K        0.04%           2                                             [div.c:27 -> div.c:28]                   div
                   4.87%          381.6K        0.04%           2                                     [random.c:288 -> random.c:291]          libc-2.27.so
                   4.53%          381.0K        0.04%           2                                             [div.c:40 -> div.c:40]                   div
                   3.85%          300.9K        0.02%           1                                             [div.c:22 -> div.c:25]                   div
                   3.08%          241.1K        0.02%           1                                           [rand.c:26 -> rand.c:27]          libc-2.27.so
                   3.06%          240.0K        0.02%           1                                     [random.c:291 -> random.c:291]          libc-2.27.so
                   2.78%          215.7K        0.02%           1                                     [random.c:298 -> random.c:298]          libc-2.27.so
                   2.52%          198.3K        0.02%           1                                     [random.c:293 -> random.c:293]          libc-2.27.so
                   2.36%          184.8K        0.02%           1                                           [rand.c:28 -> rand.c:28]          libc-2.27.so
                   2.33%          180.5K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
                   2.28%          176.7K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
                   2.20%          168.8K        0.02%           1                                         [rand@plt+0 -> rand@plt+0]                   div
                   1.98%          158.2K        0.02%           1                                 [random_r.c:388 -> random_r.c:388]          libc-2.27.so
                   1.57%          123.3K        0.02%           1                                             [div.c:42 -> div.c:44]                   div
                   1.44%          116.0K        0.42%          19                                 [random_r.c:357 -> random_r.c:394]          libc-2.27.so
      
      Committer testing:
      
      From the second example onwards, slightly edited for brevity:
      
        # perf report --total-cycles --percent-limit 2 --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 6M of event 'cycles'
        # Event count (approx.): 6299936
        #
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
        # ...............  ..............  ...........  ..........  ......................................................................  ....................
        #
                    2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
        #
        # (Tip: Create an archive with symtabs to analyse on other machine: perf archive)
        #
        # perf report --total-cycles --percent-limit 1 --stdio
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
                    2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
                    1.75%            1.3M        8.34%       65.5K    [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151]          libc-2.29.so
        #
        # perf report --total-cycles --percent-limit 0.7 --stdio
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
                    2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
                    1.75%            1.3M        8.34%       65.5K    [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151]          libc-2.29.so
                    0.72%          544.5K        0.03%         230                                      [entry_64.S:657 -> entry_64.S:662]      [kernel.vmlinux]
        #
      
      -------------------------------------------
      
      It only shows the entries whose 'Sampled Cycles%' is above 1%.
      
       v7:
       ---
       No functional change. Only fix the conflict issue because
       previous patches are changed.
      
       v6:
       ---
       No functional change. Only fix the conflict issue because
       previous patches are changed.
      
       v5:
       ---
       No functional change. Only fix the conflict issue because
       previous patches are changed.
      
       v4:
       ---
       No functional change. Only fix the build issue because
       previous patches are changed.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-7-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf report: Sort by sampled cycles percent per block for stdio · 6f7164fa
      Jin Yao committed
      It would be useful to support sorting all blocks by the sampled
      cycles percent per block, which helps to concentrate on the globally
      hottest blocks.
      
      This patch implements a new option "--total-cycles" which sorts all
      blocks by 'Sampled Cycles%'. The 'Sampled Cycles%' is the percent:
      
       percent = block sampled cycles aggregation / total sampled cycles
      
      Note that this patch only supports "--stdio" mode.
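
The percent formula and the resulting ordering can be sketched as below. The struct and function names are hypothetical stand-ins, not the types used in block-info.c. Because every block is divided by the same total, sorting by aggregated cycles gives the same order as sorting by 'Sampled Cycles%'.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for perf's per-block data. */
struct block_entry {
	const char *range;		/* e.g. "div.c:42 -> div.c:39" */
	uint64_t sampled_cycles;	/* cycles aggregated over the block's samples */
};

/* percent = block sampled cycles aggregation / total sampled cycles */
double sampled_cycles_percent(uint64_t block_cycles, uint64_t total_cycles)
{
	if (total_cycles == 0)
		return 0.0;	/* avoid dividing by zero when nothing was sampled */
	return 100.0 * (double)block_cycles / (double)total_cycles;
}

/* qsort() comparator: hottest block first. */
int cmp_block_desc(const void *a, const void *b)
{
	const struct block_entry *ba = a, *bb = b;

	if (ba->sampled_cycles < bb->sampled_cycles)
		return 1;
	if (ba->sampled_cycles > bb->sampled_cycles)
		return -1;
	return 0;
}

/* Order all blocks globally by their share of the sampled cycles. */
void sort_blocks_by_percent(struct block_entry *blocks, size_t nr)
{
	assert(blocks != NULL || nr == 0);
	qsort(blocks, nr, sizeof(*blocks), cmp_block_desc);
}
```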
      
      For example,
      
        # perf record -b ./div
        # perf report --total-cycles --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 2M of event 'cycles'
        # Event count (approx.): 2753248
        #
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                             [Program Block Range]      Shared Object
        # ...............  ..............  ...........  ..........  ................................................  .................
        #
                   26.04%            2.8M        0.40%          18                            [div.c:42 -> div.c:39]                div
                   15.17%            1.2M        0.16%           7                [random_r.c:357 -> random_r.c:380]       libc-2.27.so
                    5.11%          402.0K        0.04%           2                            [div.c:27 -> div.c:28]                div
                    4.87%          381.6K        0.04%           2                    [random.c:288 -> random.c:291]       libc-2.27.so
                    4.53%          381.0K        0.04%           2                            [div.c:40 -> div.c:40]                div
                    3.85%          300.9K        0.02%           1                            [div.c:22 -> div.c:25]                div
                    3.08%          241.1K        0.02%           1                          [rand.c:26 -> rand.c:27]       libc-2.27.so
                    3.06%          240.0K        0.02%           1                    [random.c:291 -> random.c:291]       libc-2.27.so
                    2.78%          215.7K        0.02%           1                    [random.c:298 -> random.c:298]       libc-2.27.so
                    2.52%          198.3K        0.02%           1                    [random.c:293 -> random.c:293]       libc-2.27.so
                    2.36%          184.8K        0.02%           1                          [rand.c:28 -> rand.c:28]       libc-2.27.so
                    2.33%          180.5K        0.02%           1                    [random.c:295 -> random.c:295]       libc-2.27.so
                    2.28%          176.7K        0.02%           1                    [random.c:295 -> random.c:295]       libc-2.27.so
                    2.20%          168.8K        0.02%           1                        [rand@plt+0 -> rand@plt+0]                div
                    1.98%          158.2K        0.02%           1                [random_r.c:388 -> random_r.c:388]       libc-2.27.so
                    1.57%          123.3K        0.02%           1                            [div.c:42 -> div.c:44]                div
                    1.44%          116.0K        0.42%          19                [random_r.c:357 -> random_r.c:394]       libc-2.27.so
                    0.25%          182.5K        0.02%           1                [random_r.c:388 -> random_r.c:391]       libc-2.27.so
                    0.00%              48        1.07%          48        [x86_pmu_enable+284 -> x86_pmu_enable+298]  [kernel.kallsyms]
                    0.00%              74        1.64%          74             [vm_mmap_pgoff+0 -> vm_mmap_pgoff+92]  [kernel.kallsyms]
                    0.00%              73        1.62%          73                         [vm_mmap+0 -> vm_mmap+48]  [kernel.kallsyms]
                    0.00%              63        0.69%          31                       [up_write+0 -> up_write+34]  [kernel.kallsyms]
                    0.00%              13        0.29%          13      [setup_arg_pages+396 -> setup_arg_pages+413]  [kernel.kallsyms]
                    0.00%               3        0.07%           3      [setup_arg_pages+418 -> setup_arg_pages+450]  [kernel.kallsyms]
                    0.00%             616        6.84%         308   [security_mmap_file+0 -> security_mmap_file+72]  [kernel.kallsyms]
                    0.00%              23        0.51%          23  [security_mmap_file+77 -> security_mmap_file+87]  [kernel.kallsyms]
                    0.00%               4        0.02%           1                  [sched_clock+0 -> sched_clock+4]  [kernel.kallsyms]
                    0.00%               4        0.02%           1                 [sched_clock+9 -> sched_clock+12]  [kernel.kallsyms]
                    0.00%               1        0.02%           1                [rcu_nmi_exit+0 -> rcu_nmi_exit+9]  [kernel.kallsyms]
      
      Committer testing:
      
      This should provide material for hours of endless joy, both from looking
      for suspicious things in the implementation of this patch, such as the
      top one:
      
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                          [Program Block Range]     Shared Object
                    2.17%            1.7M        0.08%         607   [compiler.h:199 -> common.c:221]              [kernel.vmlinux]
      
      As well as from things that look legit:
      
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                          [Program Block Range]     Shared Object
                    0.16%          123.0K        0.60%        4.7K   [nospec-branch.h:265 -> nospec-branch.h:278]  [kernel.vmlinux]
      
      :-)
      
      Very short system wide taken branches session:
      
        # perf record -h -b
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -b, --branch-any      sample any taken branches
      
        #
        # perf record -b
        ^C[ perf record: Woken up 595 times to write data ]
        [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ]
      
        #
        # perf evlist -v
        cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
        #
        # perf report --total-cycles --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 6M of event 'cycles'
        # Event count (approx.): 6299936
        #
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
        # ...............  ..............  ...........  ..........  ......................................................................  ....................
        #
                    2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
                    1.75%            1.3M        8.34%       65.5K    [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151]          libc-2.29.so
                    0.72%          544.5K        0.03%         230                                      [entry_64.S:657 -> entry_64.S:662]      [kernel.vmlinux]
                    0.56%          541.8K        0.09%         672                                        [compiler.h:199 -> common.c:300]      [kernel.vmlinux]
                    0.39%          293.2K        0.01%         104                                    [list_debug.c:43 -> list_debug.c:61]      [kernel.vmlinux]
                    0.36%          278.6K        0.03%         272                                    [entry_64.S:1289 -> entry_64.S:1308]      [kernel.vmlinux]
                    0.30%          260.8K        0.07%         564                              [clear_page_64.S:47 -> clear_page_64.S:50]      [kernel.vmlinux]
                    0.28%          215.3K        0.05%         369                                            [traps.c:623 -> traps.c:628]      [kernel.vmlinux]
                    0.23%          178.1K        0.04%         278                                      [entry_64.S:271 -> entry_64.S:275]      [kernel.vmlinux]
                    0.20%          152.6K        0.09%         706                                      [paravirt.c:177 -> paravirt.c:179]      [kernel.vmlinux]
                    0.20%          155.8K        0.05%         373                                      [entry_64.S:153 -> entry_64.S:175]      [kernel.vmlinux]
                    0.18%          136.6K        0.03%         222                                                [msr.h:105 -> msr.h:166]      [kernel.vmlinux]
                    0.16%          123.0K        0.60%        4.7K                            [nospec-branch.h:265 -> nospec-branch.h:278]      [kernel.vmlinux]
                    0.16%          118.3K        0.01%          44                                      [entry_64.S:632 -> entry_64.S:657]      [kernel.vmlinux]
                    0.14%          104.5K        0.00%          28                                          [rwsem.c:1541 -> rwsem.c:1544]      [kernel.vmlinux]
                    0.13%           99.2K        0.01%          53                                      [spinlock.c:150 -> spinlock.c:152]      [kernel.vmlinux]
                    0.13%           95.5K        0.00%          35                                              [swap.c:456 -> swap.c:471]      [kernel.vmlinux]
                    0.12%           96.2K        0.05%         407                              [copy_user_64.S:175 -> copy_user_64.S:209]      [kernel.vmlinux]
                    0.11%           85.9K        0.00%          31                                        [swap.c:400 -> page-flags.h:188]      [kernel.vmlinux]
                    0.10%           73.0K        0.01%          52                                          [paravirt.h:763 -> list.h:131]      [kernel.vmlinux]
                    0.07%           56.2K        0.03%         214                                      [filemap.c:1524 -> filemap.c:1557]      [kernel.vmlinux]
                    0.07%           54.2K        0.02%         145                                        [memory.c:1032 -> memory.c:1049]      [kernel.vmlinux]
                    0.07%           50.3K        0.00%          39                                            [mmzone.c:49 -> mmzone.c:69]      [kernel.vmlinux]
                    0.06%           48.3K        0.01%          40                                   [paravirt.h:768 -> page_alloc.c:3304]      [kernel.vmlinux]
                    0.06%           46.7K        0.02%         155                                        [memory.c:1032 -> memory.c:1056]      [kernel.vmlinux]
                    0.06%           46.9K        0.01%         103                                              [swap.c:867 -> swap.c:902]      [kernel.vmlinux]
                    0.06%           47.8K        0.00%          34                                    [entry_64.S:1201 -> entry_64.S:1202]      [kernel.vmlinux]
      
       -----------------------------------------------------------
      
       v7:
       ---
       Use use_browser in report__browse_block_hists for supporting
       stdio and potential tui mode.
      
       v6:
       ---
       Create report__browse_block_hists in block-info.c (code is
       moved from builtin-report.c). It's called from
       perf_evlist__tty_browse_hists.
      
       v5:
       ---
       1. Move all block functions to block-info.c
      
       2. Move the code of setting ms in block hist_entry to
          other patch.
      
       v4:
       ---
       1. Use new option '--total-cycles' to replace
          '-s total_cycles' in v3.
      
       2. Move block info collection out of block info
          printing.
      
       v3:
       ---
       1. Use common function block_info__process_sym to
          process the blocks per symbol.
      
       2. Remove the nasty hack for skipping calculation
          of column length
      
       3. Some minor cleanup
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-6-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf hist: Support block formats with compare/sort/display · b65a7d37
      Jin Yao committed
      This patch provides helper routines to support new columns for block
      info output.
      
      The new columns are:
      
        Sampled Cycles%
        Sampled Cycles
        Avg Cycles%
        Avg Cycles
        [Program Block Range]
        Shared Object
      
       v5:
       ---
       1. Move more block related functions from builtin-report.c to
          block-info.c
      
       2. Set ms (map+sym) in block hist_entry. Because this info
          is needed for reporting the block range (i.e. source line)
      
      Committer notes:
      
      Remove unused set_fmt() function, some builds were not completing with:
      
        util/block-info.c:396:20: error: unused function 'set_fmt' [-Werror,-Wunused-function]
        static inline void set_fmt(struct block_fmt *block_fmt,
                           ^
        1 error generated.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-5-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf hist: Count the total cycles of all samples · 7841f40a
      Jin Yao committed
      We can get the per sample cycles by hist__account_cycles(). It's also
      useful to know the total cycles of all samples in order to later get the
      cycles coverage for a single program block. For example:
      
        coverage = per block sampled cycles / total sampled cycles
      
      This patch adds a new argument 'total_cycles' to hist__account_cycles(),
      to which the cycles of each sample are added.
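
The accumulation pattern can be sketched roughly as below. This is an illustration under assumptions, not the body of hist__account_cycles(): the function name and the plain array of per-branch cycle counts are hypothetical, and the optional pointer argument is assumed so that callers not interested in the total can pass NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: walk the cycle counts recorded for one sample and,
 * if the caller asked for it, add them to a running total.
 */
void account_sample_cycles(const uint64_t *branch_cycles, size_t nr,
			   uint64_t *total_cycles)
{
	assert(branch_cycles != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		/* per-sample accounting would happen here */
		if (total_cycles)
			*total_cycles += branch_cycles[i];
	}
}
```

Calling it once per sample with the same total_cycles pointer leaves the sum over all samples in that variable, which is the denominator of the coverage formula.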
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-4-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf block: Cleanup and refactor block info functions · 60414418
      Jin Yao committed
      We have already implemented some block-info related functions.
      Now it's time to do some cleanup, refactoring and move the
      functions and structures to new block-info.h/block-info.c.
      
       v4:
       ---
       Move code for skipping column length calculation to patch:
       'perf diff: Don't use hack to skip column length calculation'
      
       v3:
       ---
       1. Rename the patch title
       2. Rename from block.h/block.c to block-info.h/block-info.c
       3. Move more common part to block-info, such as
          block_info__process_sym.
       4. Remove the nasty hack for skipping calculation of column
          length
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-3-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf diff: Don't use hack to skip column length calculation · 0bdf181f
      Jin Yao committed
      Previously we used a nasty hack to skip hists__calc_col_len for block
      since this function is not very suitable for block column length
      calculation.
      
      This patch removes the hack code and adds a check at the entry of
      hists__calc_col_len to skip it for the block case.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-2-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • M
      perf probe: Skip overlapped location on searching variables · dee36a2a
      Masami Hiramatsu committed
      Since the debuginfo__find_probes() callback function can be called with a
      location that has already been passed, the callback function must filter
      out such overlapped locations.
      
      add_probe_trace_event() has already done it by commit 1a375ae7
      ("perf probe: Skip same probe address for a given line"), but
      add_available_vars() doesn't. Thus perf probe -V shows the same address
      repeatedly as below:
      
        # perf probe -V vfs_read:18
        Available variables at vfs_read:18
                @<vfs_read+217>
                        char*   buf
                        loff_t* pos
                        ssize_t ret
                        struct file*    file
                @<vfs_read+217>
                        char*   buf
                        loff_t* pos
                        ssize_t ret
                        struct file*    file
                @<vfs_read+226>
                        char*   buf
                        loff_t* pos
                        ssize_t ret
                        struct file*    file
      
      With this fix, perf probe -V shows it correctly:
      
        # perf probe -V vfs_read:18
        Available variables at vfs_read:18
                @<vfs_read+217>
                        char*   buf
                        loff_t* pos
                        ssize_t ret
                        struct file*    file
                @<vfs_read+226>
                        char*   buf
                        loff_t* pos
                        ssize_t ret
                        struct file*    file
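
The filtering of overlapped locations described above can be sketched as a simple "already seen this address" check. This is only an illustration: the list type, its fixed capacity, and the function name are hypothetical and do not mirror the structures used by add_available_vars().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LOCATIONS 16	/* arbitrary capacity for the sketch */

/* Hypothetical list of addresses already reported to the user. */
struct location_list {
	uint64_t addrs[MAX_LOCATIONS];
	size_t nr;
};

/*
 * Record 'addr' unless it was already recorded. Returns true when the
 * address is new and stored, false when it is skipped (duplicate or full).
 */
bool record_location(struct location_list *list, uint64_t addr)
{
	assert(list != NULL);

	for (size_t i = 0; i < list->nr; i++) {
		if (list->addrs[i] == addr)
			return false;	/* overlapped location: skip it */
	}
	if (list->nr == MAX_LOCATIONS)
		return false;
	list->addrs[list->nr++] = addr;
	return true;
}
```

Feeding it the offsets 217, 217, 226 from the output above would record only two locations, matching the fixed listing.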
      
      Fixes: cf6eb489 ("perf probe: Show accessible local variables")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/157241938927.32002.4026859017790562751.stgit@devnote2
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      dee36a2a
    • M
      perf probe: Fix to show calling lines of inlined functions · 86c0bf85
      Masami Hiramatsu authored
      Fix to show the calling lines of inlined functions (the lines where an
      inline function is called).

      die_walk_lines() filtered out the lines inside inlined functions based
      on their address. However, this also filtered out the lines in the
      target function that call those inlined functions.

      To solve this issue, check the call_file and call_line attributes and
      do not filter out a line if they match its line information.

      Without this fix, perf probe -L doesn't show some lines correctly
      (the lines after 17 are not shown):
      
        # perf probe -L vfs_read
        <vfs_read@/home/mhiramat/ksrc/linux/fs/read_write.c:0>
              0  ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
              1  {
              2         ssize_t ret;
      
              4         if (!(file->f_mode & FMODE_READ))
                                return -EBADF;
              6         if (!(file->f_mode & FMODE_CAN_READ))
                                return -EINVAL;
              8         if (unlikely(!access_ok(buf, count)))
                                return -EFAULT;
      
             11         ret = rw_verify_area(READ, file, pos, count);
             12         if (!ret) {
             13                 if (count > MAX_RW_COUNT)
                                        count =  MAX_RW_COUNT;
             15                 ret = __vfs_read(file, buf, count, pos);
             16                 if (ret > 0) {
                                        fsnotify_access(file);
                                        add_rchar(current, ret);
                                }
      
      With this fix:
      
        # perf probe -L vfs_read
        <vfs_read@/home/mhiramat/ksrc/linux/fs/read_write.c:0>
              0  ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
              1  {
              2         ssize_t ret;
      
              4         if (!(file->f_mode & FMODE_READ))
                                return -EBADF;
              6         if (!(file->f_mode & FMODE_CAN_READ))
                                return -EINVAL;
              8         if (unlikely(!access_ok(buf, count)))
                                return -EFAULT;
      
             11         ret = rw_verify_area(READ, file, pos, count);
             12         if (!ret) {
             13                 if (count > MAX_RW_COUNT)
                                        count =  MAX_RW_COUNT;
             15                 ret = __vfs_read(file, buf, count, pos);
             16                 if (ret > 0) {
             17                         fsnotify_access(file);
             18                         add_rchar(current, ret);
                                }
             20                 inc_syscr(current);
                        }
      
      Fixes: 4cc9cec6 ("perf probe: Introduce lines walker interface")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/157241937995.32002.17899884017011512577.stgit@devnote2
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      86c0bf85
    • M
      perf probe: Filter out instances except for inlined subroutine and subprogram · da6cb952
      Masami Hiramatsu authored
      Filter out instances except for inlined_subroutine and subprogram DIEs
      in die_walk_instances() and die_is_func_instance().

      This fixes an issue where perf probe sets some probes on the calling
      address instead of on the target function itself.

      When perf probe walks the instances of an abstract origin (a kind of
      function prototype of an inlined function), die_walk_instances() can
      also pass a GNU_call_site (a GNU extension for call sites) to the
      callback. Since it is not an inlined instance of the target function,
      we have to filter it out when searching for a probe point.

      Without this patch, perf probe sets probes on call site addresses too.
      This can happen with a function that is marked "inlined" but has an
      actual symbol. (I'm not sure why GCC marks it "inlined"):
      
        # perf probe -D vfs_read
        p:probe/vfs_read _text+2500017
        p:probe/vfs_read_1 _text+2499468
        p:probe/vfs_read_2 _text+2499563
        p:probe/vfs_read_3 _text+2498876
        p:probe/vfs_read_4 _text+2498512
        p:probe/vfs_read_5 _text+2498627
      
      With this patch:
      
      Slightly different results, though similar:
      
        # perf probe -D vfs_read
        p:probe/vfs_read _text+2498512
      
      Committer testing:
      
        # uname -a
        Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
      
      Before:
      
        # perf probe -D vfs_read
        p:probe/vfs_read _text+3131557
        p:probe/vfs_read_1 _text+3130975
        p:probe/vfs_read_2 _text+3131047
        p:probe/vfs_read_3 _text+3130380
        p:probe/vfs_read_4 _text+3130000
        # uname -a
        Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
        #
      
      After:
      
        # perf probe -D vfs_read
        p:probe/vfs_read _text+3130000
        #
      
      Fixes: db0d2c64 ("perf probe: Search concrete out-of-line instances")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/157241937063.32002.11024544873990816590.stgit@devnote2
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      da6cb952
    • M
      perf probe: Skip end-of-sequence and non statement lines · f4d99bdf
      Masami Hiramatsu authored
      Skip end-of-sequence and non-statement lines while walking through lines
      list.
      
      The "end-of-sequence" line information means:
      
       "the current address is that of the first byte after the
        end of a sequence of target machine instructions."
       (DWARF version 4 spec 6.2.2)
      
      This means the address is actually out of scope, so we cannot probe on
      it.

      On the other hand, statement lines (is_stmt) mean:
      
       "the current instruction is a recommended breakpoint location.
        A recommended breakpoint location is intended to “represent”
        a line, a statement and/or a semantically distinct subpart
        of a statement."
      
       (DWARF version 4 spec 6.2.2)
      
      So non-statement line info should also be skipped.

      Skipping both reduces unneeded probe points and also avoids an error.
      
      E.g. without this patch:
      
        # perf probe -a "clear_tasks_mm_cpumask:1"
        Added new events:
          probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
          probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:1)
          probe:clear_tasks_mm_cpumask_2 (on clear_tasks_mm_cpumask:1)
          probe:clear_tasks_mm_cpumask_3 (on clear_tasks_mm_cpumask:1)
          probe:clear_tasks_mm_cpumask_4 (on clear_tasks_mm_cpumask:1)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e probe:clear_tasks_mm_cpumask_4 -aR sleep 1
      
        #
      
      This puts 5 probes on one line, but it is actually not an inlined
      function. This is because there are many non-statement instructions in
      the function prologue.
      
      With this patch:
      
        # perf probe -a "clear_tasks_mm_cpumask:1"
        Added new event:
          probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1
      
        #
      
      Now perf-probe skips unneeded addresses.
      
      Committer testing:
      
      Slightly different results, but similar:
      
      Before:
      
        # uname -a
        Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
        #
        # perf probe -a "clear_tasks_mm_cpumask:1"
        Added new events:
          probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
          probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:1)
          probe:clear_tasks_mm_cpumask_2 (on clear_tasks_mm_cpumask:1)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e probe:clear_tasks_mm_cpumask_2 -aR sleep 1
      
        #
      
      After:
      
        # perf probe -a "clear_tasks_mm_cpumask:1"
        Added new event:
          probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1
      
        # perf probe -l
          probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c)
        #
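
      The skip rule can be sketched as follows. This is a simplified model,
      not the perf implementation: struct line_row is a hypothetical
      stand-in for one row of the DWARF line table, and the helper names are
      assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one row of the DWARF line table. */
struct line_row {
	uint64_t addr;
	int      lineno;
	bool     is_stmt;       /* recommended breakpoint location */
	bool     end_sequence;  /* first byte after a run of instructions */
};

/* A row is probeable only if it is a statement and not an end-of-sequence row. */
static bool line_row__probeable(const struct line_row *row)
{
	return row->is_stmt && !row->end_sequence;
}

/* Count the rows for lineno that survive the filter. */
static size_t count_probe_points(const struct line_row *rows, size_t nr, int lineno)
{
	size_t n = 0;

	assert(rows != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++)
		if (rows[i].lineno == lineno && line_row__probeable(&rows[i]))
			n++;
	return n;
}
```

      With several non-statement rows for the same source line, as in a
      function prologue, only the is_stmt row is counted, so a single probe
      point remains.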
      
      Fixes: 4cc9cec6 ("perf probe: Introduce lines walker interface")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/157241936090.32002.12156347518596111660.stgit@devnote2
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f4d99bdf
    • M
      perf probe: Return a better scope DIE if there is no best scope · c701636a
      Masami Hiramatsu authored
      Make find_best_scope() return the innermost DIE at the given address if
      there is no best-matched scope DIE. Since GCC sometimes generates
      intuitively strange line info that lies outside the inlined function's
      address range, we need this fixup.

      Without this, perf probe sometimes fails to probe on a line inside an
      inlined function:
      
        # perf probe -D ksys_open:3
        Failed to find scope of probe point.
          Error: Failed to add events.
      
      With this fix, 'perf probe' can probe it:
      
        # perf probe -D ksys_open:3
        p:probe/ksys_open _text+25707308
        p:probe/ksys_open_1 _text+25710596
        p:probe/ksys_open_2 _text+25711114
        p:probe/ksys_open_3 _text+25711343
        p:probe/ksys_open_4 _text+25714058
        p:probe/ksys_open_5 _text+2819653
        p:probe/ksys_open_6 _text+2819701

      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Link: http://lore.kernel.org/lkml/157291300887.19771.14936015360963292236.stgit@devnote2
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c701636a
    • I
      perf annotate: Fix heap overflow · 5c65b1c0
      Ian Rogers authored
      Fix expand_tabs(), which copies the source line's '\0' and then appends
      another '\0' at a potentially out-of-bounds address.
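
      A sketch of tab expansion that avoids this class of bug, written
      independently of the perf source: the buffer is sized for the worst
      case, the source terminator is never copied, and exactly one '\0' is
      written at the end. The name expand_tabs_sketch() and the tab width of
      8 are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TAB_WIDTH 8

/* Return a newly allocated copy of line with tabs expanded, or NULL. */
static char *expand_tabs_sketch(const char *line)
{
	size_t len, ntabs = 0, col = 0;
	char *out;

	assert(line != NULL);
	len = strlen(line);
	for (size_t i = 0; i < len; i++)
		if (line[i] == '\t')
			ntabs++;

	/* Worst case: each tab grows to TAB_WIDTH columns; +1 for the one '\0'. */
	out = malloc(len + ntabs * (TAB_WIDTH - 1) + 1);
	if (!out)
		return NULL;

	/* i < len: the source terminator is deliberately not copied. */
	for (size_t i = 0; i < len; i++) {
		if (line[i] == '\t') {
			do {
				out[col++] = ' ';
			} while (col % TAB_WIDTH);
		} else {
			out[col++] = line[i];
		}
	}
	out[col] = '\0';	/* single terminator, always within the buffer */
	return out;
}
```

      The caller owns the returned string and releases it with free().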

      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20191026035644.217548-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5c65b1c0
    • A
      perf machine: Add kernel_dso() method · 93730f85
      Arnaldo Carvalho de Melo authored
      To reduce boilerplate in some places.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-9s1bgoxxhlnu037e1nqx0tw3@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      93730f85
    • A
      perf symbols: Remove needless checks for map->groups->machine · b0c76fc4
      Arnaldo Carvalho de Melo authored
      It's sufficient to check whether map->groups is NULL before using it to
      get the ->machine value.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-utiepyiv8b1tf8f79ok9d6j8@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b0c76fc4
    • I
      perf parse: Add a deep delete for parse event terms · 1dc92556
      Ian Rogers authored
      Add a parse_events_term deep delete function so that owned strings and
      arrays are freed.

      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: clang-built-linux@googlegroups.com
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20191030223448.12930-10-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      1dc92556
    • I
      perf parse: If pmu configuration fails free terms · 38f2c422
      Ian Rogers authored
      Avoid a memory leak when the configuration fails.

      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: clang-built-linux@googlegroups.com
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20191030223448.12930-9-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      38f2c422
    • I
      perf parse: Before yyabort-ing free components · cabbf268
      Ian Rogers authored
      YYABORT doesn't destruct its inputs, so this must be done manually
      before using it.

      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: clang-built-linux@googlegroups.com
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20191030223448.12930-8-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      cabbf268
    • I
      perf parse: Add destructors for parse event terms · f2a8ecd8
      Ian Rogers authored
      If parsing fails, destructors are run to clean up the stack. Rename the
      head union member to make the term and evlist use cases more distinct;
      this simplifies matching the correct destructor.
      
      Committer notes:
      
      Jiri: "Nice did not know about this.. looks like it's been in bison for some time, right?"
      
      Ian:  "Looks like it wasn't in Bison 1 but in Bison 2, we're at Bison 3 and
             Bison 2 is > 14 years old:
             https://web.archive.org/web/20050924004158/http://www.gnu.org/software/bison/manual/html_mono/bison.html#Destructor-Decl"

      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: clang-built-linux@googlegroups.com
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20191030223448.12930-7-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f2a8ecd8
    • I
      perf parse: Ensure config and str in terms are unique · b6645a72
      Ian Rogers authored
      Make it easier to release memory associated with parse event terms by
      duplicating the string for the config name and ensuring the val string
      is a duplicate.
      
      Currently the parser may leak the memory of terms; this is addressed in
      a later patch.
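
      The ownership rule can be sketched like this. It is an illustration
      only: struct term_sketch and its helpers are hypothetical and far
      simpler than perf's parse_events_term. Because every term holds its own
      copies of both strings, deletion can free them unconditionally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced term: both strings are owned by the term. */
struct term_sketch {
	char *config;
	char *str;
};

/* Duplicate s on the heap; NULL stays NULL. */
static char *dup_or_null(const char *s)
{
	size_t n;
	char *copy;

	if (!s)
		return NULL;
	n = strlen(s) + 1;
	copy = malloc(n);
	if (copy)
		memcpy(copy, s, n);
	return copy;
}

/* Free the term and the strings it owns; accepts NULL like free(). */
static void term_sketch__delete(struct term_sketch *t)
{
	if (!t)
		return;
	free(t->config);
	free(t->str);
	free(t);
}

/* Create a term that owns private copies of config and str. */
static struct term_sketch *term_sketch__new(const char *config, const char *str)
{
	struct term_sketch *t = calloc(1, sizeof(*t));

	if (!t)
		return NULL;
	t->config = dup_or_null(config);
	t->str = dup_or_null(str);
	if ((config && !t->config) || (str && !t->str)) {
		term_sketch__delete(t);
		return NULL;
	}
	return t;
}
```

      Since the term never aliases caller memory, the caller may reuse or
      free its own buffers right after term_sketch__new() returns.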

      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: clang-built-linux@googlegroups.com
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20191030223448.12930-6-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b6645a72
    • I
      perf parse: Add parse events handle error · 448d732c
      Ian Rogers authored
      Parse event error handling may overwrite one error string with another
      creating memory leaks. Introduce a helper routine that warns about
      multiple error messages as well as avoiding the memory leak.
      
      A reproduction of this problem can be seen with:
      
        perf stat -e c/c/
      
      After this change this produces:

        WARNING: multiple event parsing errors
        event syntax error: 'c/c/'
                               \___ unknown term

        valid terms: event,filter_rem,filter_opc0,edge,filter_isoc,filter_tid,filter_loc,filter_nc,inv,umask,filter_opc1,tid_en,thresh,filter_all_op,filter_not_nm,filter_state,filter_nm,config,config1,config2,name,period,percore
        Run 'perf list' for a list of valid events

         Usage: perf stat [<options>] [<command>]

            -e, --event <event>   event selector. use 'perf list' to list available events
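
      A minimal sketch of such a helper, assuming a reduced error record.
      The names parse_error_sketch and parse_error_sketch__handle() are
      hypothetical and do not reflect the signature used in perf. The helper
      takes ownership of the new message and frees any earlier one instead of
      overwriting the pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, reduced error record: owns the heap-allocated message. */
struct parse_error_sketch {
	char *str;	/* current message, or NULL */
	int   nr_seen;	/* how many errors were reported in total */
};

/*
 * Store a new message, taking ownership of str. If a message is already
 * present, warn and free it first so that it is not leaked.
 */
static void parse_error_sketch__handle(struct parse_error_sketch *err, char *str)
{
	assert(err != NULL);
	if (err->str) {
		fprintf(stderr, "WARNING: multiple event parsing errors\n");
		free(err->str);
	}
	err->str = str;
	err->nr_seen++;
}
```

      The caller frees err->str once after reporting, whatever the number of
      errors raised along the way.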

      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: clang-built-linux@googlegroups.com
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20191030223448.12930-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      448d732c
    • J
      perf stat: Add --per-node agregation support · 86895b48
      Jiri Olsa authored
      Add a new --per-node option to aggregate counts per NUMA node for
      system-wide mode measurements.
      
      You can specify --per-node in live mode:
      
        # perf stat  -a -I 1000 -e cycles --per-node
        #           time node   cpus             counts unit events
             1.000542550 N0       20          6,202,097      cycles
             1.000542550 N1       20            639,559      cycles
             2.002040063 N0       20          7,412,495      cycles
             2.002040063 N1       20          2,185,577      cycles
             3.003451699 N0       20          6,508,917      cycles
             3.003451699 N1       20            765,607      cycles
        ...
      
      Or in the record/report stat session:
      
        # perf stat record -a -I 1000 -e cycles
        #           time             counts unit events
             1.000536937         10,008,468      cycles
             2.002090152          9,578,539      cycles
             3.003625233          7,647,869      cycles
             4.005135036          7,032,086      cycles
        ^C     4.340902364          3,923,893      cycles
      
        # perf stat report --per-node
        #           time node   cpus             counts unit events
             1.000536937 N0       20          9,355,086      cycles
             1.000536937 N1       20            653,382      cycles
             2.002090152 N0       20          7,712,838      cycles
             2.002090152 N1       20          1,865,701      cycles
             3.003625233 N0       20          6,604,441      cycles
             3.003625233 N1       20          1,043,428      cycles
             4.005135036 N0       20          6,350,522      cycles
             4.005135036 N1       20            681,564      cycles
             4.340902364 N0       20          3,403,188      cycles
             4.340902364 N1       20            520,705      cycles

      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20190904073415.723-4-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      86895b48
    • J
      perf env: Add perf_env__numa_node() · 389799a7
      Jiri Olsa authored
      To speed up CPU-to-node lookup, add perf_env__numa_node(), which on the
      first lookup creates a CPU array holding the NUMA node of each stored
      CPU.
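
      The lazy lookup table can be sketched as below. This is an
      illustration under assumed data structures, not the perf code:
      struct node_sketch, struct env_sketch and env_sketch__numa_node() are
      hypothetical names, and -1 is used here to mean "unknown CPU".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical description of one NUMA node and the CPUs it contains. */
struct node_sketch {
	int        node;
	const int *cpus;
	int        nr_cpus;
};

struct env_sketch {
	const struct node_sketch *nodes;
	int  nr_nodes;
	int *cpu_to_node;	/* built lazily; NULL until the first lookup */
	int  nr_cpu_slots;
};

/* Return the NUMA node of cpu, or -1 if it is unknown. */
static int env_sketch__numa_node(struct env_sketch *env, int cpu)
{
	assert(env != NULL);

	if (!env->cpu_to_node) {
		int max_cpu = -1;

		for (int i = 0; i < env->nr_nodes; i++)
			for (int j = 0; j < env->nodes[i].nr_cpus; j++)
				if (env->nodes[i].cpus[j] > max_cpu)
					max_cpu = env->nodes[i].cpus[j];
		if (max_cpu < 0)
			return -1;	/* no CPUs recorded */

		env->cpu_to_node = malloc((size_t)(max_cpu + 1) * sizeof(int));
		if (!env->cpu_to_node)
			return -1;
		env->nr_cpu_slots = max_cpu + 1;

		/* Mark every slot unknown, then fill in the stored CPUs. */
		for (int k = 0; k < env->nr_cpu_slots; k++)
			env->cpu_to_node[k] = -1;
		for (int i = 0; i < env->nr_nodes; i++)
			for (int j = 0; j < env->nodes[i].nr_cpus; j++)
				if (env->nodes[i].cpus[j] >= 0)
					env->cpu_to_node[env->nodes[i].cpus[j]] =
						env->nodes[i].node;
	}

	if (cpu < 0 || cpu >= env->nr_cpu_slots)
		return -1;
	return env->cpu_to_node[cpu];
}
```

      Only the first call pays for walking the node lists; later calls are a
      bounds check and an array read.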

      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20190904073415.723-3-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      389799a7
    • I
      perf tools: Splice events onto evlist even on error · 8e8714c3
      Ian Rogers authored
      If event parsing fails, the event list is leaked. Instead, splice the
      list onto the output result and let the caller clean up.
      
      An example input for parse_events found by libFuzzer that reproduces
      this memory leak is 'm{'.

      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: clang-built-linux@googlegroups.com
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20191025180827.191916-5-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8e8714c3
    • A
      perf map_groups: Introduce for_each_entry() and for_each_entry_safe() iterators · 50481461
      Arnaldo Carvalho de Melo authored
      To reduce boilerplate, provide a more compact form to iterate over the
      maps in a map_group.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-gc3go6fmdn30twusg91t2q56@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      50481461
    • A
      perf maps: Add for_each_entry()/_safe() iterators · 8efc4f05
      Arnaldo Carvalho de Melo authored
      To reduce boilerplate, provide a more compact form using an idiom
      present in other trees of data structures.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-59gmq4kg1r68ou1wknyjl78x@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8efc4f05
    • A
      perf map: Allow map__next() to receive a NULL arg · 20419d3a
      Arnaldo Carvalho de Melo authored
      Just like free(), accept a NULL argument, and return NULL in that case.
      This will simplify the for_each_entry_safe() iterators.
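
      Why a NULL-tolerant next() helps can be seen in a small sketch. It
      uses a singly linked list rather than perf's rb tree, and every name in
      it (entry_sketch, entry_sketch__next(), entries__for_each_entry_safe())
      is an assumption made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct entry_sketch {
	struct entry_sketch *next;
	int value;
};

/* Like free(), accept NULL; return NULL in that case. */
static struct entry_sketch *entry_sketch__next(struct entry_sketch *e)
{
	return e ? e->next : NULL;
}

/*
 * Safe iteration: tmp is fetched before the body runs, so the body may
 * free pos. No NULL check is needed before calling entry_sketch__next().
 */
#define entries__for_each_entry_safe(head, pos, tmp)			\
	for ((pos) = (head), (tmp) = entry_sketch__next(pos);		\
	     (pos) != NULL;						\
	     (pos) = (tmp), (tmp) = entry_sketch__next(pos))

/* Prepend a new entry; returns the new head, or NULL if allocation fails. */
static struct entry_sketch *entries__push(struct entry_sketch *head, int value)
{
	struct entry_sketch *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	e->next = head;
	e->value = value;
	return e;
}

/* Free every entry and return how many were freed. */
static int entries__delete_all(struct entry_sketch *head)
{
	struct entry_sketch *pos, *tmp;
	int n = 0;

	entries__for_each_entry_safe(head, pos, tmp) {
		free(pos);
		n++;
	}
	return n;
}
```

      Without the NULL-tolerant helper, both the initializer and the step
      expression of the loop would need their own pos != NULL guard.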
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-pbde2ucn49khnrebclys9pny@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      20419d3a
    • A
      perf map: Check if the map still has some refcounts on exit · ee2555b6
      Arnaldo Carvalho de Melo authored
      We were only checking whether it was still on some rb tree, but that is
      not the only way this map can still have references. map->refcnt is
      there exactly for this, so use it.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-hany65tbeavsax7n3xvwl9pc@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ee2555b6
    • A
      perf dso: Add dso__data_write_cache_addr() · b86a9d91
      Adrian Hunter authored
      Add functions to write into the dso file data cache, but not change the
      file itself.

      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: x86@kernel.org
      Link: http://lore.kernel.org/lkml/20191025130000.13032-4-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b86a9d91