1. 12 11月, 2019 10 次提交
  2. 07 11月, 2019 30 次提交
    • J
      perf report: Sort by sampled cycles percent per block for tui · 7fa46cbf
      Jin Yao 提交于
      Previous patch has implemented a new option "--total-cycles".  But only
      stdio mode is supported.
      
      This patch supports the tui mode and support '--percent-limit'.
      
      For example,
      
       perf record -b ./div
       perf report --total-cycles --percent-limit 1
      
       # Samples: 2753248 of event 'cycles'
       Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                              [Program Block Range]         Shared Object
                26.04%            2.8M        0.40%          18                                             [div.c:42 -> div.c:39]                   div
                15.17%            1.2M        0.16%           7                                 [random_r.c:357 -> random_r.c:380]          libc-2.27.so
                 5.11%          402.0K        0.04%           2                                             [div.c:27 -> div.c:28]                   div
                 4.87%          381.6K        0.04%           2                                     [random.c:288 -> random.c:291]          libc-2.27.so
                 4.53%          381.0K        0.04%           2                                             [div.c:40 -> div.c:40]                   div
                 3.85%          300.9K        0.02%           1                                             [div.c:22 -> div.c:25]                   div
                 3.08%          241.1K        0.02%           1                                           [rand.c:26 -> rand.c:27]          libc-2.27.so
                 3.06%          240.0K        0.02%           1                                     [random.c:291 -> random.c:291]          libc-2.27.so
                 2.78%          215.7K        0.02%           1                                     [random.c:298 -> random.c:298]          libc-2.27.so
                 2.52%          198.3K        0.02%           1                                     [random.c:293 -> random.c:293]          libc-2.27.so
                 2.36%          184.8K        0.02%           1                                           [rand.c:28 -> rand.c:28]          libc-2.27.so
                 2.33%          180.5K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
                 2.28%          176.7K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
                 2.20%          168.8K        0.02%           1                                         [rand@plt+0 -> rand@plt+0]                   div
                 1.98%          158.2K        0.02%           1                                 [random_r.c:388 -> random_r.c:388]          libc-2.27.so
                 1.57%          123.3K        0.02%           1                                             [div.c:42 -> div.c:44]                   div
                 1.44%          116.0K        0.42%          19                                 [random_r.c:357 -> random_r.c:394]          libc-2.27.so
      
      --------------------------------------------------
      
       v7:
       ---
       1. Since we have used use_browser in report__browse_block_hists
          to support stdio mode, now we also add supporting for tui.
      
       2. Move block tui browser code from ui/browsers/hists.c
          to block-info.c.
      
       v6:
       ---
       Create report__tui_browse_block_hists in block-info.c
       (codes are moved from builtin-report.c).
      
       v5:
       ---
       Fix a crash issue when running perf report without
       '--total-cycles'. The issue is because the internal flag
       is renamed from 'total_cycles' to 'total_cycles_mode' in
       previous patch but this patch still uses 'total_cycles'
       to check if the '--total-cycles' option is enabled, which
       causes the code to be inconsistent.
      
       v4:
       ---
       Since the block collection is moved out of printing in
       previous patch, this patch is updated accordingly for
       tui supporting.
      
       v3:
       ---
       Minor change since the function name is changed:
       block_total_cycles_percent -> block_info__total_cycles_percent
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-8-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7fa46cbf
    • J
      perf report: Support --percent-limit for --total-cycles · 0b49f836
      Jin Yao 提交于
      We have already supported the '--total-cycles' option in previous patch.
      It's also useful to show entries only above a threshold percent.
      
      This patch enables '--percent-limit' for not showing entries
      under that percent.
      
      For example:
      
       perf report --total-cycles --stdio --percent-limit 1
      
       # To display the perf.data header info, please use --header/--header-only options.
       #
       #
       # Total Lost Samples: 0
       #
       # Samples: 2M of event 'cycles'
       # Event count (approx.): 2753248
       #
       # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                              [Program Block Range]         Shared Object
       # ...............  ..............  ...........  ..........  .................................................................  ....................
       #
                  26.04%            2.8M        0.40%          18                                             [div.c:42 -> div.c:39]                   div
                  15.17%            1.2M        0.16%           7                                 [random_r.c:357 -> random_r.c:380]          libc-2.27.so
                   5.11%          402.0K        0.04%           2                                             [div.c:27 -> div.c:28]                   div
                   4.87%          381.6K        0.04%           2                                     [random.c:288 -> random.c:291]          libc-2.27.so
                   4.53%          381.0K        0.04%           2                                             [div.c:40 -> div.c:40]                   div
                   3.85%          300.9K        0.02%           1                                             [div.c:22 -> div.c:25]                   div
                   3.08%          241.1K        0.02%           1                                           [rand.c:26 -> rand.c:27]          libc-2.27.so
                   3.06%          240.0K        0.02%           1                                     [random.c:291 -> random.c:291]          libc-2.27.so
                   2.78%          215.7K        0.02%           1                                     [random.c:298 -> random.c:298]          libc-2.27.so
                   2.52%          198.3K        0.02%           1                                     [random.c:293 -> random.c:293]          libc-2.27.so
                   2.36%          184.8K        0.02%           1                                           [rand.c:28 -> rand.c:28]          libc-2.27.so
                   2.33%          180.5K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
                   2.28%          176.7K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
                   2.20%          168.8K        0.02%           1                                         [rand@plt+0 -> rand@plt+0]                   div
                   1.98%          158.2K        0.02%           1                                 [random_r.c:388 -> random_r.c:388]          libc-2.27.so
                   1.57%          123.3K        0.02%           1                                             [div.c:42 -> div.c:44]                   div
                   1.44%          116.0K        0.42%          19                                 [random_r.c:357 -> random_r.c:394]          libc-2.27.so
      
      Committer testing:
      
      From second exapmple onwards slightly edited for brevity:
      
        # perf report --total-cycles --percent-limit 2 --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 6M of event 'cycles'
        # Event count (approx.): 6299936
        #
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
        # ...............  ..............  ...........  ..........  ......................................................................  ....................
        #
                    2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
        #
        # (Tip: Create an archive with symtabs to analyse on other machine: perf archive)
        #
        # perf report --total-cycles --percent-limit 1 --stdio
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
                    2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
                    1.75%            1.3M        8.34%       65.5K    [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151]          libc-2.29.so
        #
        # perf report --total-cycles --percent-limit 0.7 --stdio
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
                    2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
                    1.75%            1.3M        8.34%       65.5K    [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151]          libc-2.29.so
                    0.72%          544.5K        0.03%         230                                      [entry_64.S:657 -> entry_64.S:662]      [kernel.vmlinux]
        #
      
      -------------------------------------------
      
      It only shows the entries which 'Sampled Cycles%' > 1%.
      
       v7:
       ---
       No functional change. Only fix the conflict issue because
       previous patches are changed.
      
       v6:
       ---
       No functional change. Only fix the conflict issue because
       previous patches are changed.
      
       v5:
       ---
       No functional change. Only fix the conflict issue because
       previous patches are changed.
      
       v4:
       ---
       No functional change. Only fix the build issue because
       previous patches are changed.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-7-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0b49f836
    • J
      perf report: Sort by sampled cycles percent per block for stdio · 6f7164fa
      Jin Yao 提交于
      It would be useful to support sorting for all blocks by the sampled
      cycles percent per block. This is useful to concentrate on the globally
      hottest blocks.
      
      This patch implements a new option "--total-cycles" which sorts all
      blocks by 'Sampled Cycles%'. The 'Sampled Cycles%' is the percent:
      
       percent = block sampled cycles aggregation / total sampled cycles
      
      Note that, this patch only supports "--stdio" mode.
      
      For example,
      
        # perf record -b ./div
        # perf report --total-cycles --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 2M of event 'cycles'
        # Event count (approx.): 2753248
        #
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                             [Program Block Range]      Shared Object
        # ...............  ..............  ...........  ..........  ................................................  .................
        #
                   26.04%            2.8M        0.40%          18                            [div.c:42 -> div.c:39]                div
                   15.17%            1.2M        0.16%           7                [random_r.c:357 -> random_r.c:380]       libc-2.27.so
                    5.11%          402.0K        0.04%           2                            [div.c:27 -> div.c:28]                div
                    4.87%          381.6K        0.04%           2                    [random.c:288 -> random.c:291]       libc-2.27.so
                    4.53%          381.0K        0.04%           2                            [div.c:40 -> div.c:40]                div
                    3.85%          300.9K        0.02%           1                            [div.c:22 -> div.c:25]                div
                    3.08%          241.1K        0.02%           1                          [rand.c:26 -> rand.c:27]       libc-2.27.so
                    3.06%          240.0K        0.02%           1                    [random.c:291 -> random.c:291]       libc-2.27.so
                    2.78%          215.7K        0.02%           1                    [random.c:298 -> random.c:298]       libc-2.27.so
                    2.52%          198.3K        0.02%           1                    [random.c:293 -> random.c:293]       libc-2.27.so
                    2.36%          184.8K        0.02%           1                          [rand.c:28 -> rand.c:28]       libc-2.27.so
                    2.33%          180.5K        0.02%           1                    [random.c:295 -> random.c:295]       libc-2.27.so
                    2.28%          176.7K        0.02%           1                    [random.c:295 -> random.c:295]       libc-2.27.so
                    2.20%          168.8K        0.02%           1                        [rand@plt+0 -> rand@plt+0]                div
                    1.98%          158.2K        0.02%           1                [random_r.c:388 -> random_r.c:388]       libc-2.27.so
                    1.57%          123.3K        0.02%           1                            [div.c:42 -> div.c:44]                div
                    1.44%          116.0K        0.42%          19                [random_r.c:357 -> random_r.c:394]       libc-2.27.so
                    0.25%          182.5K        0.02%           1                [random_r.c:388 -> random_r.c:391]       libc-2.27.so
                    0.00%              48        1.07%          48        [x86_pmu_enable+284 -> x86_pmu_enable+298]  [kernel.kallsyms]
                    0.00%              74        1.64%          74             [vm_mmap_pgoff+0 -> vm_mmap_pgoff+92]  [kernel.kallsyms]
                    0.00%              73        1.62%          73                         [vm_mmap+0 -> vm_mmap+48]  [kernel.kallsyms]
                    0.00%              63        0.69%          31                       [up_write+0 -> up_write+34]  [kernel.kallsyms]
                    0.00%              13        0.29%          13      [setup_arg_pages+396 -> setup_arg_pages+413]  [kernel.kallsyms]
                    0.00%               3        0.07%           3      [setup_arg_pages+418 -> setup_arg_pages+450]  [kernel.kallsyms]
                    0.00%             616        6.84%         308   [security_mmap_file+0 -> security_mmap_file+72]  [kernel.kallsyms]
                    0.00%              23        0.51%          23  [security_mmap_file+77 -> security_mmap_file+87]  [kernel.kallsyms]
                    0.00%               4        0.02%           1                  [sched_clock+0 -> sched_clock+4]  [kernel.kallsyms]
                    0.00%               4        0.02%           1                 [sched_clock+9 -> sched_clock+12]  [kernel.kallsyms]
                    0.00%               1        0.02%           1                [rcu_nmi_exit+0 -> rcu_nmi_exit+9]  [kernel.kallsyms]
      
      Committer testing:
      
      This should provide material for hours of endless joy, both from looking
      for suspicious things in the implementation of this patch, such as the
      top one:
      
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                          [Program Block Range]     Shared Object
                    2.17%            1.7M        0.08%         607   [compiler.h:199 -> common.c:221]              [kernel.vmlinux]
      
      As well from things that look legit:
      
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                          [Program Block Range]     Shared Object
                    0.16%          123.0K        0.60%        4.7K   [nospec-branch.h:265 -> nospec-branch.h:278]  [kernel.vmlinux]
      
      :-)
      
      Very short system wide taken branches session:
      
        # perf record -h -b
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -b, --branch-any      sample any taken branches
      
        #
        # perf record -b
        ^C[ perf record: Woken up 595 times to write data ]
        [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ]
      
        #
        # perf evlist -v
        cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
        #
        # perf report --total-cycles --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 6M of event 'cycles'
        # Event count (approx.): 6299936
        #
        # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
        # ...............  ..............  ...........  ..........  ......................................................................  ....................
        #
                    2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
                    1.75%            1.3M        8.34%       65.5K    [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151]          libc-2.29.so
                    0.72%          544.5K        0.03%         230                                      [entry_64.S:657 -> entry_64.S:662]      [kernel.vmlinux]
                    0.56%          541.8K        0.09%         672                                        [compiler.h:199 -> common.c:300]      [kernel.vmlinux]
                    0.39%          293.2K        0.01%         104                                    [list_debug.c:43 -> list_debug.c:61]      [kernel.vmlinux]
                    0.36%          278.6K        0.03%         272                                    [entry_64.S:1289 -> entry_64.S:1308]      [kernel.vmlinux]
                    0.30%          260.8K        0.07%         564                              [clear_page_64.S:47 -> clear_page_64.S:50]      [kernel.vmlinux]
                    0.28%          215.3K        0.05%         369                                            [traps.c:623 -> traps.c:628]      [kernel.vmlinux]
                    0.23%          178.1K        0.04%         278                                      [entry_64.S:271 -> entry_64.S:275]      [kernel.vmlinux]
                    0.20%          152.6K        0.09%         706                                      [paravirt.c:177 -> paravirt.c:179]      [kernel.vmlinux]
                    0.20%          155.8K        0.05%         373                                      [entry_64.S:153 -> entry_64.S:175]      [kernel.vmlinux]
                    0.18%          136.6K        0.03%         222                                                [msr.h:105 -> msr.h:166]      [kernel.vmlinux]
                    0.16%          123.0K        0.60%        4.7K                            [nospec-branch.h:265 -> nospec-branch.h:278]      [kernel.vmlinux]
                    0.16%          118.3K        0.01%          44                                      [entry_64.S:632 -> entry_64.S:657]      [kernel.vmlinux]
                    0.14%          104.5K        0.00%          28                                          [rwsem.c:1541 -> rwsem.c:1544]      [kernel.vmlinux]
                    0.13%           99.2K        0.01%          53                                      [spinlock.c:150 -> spinlock.c:152]      [kernel.vmlinux]
                    0.13%           95.5K        0.00%          35                                              [swap.c:456 -> swap.c:471]      [kernel.vmlinux]
                    0.12%           96.2K        0.05%         407                              [copy_user_64.S:175 -> copy_user_64.S:209]      [kernel.vmlinux]
                    0.11%           85.9K        0.00%          31                                        [swap.c:400 -> page-flags.h:188]      [kernel.vmlinux]
                    0.10%           73.0K        0.01%          52                                          [paravirt.h:763 -> list.h:131]      [kernel.vmlinux]
                    0.07%           56.2K        0.03%         214                                      [filemap.c:1524 -> filemap.c:1557]      [kernel.vmlinux]
                    0.07%           54.2K        0.02%         145                                        [memory.c:1032 -> memory.c:1049]      [kernel.vmlinux]
                    0.07%           50.3K        0.00%          39                                            [mmzone.c:49 -> mmzone.c:69]      [kernel.vmlinux]
                    0.06%           48.3K        0.01%          40                                   [paravirt.h:768 -> page_alloc.c:3304]      [kernel.vmlinux]
                    0.06%           46.7K        0.02%         155                                        [memory.c:1032 -> memory.c:1056]      [kernel.vmlinux]
                    0.06%           46.9K        0.01%         103                                              [swap.c:867 -> swap.c:902]      [kernel.vmlinux]
                    0.06%           47.8K        0.00%          34                                    [entry_64.S:1201 -> entry_64.S:1202]      [kernel.vmlinux]
      
       -----------------------------------------------------------
      
       v7:
       ---
       Use use_browser in report__browse_block_hists for supporting
       stdio and potential tui mode.
      
       v6:
       ---
       Create report__browse_block_hists in block-info.c (codes are
       moved from builtin-report.c). It's called from
       perf_evlist__tty_browse_hists.
      
       v5:
       ---
       1. Move all block functions to block-info.c
      
       2. Move the code of setting ms in block hist_entry to
          other patch.
      
       v4:
       ---
       1. Use new option '--total-cycles' to replace
          '-s total_cycles' in v3.
      
       2. Move block info collection out of block info
          printing.
      
       v3:
       ---
       1. Use common function block_info__process_sym to
          process the blocks per symbol.
      
       2. Remove the nasty hack for skipping calculation
          of column length
      
       3. Some minor cleanup
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-6-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      6f7164fa
    • J
      perf hist: Support block formats with compare/sort/display · b65a7d37
      Jin Yao 提交于
      This patch provides helper routines to support new columns for block
      info output.
      
      The new columns are:
      
        Sampled Cycles%
        Sampled Cycles
        Avg Cycles%
        Avg Cycles
        [Program Block Range]
        Shared Object
      
       v5:
       ---
       1. Move more block related functions from builtin-report.c to
          block-info.c
      
       2. Set ms (map+sym) in block hist_entry. Because this info
          is needed for reporting the block range (i.e. source line)
      
      Committer notes:
      
      Remove unused set_fmt() function, some build were not completing with:
      
        util/block-info.c:396:20: error: unused function 'set_fmt' [-Werror,-Wunused-function]
        static inline void set_fmt(struct block_fmt *block_fmt,
                           ^
        1 error generated.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-5-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b65a7d37
    • J
      perf hist: Count the total cycles of all samples · 7841f40a
      Jin Yao 提交于
      We can get the per sample cycles by hist__account_cycles(). It's also
      useful to know the total cycles of all samples in order to get the
      cycles coverage for a single program block in further. For example:
      
        coverage = per block sampled cycles / total sampled cycles
      
      This patch creates a new argument 'total_cycles' in hist__account_cycles(),
      which will be added with the cycles of each sample.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-4-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7841f40a
    • J
      perf block: Cleanup and refactor block info functions · 60414418
      Jin Yao 提交于
      We have already implemented some block-info related functions.
      Now it's time to do some cleanup, refactoring and move the
      functions and structures to new block-info.h/block-info.c.
      
       v4:
       ---
       Move code for skipping column length calculation to patch:
       'perf diff: Don't use hack to skip column length calculation'
      
       v3:
       ---
       1. Rename the patch title
       2. Rename from block.h/block.c to block-info.h/block-info.c
       3. Move more common part to block-info, such as
          block_info__process_sym.
       4. Remove the nasty hack for skipping calculation of column
          length
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-3-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      60414418
    • J
      perf diff: Don't use hack to skip column length calculation · 0bdf181f
      Jin Yao 提交于
      Previously we use a nasty hack to skip the hists__calc_col_len for block
      since this function is not very suitable for block column length
      calculation.
      
      This patch removes the hack code and add a check at the entry of
      hists__calc_col_len to skip for block case.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191107074719.26139-2-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0bdf181f
    • L
      perf tests: Fix out of bounds memory access · af8490eb
      Leo Yan 提交于
      The test case 'Read backward ring buffer' failed on 32-bit architectures
      which were found by LKFT perf testing.  The test failed on arm32 x15
      device, qemu_arm32, qemu_i386, and found intermittent failure on i386;
      the failure log is as below:
      
        50: Read backward ring buffer                  :
        --- start ---
        test child forked, pid 510
        Using CPUID GenuineIntel-6-9E-9
        mmap size 1052672B
        mmap size 8192B
        Finished reading overwrite ring buffer: rewind
        free(): invalid next size (fast)
        test child interrupted
        ---- end ----
        Read backward ring buffer: FAILED!
      
      The log hints there have issue for memory usage, thus free() reports
      error 'invalid next size' and directly exit for the case.  Finally, this
      issue is root caused as out of bounds memory access for the data array
      'evsel->id'.
      
      The backward ring buffer test invokes do_test() twice.  'evsel->id' is
      allocated at the first call with the flow:
      
        test__backward_ring_buffer()
          `-> do_test()
      	  `-> evlist__mmap()
      	        `-> evlist__mmap_ex()
      	              `-> perf_evsel__alloc_id()
      
      So 'evsel->id' is allocated with one item, and it will be used in
      function perf_evlist__id_add():
      
         evsel->id[0] = id
         evsel->ids   = 1
      
      At the second call for do_test(), it skips to initialize 'evsel->id'
      and reuses the array which is allocated in the first call.  But
      'evsel->ids' contains the stale value.  Thus:
      
         evsel->id[1] = id    -> out of bound access
         evsel->ids   = 2
      
      To fix this issue, we will use evlist__open() and evlist__close() pair
      functions to prepare and cleanup context for evlist; so 'evsel->id' and
      'evsel->ids' can be initialized properly when invoke do_test() and avoid
      the out of bounds memory access.
      
      Fixes: ee74701e ("perf tests: Add test to check backward ring buffer")
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: stable@vger.kernel.org # v4.10+
      Link: http://lore.kernel.org/lkml/20191107020244.2427-1-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      af8490eb
    • J
      perf record: Add support for limit perf output file size · 6d575816
      Jiwei Sun 提交于
      The patch adds a new option to limit the output file size, then based on
      it, we can create a wrapper of the perf command that uses the option to
      avoid exhausting the disk space by the unconscious user.
      
      In order to make the perf.data parsable, we just limit the sample data
      size, since the perf.data consists of many headers and sample data and
      other data, the actual size of the recorded file will bigger than the
      setting value.
      
      Testing it:
      
        # ./perf record -a -g --max-size=10M
        Couldn't synthesize bpf events.
        [ perf record: perf size limit reached (10249 KB), stopping session ]
        [ perf record: Woken up 32 times to write data ]
        [ perf record: Captured and wrote 10.133 MB perf.data (71964 samples) ]
      
        # ls -lh perf.data
        -rw------- 1 root root 11M Oct 22 14:32 perf.data
      
        # ./perf record -a -g --max-size=10K
        [ perf record: perf size limit reached (10 KB), stopping session ]
        Couldn't synthesize bpf events.
        [ perf record: Woken up 0 times to write data ]
        [ perf record: Captured and wrote 1.546 MB perf.data (69 samples) ]
      
        # ls -l perf.data
        -rw------- 1 root root 1626952 Oct 22 14:36 perf.data
      
      Committer notes:
      
      Fixed the build in multiple distros by using PRIu64 to print u64 struct
      members, fixing this:
      
        builtin-record.c: In function 'record__write':
        builtin-record.c:150:5: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'u64' [-Werror=format=]
             rec->bytes_written >> 10);
             ^
          CC       /tmp/build/pe
      Signed-off-by: NJiwei Sun <jiwei.sun@windriver.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Danter <richard.danter@windriver.com>
      Link: http://lore.kernel.org/lkml/20191022080901.3841-1-jiwei.sun@windriver.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      6d575816
    • M
      perf probe: Skip overlapped location on searching variables · dee36a2a
      Masami Hiramatsu 提交于
      Since debuginfo__find_probes() callback function can be called with  the
      location which already passed, the callback function must filter out
      such overlapped locations.
      
      add_probe_trace_event() has already done it by commit 1a375ae7
      ("perf probe: Skip same probe address for a given line"), but
      add_available_vars() doesn't. Thus perf probe -v shows same address
      repeatedly as below:
      
        # perf probe -V vfs_read:18
        Available variables at vfs_read:18
                @<vfs_read+217>
                        char*   buf
                        loff_t* pos
                        ssize_t ret
                        struct file*    file
                @<vfs_read+217>
                        char*   buf
                        loff_t* pos
                        ssize_t ret
                        struct file*    file
                @<vfs_read+226>
                        char*   buf
                        loff_t* pos
                        ssize_t ret
                        struct file*    file
      
      With this fix, perf probe -V shows it correctly:
      
        # perf probe -V vfs_read:18
        Available variables at vfs_read:18
                @<vfs_read+217>
                        char*   buf
                        loff_t* pos
                        ssize_t ret
                        struct file*    file
                @<vfs_read+226>
                        char*   buf
                        loff_t* pos
                        ssize_t ret
                        struct file*    file
      
      Fixes: cf6eb489 ("perf probe: Show accessible local variables")
      Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/157241938927.32002.4026859017790562751.stgit@devnote2Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      dee36a2a
    • M
      perf probe: Fix to show calling lines of inlined functions · 86c0bf85
      Masami Hiramatsu 提交于
      Fix to show calling lines of inlined functions (where an inline function
      is called).
      
      die_walk_lines() filtered out the lines inside inlined functions based
      on the address. However this also filtered out the lines which call
      those inlined functions from the target function.
      
      To solve this issue, check the call_file and call_line attributes and do
      not filter out if it matches to the line information.
      
      Without this fix, perf probe -L doesn't show some lines correctly.
      (don't see the lines after 17)
      
        # perf probe -L vfs_read
        <vfs_read@/home/mhiramat/ksrc/linux/fs/read_write.c:0>
              0  ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
              1  {
              2         ssize_t ret;
      
              4         if (!(file->f_mode & FMODE_READ))
                                return -EBADF;
              6         if (!(file->f_mode & FMODE_CAN_READ))
                                return -EINVAL;
              8         if (unlikely(!access_ok(buf, count)))
                                return -EFAULT;
      
             11         ret = rw_verify_area(READ, file, pos, count);
             12         if (!ret) {
             13                 if (count > MAX_RW_COUNT)
                                        count =  MAX_RW_COUNT;
             15                 ret = __vfs_read(file, buf, count, pos);
             16                 if (ret > 0) {
                                        fsnotify_access(file);
                                        add_rchar(current, ret);
                                }
      
      With this fix:
      
        # perf probe -L vfs_read
        <vfs_read@/home/mhiramat/ksrc/linux/fs/read_write.c:0>
              0  ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
              1  {
              2         ssize_t ret;
      
              4         if (!(file->f_mode & FMODE_READ))
                                return -EBADF;
              6         if (!(file->f_mode & FMODE_CAN_READ))
                                return -EINVAL;
              8         if (unlikely(!access_ok(buf, count)))
                                return -EFAULT;
      
             11         ret = rw_verify_area(READ, file, pos, count);
             12         if (!ret) {
             13                 if (count > MAX_RW_COUNT)
                                        count =  MAX_RW_COUNT;
             15                 ret = __vfs_read(file, buf, count, pos);
             16                 if (ret > 0) {
             17                         fsnotify_access(file);
             18                         add_rchar(current, ret);
                                }
             20                 inc_syscr(current);
                        }
      
      Fixes: 4cc9cec6 ("perf probe: Introduce lines walker interface")
      Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/157241937995.32002.17899884017011512577.stgit@devnote2Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      86c0bf85
    • M
      perf probe: Filter out instances except for inlined subroutine and subprogram · da6cb952
      Masami Hiramatsu 提交于
      Filter out instances except for inlined_subroutine and subprogram DIE in
      die_walk_instances() and die_is_func_instance().
      
      This fixes an issue that perf probe sets some probes on calling address
      instead of a target function itself.
      
      When perf probe walks on instances of an abstruct origin (a kind of
      function prototype of inlined function), die_walk_instances() can also
      pass a GNU_call_site (a GNU extension for call site) to callback. Since
      it is not an inlined instance of target function, we have to filter out
      when searching a probe point.
      
      Without this patch, perf probe sets probes on call site address too.This
      can happen on some function which is marked "inlined", but has actual
      symbol. (I'm not sure why GCC mark it "inlined"):
      
        # perf probe -D vfs_read
        p:probe/vfs_read _text+2500017
        p:probe/vfs_read_1 _text+2499468
        p:probe/vfs_read_2 _text+2499563
        p:probe/vfs_read_3 _text+2498876
        p:probe/vfs_read_4 _text+2498512
        p:probe/vfs_read_5 _text+2498627
      
      With this patch:
      
      Slightly different results, similar tho:
      
        # perf probe -D vfs_read
        p:probe/vfs_read _text+2498512
      
      Committer testing:
      
        # uname -a
        Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
      
      Before:
      
        # perf probe -D vfs_read
        p:probe/vfs_read _text+3131557
        p:probe/vfs_read_1 _text+3130975
        p:probe/vfs_read_2 _text+3131047
        p:probe/vfs_read_3 _text+3130380
        p:probe/vfs_read_4 _text+3130000
        # uname -a
        Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
        #
      
      After:
      
        # perf probe -D vfs_read
        p:probe/vfs_read _text+3130000
        #
      
      Fixes: db0d2c64 ("perf probe: Search concrete out-of-line instances")
      Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/157241937063.32002.11024544873990816590.stgit@devnote2Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      da6cb952
    • M
      perf probe: Skip end-of-sequence and non statement lines · f4d99bdf
      Masami Hiramatsu 提交于
      Skip end-of-sequence and non-statement lines while walking through lines
      list.
      
      The "end-of-sequence" line information means:
      
       "the current address is that of the first byte after the
        end of a sequence of target machine instructions."
       (DWARF version 4 spec 6.2.2)
      
      This actually means out of scope and we can not probe on it.
      
      On the other hand, the statement lines (is_stmt) means:
      
       "the current instruction is a recommended breakpoint location.
        A recommended breakpoint location is intended to “represent”
        a line, a statement and/or a semantically distinct subpart
        of a statement."
      
       (DWARF version 4 spec 6.2.2)
      
      So, non-statement line info also should be skipped.
      
      These can reduce unneeded probe points and also avoid an error.
      
      E.g. without this patch:
      
        # perf probe -a "clear_tasks_mm_cpumask:1"
        Added new events:
          probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
          probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:1)
          probe:clear_tasks_mm_cpumask_2 (on clear_tasks_mm_cpumask:1)
          probe:clear_tasks_mm_cpumask_3 (on clear_tasks_mm_cpumask:1)
          probe:clear_tasks_mm_cpumask_4 (on clear_tasks_mm_cpumask:1)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e probe:clear_tasks_mm_cpumask_4 -aR sleep 1
      
        #
      
      This puts 5 probes on one line, but acutally it's not inlined function.
      This is because there are many non statement instructions at the
      function prologue.
      
      With this patch:
      
        # perf probe -a "clear_tasks_mm_cpumask:1"
        Added new event:
          probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1
      
        #
      
      Now perf-probe skips unneeded addresses.
      
      Committer testing:
      
      Slightly different results, but similar:
      
      Before:
      
        # uname -a
        Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
        #
        # perf probe -a "clear_tasks_mm_cpumask:1"
        Added new events:
          probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
          probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:1)
          probe:clear_tasks_mm_cpumask_2 (on clear_tasks_mm_cpumask:1)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e probe:clear_tasks_mm_cpumask_2 -aR sleep 1
      
        #
      
      After:
      
        # perf probe -a "clear_tasks_mm_cpumask:1"
        Added new event:
          probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1
      
        # perf probe -l
          probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c)
        #
      
      Fixes: 4cc9cec6 ("perf probe: Introduce lines walker interface")
      Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/157241936090.32002.12156347518596111660.stgit@devnote2Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f4d99bdf
    • M
      perf probe: Return a better scope DIE if there is no best scope · c701636a
      Masami Hiramatsu 提交于
      Make find_best_scope() returns innermost DIE at given address if there
      is no best matched scope DIE. Since Gcc sometimes generates intuitively
      strange line info which is out of inlined function address range, we
      need this fixup.
      
      Without this, sometimes perf probe failed to probe on a line inside an
      inlined function:
      
        # perf probe -D ksys_open:3
        Failed to find scope of probe point.
          Error: Failed to add events.
      
      With this fix, 'perf probe' can probe it:
      
        # perf probe -D ksys_open:3
        p:probe/ksys_open _text+25707308
        p:probe/ksys_open_1 _text+25710596
        p:probe/ksys_open_2 _text+25711114
        p:probe/ksys_open_3 _text+25711343
        p:probe/ksys_open_4 _text+25714058
        p:probe/ksys_open_5 _text+2819653
        p:probe/ksys_open_6 _text+2819701
      Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Link: http://lore.kernel.org/lkml/157291300887.19771.14936015360963292236.stgit@devnote2Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c701636a
    • I
      perf annotate: Fix heap overflow · 5c65b1c0
      Ian Rogers 提交于
      Fix expand_tabs that copies the source lines '\0' and then appends
      another '\0' at a potentially out of bounds address.
      Signed-off-by: NIan Rogers <irogers@google.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20191026035644.217548-1-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5c65b1c0
    • A
      perf machine: Add kernel_dso() method · 93730f85
      Arnaldo Carvalho de Melo 提交于
      To reduce boilerplate in some places.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-9s1bgoxxhlnu037e1nqx0tw3@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      93730f85
    • A
      perf symbols: Remove needless checks for map->groups->machine · b0c76fc4
      Arnaldo Carvalho de Melo 提交于
      Its sufficient to check if map->groups is NULL before using it to get
      ->machine value.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-utiepyiv8b1tf8f79ok9d6j8@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b0c76fc4
    • I
      perf parse: Add a deep delete for parse event terms · 1dc92556
      Ian Rogers 提交于
      Add a parse_events_term deep delete function so that owned strings and
      arrays are freed.
      Signed-off-by: NIan Rogers <irogers@google.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: clang-built-linux@googlegroups.com
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20191030223448.12930-10-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1dc92556
    • I
      perf parse: If pmu configuration fails free terms · 38f2c422
      Ian Rogers 提交于
      Avoid a memory leak when the configuration fails.
      Signed-off-by: NIan Rogers <irogers@google.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: clang-built-linux@googlegroups.com
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20191030223448.12930-9-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      38f2c422
    • I
      perf parse: Before yyabort-ing free components · cabbf268
      Ian Rogers 提交于
      Yyabort doesn't destruct inputs and so this must be done manually before
      using yyabort.
      Signed-off-by: NIan Rogers <irogers@google.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: clang-built-linux@googlegroups.com
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20191030223448.12930-8-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      cabbf268
    • I
      perf parse: Add destructors for parse event terms · f2a8ecd8
      Ian Rogers 提交于
      If parsing fails then destructors are ran to clean the up the stack.
      Rename the head union member to make the term and evlist use cases more
      distinct, this simplifies matching the correct destructor.
      
      Committer notes:
      
      Jiri: "Nice did not know about this.. looks like it's been in bison for some time, right?"
      
      Ian:  "Looks like it wasn't in Bison 1 but in Bison 2, we're at Bison 3 and
             Bison 2 is > 14 years old:
             https://web.archive.org/web/20050924004158/http://www.gnu.org/software/bison/manual/html_mono/bison.html#Destructor-Decl"
      Signed-off-by: NIan Rogers <irogers@google.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: clang-built-linux@googlegroups.com
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20191030223448.12930-7-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f2a8ecd8
    • J
      selftests/tls: add test for concurrent recv and send · 41098af5
      Jakub Kicinski 提交于
      Add a test which spawns 16 threads and performs concurrent
      send and recv calls on the same socket.
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NSimon Horman <simon.horman@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41098af5
    • I
      perf parse: Ensure config and str in terms are unique · b6645a72
      Ian Rogers 提交于
      Make it easier to release memory associated with parse event terms by
      duplicating the string for the config name and ensuring the val string
      is a duplicate.
      
      Currently the parser may memory leak terms and this is addressed in a
      later patch.
      Signed-off-by: NIan Rogers <irogers@google.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: clang-built-linux@googlegroups.com
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20191030223448.12930-6-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b6645a72
    • I
      perf parse: Add parse events handle error · 448d732c
      Ian Rogers 提交于
      Parse event error handling may overwrite one error string with another
      creating memory leaks. Introduce a helper routine that warns about
      multiple error messages as well as avoiding the memory leak.
      
      A reproduction of this problem can be seen with:
      
        perf stat -e c/c/
      
      After this change this produces:
      WARNING: multiple event parsing errors
      event syntax error: 'c/c/'
                             \___ unknown term
      
      valid terms: event,filter_rem,filter_opc0,edge,filter_isoc,filter_tid,filter_loc,filter_nc,inv,umask,filter_opc1,tid_en,thresh,filter_all_op,filter_not_nm,filter_state,filter_nm,config,config1,config2,name,period,percore
      Run 'perf list' for a list of valid events
      
       Usage: perf stat [<options>] [<command>]
      
          -e, --event <event>   event selector. use 'perf list' to list available events
      Signed-off-by: NIan Rogers <irogers@google.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: clang-built-linux@googlegroups.com
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20191030223448.12930-2-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      448d732c
    • A
      perf inject: Make --strip keep evsels · ef5502a1
      Adrian Hunter 提交于
      create_gcov (refer to the autofdo example in tools/perf/Documentation/intel-pt.txt)
      now needs the evsels to read the perf.data file. So don't strip them.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Link: http://lore.kernel.org/lkml/20191105100057.21465-1-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ef5502a1
    • J
      perf tools: Fix cross compile for ARM64 · 71f69907
      John Garry 提交于
      Currently when cross compiling perf tool for ARM64 on my x86 machine I
      get this error:
      
        arch/arm64/util/sym-handling.c:9:10: fatal error: gelf.h: No such file or directory
         #include <gelf.h>
      
      For the build, libelf is reported off:
      
        Auto-detecting system features:
        ...
        ...                        libelf: [ OFF ]
      
      Indeed, test-libelf is not built successfully:
      
        more ./build/feature/test-libelf.make.output
        test-libelf.c:2:10: fatal error: libelf.h: No such file or directory
         #include <libelf.h>
                ^~~~~~~~~~
        compilation terminated.
      
      I have no such problems natively compiling on ARM64, and I did not
      previously have this issue for cross compiling. Fix by relocating the
      gelf.h include.
      Signed-off-by: NJohn Garry <john.garry@huawei.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/1573045254-39833-1-git-send-email-john.garry@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      71f69907
    • J
      perf stat: Add --per-node agregation support · 86895b48
      Jiri Olsa 提交于
      Adding new --per-node option to aggregate counts per NUMA
      nodes for system-wide mode measurements.
      
      You can specify --per-node in live mode:
      
        # perf stat  -a -I 1000 -e cycles --per-node
        #           time node   cpus             counts unit events
             1.000542550 N0       20          6,202,097      cycles
             1.000542550 N1       20            639,559      cycles
             2.002040063 N0       20          7,412,495      cycles
             2.002040063 N1       20          2,185,577      cycles
             3.003451699 N0       20          6,508,917      cycles
             3.003451699 N1       20            765,607      cycles
        ...
      
      Or in the record/report stat session:
      
        # perf stat record -a -I 1000 -e cycles
        #           time             counts unit events
             1.000536937         10,008,468      cycles
             2.002090152          9,578,539      cycles
             3.003625233          7,647,869      cycles
             4.005135036          7,032,086      cycles
        ^C     4.340902364          3,923,893      cycles
      
        # perf stat report --per-node
        #           time node   cpus             counts unit events
             1.000536937 N0       20          9,355,086      cycles
             1.000536937 N1       20            653,382      cycles
             2.002090152 N0       20          7,712,838      cycles
             2.002090152 N1       20          1,865,701      cycles
             3.003625233 N0       20          6,604,441      cycles
             3.003625233 N1       20          1,043,428      cycles
             4.005135036 N0       20          6,350,522      cycles
             4.005135036 N1       20            681,564      cycles
             4.340902364 N0       20          3,403,188      cycles
             4.340902364 N1       20            520,705      cycles
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20190904073415.723-4-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      86895b48
    • J
      perf env: Add perf_env__numa_node() · 389799a7
      Jiri Olsa 提交于
      To speed up cpu to node lookup, add perf_env__numa_node(), that creates
      cpu array on the first lookup, that holds numa nodes for each stored
      cpu.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20190904073415.723-3-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      389799a7
    • H
      perf vendor events intel: Update all the Intel JSON metrics from TMAM 3.6. · 61ec07f5
      Haiyan Song 提交于
      New Metrics:
      
      - DSB_Switches: fraction of cycles CPU was stalled due to switches from DSB to MITE pipeline [all]
      - L2_Evictions_{Silent|NonSilent}_PKI: L2 {silent|non silent} ecivtions rate per Kilo instruction [SKX+]
      - IpFarBranch - Instructions per Far Branch
      
      Other Enhancements & fixes:
      
      - KBLR/CFL & CLX move to separate columns (no column sharing via if #model)
      - Re-organized/renamed Metric Group
      Signed-off-by: NHaiyan Song <haiyanx.song@intel.com>
      Reviewed-by: NKan Liang <kan.liang@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Link: http://lore.kernel.org/lkml/20191030082308.10919-1-haiyanx.song@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      61ec07f5
    • H
      perf vendor events intel: Update CascadelakeX events to v1.05 · 7fcf1b89
      Haiyan Song 提交于
      Update CascadelakeX events to v1.05.
      
      Other changes:
      
       remove duplicated and without description events.
      Signed-off-by: NHaiyan Song <haiyanx.song@intel.com>
      Reviewed-by: NKan Liang <kan.liang@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Link: http://lore.kernel.org/lkml/20191030082308.10919-1-haiyanx.song@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7fcf1b89
新手
引导
客服 返回
顶部