1. 26 Mar, 2021 3 commits
  2. 25 Mar, 2021 2 commits
  3. 24 Mar, 2021 6 commits
    • perf test: Add CSV summary test · 0f7ff383
      Committed by Jin Yao
      The patch "perf stat: Align CSV output for summary mode" aligned CSV
      output and added "summary" to the first column of summary lines.
      
      Now we check if the "summary" string is added to the CSV output.
      
      If the '--no-csv-summary' option is set, the "summary" string is not
      added, so check this case as well.
      
      Committer testing:
      
        $ perf test csv
        84: perf stat csv summary test     : Ok
        $
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210319070156.20394-2-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Align CSV output for summary mode · 0bdad978
      Committed by Jin Yao
      The 'perf stat' subcommand supports the request for a summary of the
      interval counter readings.  But the summary lines break the CSV output
      so it's hard for scripts to parse the result.
      
      Before:
      
        # perf stat -x, -I1000 --interval-count 1 --summary
             1.001323097,8013.48,msec,cpu-clock,8013483384,100.00,8.013,CPUs utilized
             1.001323097,270,,context-switches,8013513297,100.00,0.034,K/sec
             1.001323097,13,,cpu-migrations,8013530032,100.00,0.002,K/sec
             1.001323097,184,,page-faults,8013546992,100.00,0.023,K/sec
             1.001323097,20574191,,cycles,8013551506,100.00,0.003,GHz
             1.001323097,10562267,,instructions,8013564958,100.00,0.51,insn per cycle
             1.001323097,2019244,,branches,8013575673,100.00,0.252,M/sec
             1.001323097,106152,,branch-misses,8013585776,100.00,5.26,of all branches
        8013.48,msec,cpu-clock,8013483384,100.00,7.984,CPUs utilized
        270,,context-switches,8013513297,100.00,0.034,K/sec
        13,,cpu-migrations,8013530032,100.00,0.002,K/sec
        184,,page-faults,8013546992,100.00,0.023,K/sec
        20574191,,cycles,8013551506,100.00,0.003,GHz
        10562267,,instructions,8013564958,100.00,0.51,insn per cycle
        2019244,,branches,8013575673,100.00,0.252,M/sec
        106152,,branch-misses,8013585776,100.00,5.26,of all branches
      
      The summary line loses the timestamp column, which breaks the CSV
      output.
      
      We add a column at the original 'timestamp' position and it just says
      'summary' for the summary line.
      
      After:
      
        # perf stat -x, -I1000 --interval-count 1 --summary
             1.001196053,8012.72,msec,cpu-clock,8012722903,100.00,8.013,CPUs utilized
             1.001196053,218,,context-switches,8012753271,100.00,0.027,K/sec
             1.001196053,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
             1.001196053,0,,page-faults,8012786257,100.00,0.000,K/sec
             1.001196053,15004518,,cycles,8012790637,100.00,0.002,GHz
             1.001196053,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
             1.001196053,1590259,,branches,8012814766,100.00,0.198,M/sec
             1.001196053,82601,,branch-misses,8012824365,100.00,5.19,of all branches
                 summary,8012.72,msec,cpu-clock,8012722903,100.00,7.986,CPUs utilized
                 summary,218,,context-switches,8012753271,100.00,0.027,K/sec
                 summary,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
                 summary,0,,page-faults,8012786257,100.00,0.000,K/sec
                 summary,15004518,,cycles,8012790637,100.00,0.002,GHz
                 summary,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
                 summary,1590259,,branches,8012814766,100.00,0.198,M/sec
                 summary,82601,,branch-misses,8012824365,100.00,5.19,of all branches
      
      Now it is easy for scripts to parse the summary lines.
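      As a minimal sketch of what such a script might look like, the rows can
      be split on the first column. This assumes the '-x,' separator and the
      column order shown in the "After" output; the function name
      split_summary_rows() and the sample rows are illustrative and not part
      of perf.

```python
import csv
import io


def split_summary_rows(csv_text):
    """Split 'perf stat -x, --summary' output into interval and summary rows.

    Assumes the first column is either a timestamp or the literal string
    'summary', as in the "After" output of this commit message.
    """
    interval_rows, summary_rows = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        first = row[0].strip()
        rest = [field.strip() for field in row[1:]]
        if first == "summary":
            summary_rows.append(rest)
        else:
            # Interval rows carry the timestamp in the first column.
            interval_rows.append([float(first)] + rest)
    return interval_rows, summary_rows


sample = """\
     1.001196053,218,,context-switches,8012753271,100.00,0.027,K/sec
         summary,218,,context-switches,8012753271,100.00,0.027,K/sec
"""
intervals, summary = split_summary_rows(sample)
```

      Because every row now has the same number of columns, the same reader
      handles both kinds of rows without special-casing the line length.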
      
      To avoid breaking existing scripts, the old (broken) CSV format remains
      available through a new '--no-csv-summary' option.
      
        # perf stat -x, -I1000 --interval-count 1 --summary --no-csv-summary
             1.001213261,8012.67,msec,cpu-clock,8012672327,100.00,8.013,CPUs utilized
             1.001213261,197,,context-switches,8012703742,100.00,24.586,/sec
             1.001213261,9,,cpu-migrations,8012720902,100.00,1.123,/sec
             1.001213261,644,,page-faults,8012738266,100.00,80.373,/sec
             1.001213261,18350698,,cycles,8012744109,100.00,0.002,GHz
             1.001213261,12745021,,instructions,8012759001,100.00,0.69,insn per cycle
             1.001213261,2458033,,branches,8012770864,100.00,306.768,K/sec
             1.001213261,102107,,branch-misses,8012781751,100.00,4.15,of all branches
        8012.67,msec,cpu-clock,8012672327,100.00,7.985,CPUs utilized
        197,,context-switches,8012703742,100.00,24.586,/sec
        9,,cpu-migrations,8012720902,100.00,1.123,/sec
        644,,page-faults,8012738266,100.00,80.373,/sec
        18350698,,cycles,8012744109,100.00,0.002,GHz
        12745021,,instructions,8012759001,100.00,0.69,insn per cycle
        2458033,,branches,8012770864,100.00,306.768,K/sec
        102107,,branch-misses,8012781751,100.00,4.15,of all branches
      
      This option can be enabled in perf config by setting the variable
      'stat.no-csv-summary'.
      
        # perf config stat.no-csv-summary=true
      
        # perf config -l
        stat.no-csv-summary=true
      
        # perf stat -x, -I1000 --interval-count 1 --summary
             1.001330198,8013.28,msec,cpu-clock,8013279201,100.00,8.013,CPUs utilized
             1.001330198,205,,context-switches,8013308394,100.00,25.583,/sec
             1.001330198,10,,cpu-migrations,8013324681,100.00,1.248,/sec
             1.001330198,0,,page-faults,8013340926,100.00,0.000,/sec
             1.001330198,8027742,,cycles,8013344503,100.00,0.001,GHz
             1.001330198,2871717,,instructions,8013356501,100.00,0.36,insn per cycle
             1.001330198,553564,,branches,8013366204,100.00,69.081,K/sec
             1.001330198,54021,,branch-misses,8013375952,100.00,9.76,of all branches
        8013.28,msec,cpu-clock,8013279201,100.00,7.985,CPUs utilized
        205,,context-switches,8013308394,100.00,25.583,/sec
        10,,cpu-migrations,8013324681,100.00,1.248,/sec
        0,,page-faults,8013340926,100.00,0.000,/sec
        8027742,,cycles,8013344503,100.00,0.001,GHz
        2871717,,instructions,8013356501,100.00,0.36,insn per cycle
        553564,,branches,8013366204,100.00,69.081,K/sec
        54021,,branch-misses,8013375952,100.00,9.76,of all branches
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210319070156.20394-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf test: Add a shell test for 'perf stat --bpf-counters' new option · 2c0cb9f5
      Committed by Song Liu
      Add a test to compare the output of perf-stat with and without the
      --bpf-counters option. If the difference is more than 10%, the test is
      considered failed.
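      The 10% criterion can be sketched as a relative-difference check. This
      is only an illustration under the assumption that the count without
      --bpf-counters is the baseline; the function name within_tolerance() is
      hypothetical, and the actual shell test may compute the difference
      differently.

```python
def within_tolerance(base_count, bpf_count, tolerance=0.10):
    """Return True if bpf_count is within +/- tolerance of base_count.

    base_count: event count from plain 'perf stat' (assumed baseline).
    bpf_count:  event count from 'perf stat --bpf-counters'.
    """
    if base_count <= 0:
        # Without a meaningful baseline the comparison cannot pass.
        return False
    return abs(bpf_count - base_count) <= tolerance * base_count
```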
      
      Committer testing:
      
        # perf test bpf-counters
        86: perf stat --bpf-counters test                                   : Ok
        # perf test -v bpf-counters
        86: perf stat --bpf-counters test                                   :
        --- start ---
        test child forked, pid 2433339
        test child finished with 0
        ---- end ----
        perf stat --bpf-counters test: Ok
        #
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Requested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/EC00E37D-8587-4662-8E30-7AD5F874FA84@fb.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Measure 't0' and 'ref_time' after enable_counters() · 435b46ef
      Committed by Song Liu
      Take measurements of 't0' and 'ref_time' after enable_counters(), so
      that they only measure the time consumed when the counters are enabled.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Acked-by: Andi Kleen <andi@firstfloor.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: kernel-team@fb.com
      Link: http://lore.kernel.org/lkml/20210316211837.910506-3-songliubraving@fb.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Introduce 'bperf' to share hardware PMCs with BPF · 7fac83aa
      Committed by Song Liu
      The perf tool uses performance monitoring counters (PMCs) to monitor
      system performance. The PMCs are limited hardware resources. For
      example, Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu.
      
      Modern data center systems use these PMCs in many different ways: system
      level monitoring, (maybe nested) container level monitoring, per process
      monitoring, profiling (in sample mode), etc. In some cases, there are
      more active perf_events than available hardware PMCs. To allow all
      perf_events to have a chance to run, it is necessary to do expensive
      time multiplexing of events.
      
      On the other hand, many monitoring tools count the common metrics
      (cycles, instructions). It is a waste to have multiple tools create
      multiple perf_events of "cycles" and occupy multiple PMCs.
      
      bperf tries to reduce such waste by allowing multiple perf_events of
      "cycles" or "instructions" (at different scopes) to share PMUs. Instead
      of having each perf-stat session read its own perf_events, bperf uses
      BPF programs to read the perf_events and aggregate the readings into BPF
      maps. The perf-stat session(s) then read the values from these BPF maps.
      
      Please refer to the comment before the definition of bperf_ops for the
      description of bperf architecture.
      
      bperf is off by default. To enable it, pass the --bpf-counters option to
      perf-stat. bperf uses a BPF hashmap to share information about the BPF
      programs and maps used by bperf. This map is pinned to bpffs. The
      default path is /sys/fs/bpf/perf_attr_map. The user can change the
      path with the --bpf-attr-map option.
      
      Committer testing:
      
        # dmesg|grep "Performance Events" -A5
        [    0.225277] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
        [    0.225280] ... version:                0
        [    0.225280] ... bit width:              48
        [    0.225281] ... generic registers:      6
        [    0.225281] ... value mask:             0000ffffffffffff
        [    0.225281] ... max period:             00007fffffffffff
        #
        #  for a in $(seq 6) ; do perf stat -a -e cycles,instructions sleep 100000 & done
        [1] 2436231
        [2] 2436232
        [3] 2436233
        [4] 2436234
        [5] 2436235
        [6] 2436236
        # perf stat -a -e cycles,instructions sleep 0.1
      
         Performance counter stats for 'system wide':
      
               310,326,987      cycles                                                        (41.87%)
               236,143,290      instructions              #    0.76  insn per cycle           (41.87%)
      
               0.100800885 seconds time elapsed
      
        #
      
      We can see that the counters were enabled for this workload 41.87% of
      the time.
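      The percentage in parentheses is the share of the enabled time during
      which the event was actually counting on a PMC. A minimal sketch of the
      usual extrapolation follows, assuming the kernel reports time_enabled
      and time_running for the event; the function names are illustrative.

```python
def running_percent(time_enabled, time_running):
    """Share of the enabled time the event really spent on a hardware PMC."""
    if time_enabled == 0:
        return 0.0
    return 100.0 * time_running / time_enabled


def scale_count(raw_count, time_enabled, time_running):
    """Extrapolate a multiplexed count to the full enabled period.

    Returns None when the event never ran, since nothing can be estimated.
    """
    if time_running == 0:
        return None
    return raw_count * time_enabled / time_running
```

      An event that ran for about 42% of the time therefore reports an
      estimate, not an exact count, which is the inaccuracy bperf avoids.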
      
      Now with --bpf-counters:
      
        #  for a in $(seq 32) ; do perf stat --bpf-counters -a -e cycles,instructions sleep 100000 & done
        [1] 2436514
        [2] 2436515
        [3] 2436516
        [4] 2436517
        [5] 2436518
        [6] 2436519
        [7] 2436520
        [8] 2436521
        [9] 2436522
        [10] 2436523
        [11] 2436524
        [12] 2436525
        [13] 2436526
        [14] 2436527
        [15] 2436528
        [16] 2436529
        [17] 2436530
        [18] 2436531
        [19] 2436532
        [20] 2436533
        [21] 2436534
        [22] 2436535
        [23] 2436536
        [24] 2436537
        [25] 2436538
        [26] 2436539
        [27] 2436540
        [28] 2436541
        [29] 2436542
        [30] 2436543
        [31] 2436544
        [32] 2436545
        #
        # ls -la /sys/fs/bpf/perf_attr_map
        -rw-------. 1 root root 0 Mar 23 14:53 /sys/fs/bpf/perf_attr_map
        # bpftool map | grep bperf | wc -l
        64
        #
      
        # bpftool map | tail
        1265: percpu_array  name accum_readings  flags 0x0
        	key 4B  value 24B  max_entries 1  memlock 4096B
        1266: hash  name filter  flags 0x0
        	key 4B  value 4B  max_entries 1  memlock 4096B
        1267: array  name bperf_fo.bss  flags 0x400
        	key 4B  value 8B  max_entries 1  memlock 4096B
        	btf_id 996
        	pids perf(2436545)
        1268: percpu_array  name accum_readings  flags 0x0
        	key 4B  value 24B  max_entries 1  memlock 4096B
        1269: hash  name filter  flags 0x0
        	key 4B  value 4B  max_entries 1  memlock 4096B
        1270: array  name bperf_fo.bss  flags 0x400
        	key 4B  value 8B  max_entries 1  memlock 4096B
        	btf_id 997
        	pids perf(2436541)
        1285: array  name pid_iter.rodata  flags 0x480
        	key 4B  value 4B  max_entries 1  memlock 4096B
        	btf_id 1017  frozen
        	pids bpftool(2437504)
        1286: array  flags 0x0
        	key 4B  value 32B  max_entries 1  memlock 4096B
        #
        # bpftool map dump id 1268 | tail
        value (CPU 21):
        8f f3 bc ca 00 00 00 00  80 fd 2a d1 4d 00 00 00
        80 fd 2a d1 4d 00 00 00
        value (CPU 22):
        7e d5 64 4d 00 00 00 00  a4 8a 2e ee 4d 00 00 00
        a4 8a 2e ee 4d 00 00 00
        value (CPU 23):
        a7 78 3e 06 01 00 00 00  b2 34 94 f6 4d 00 00 00
        b2 34 94 f6 4d 00 00 00
        Found 1 element
        # bpftool map dump id 1268 | tail
        value (CPU 21):
        c6 8b d9 ca 00 00 00 00  20 c6 fc 83 4e 00 00 00
        20 c6 fc 83 4e 00 00 00
        value (CPU 22):
        9c b4 d2 4d 00 00 00 00  3e 0c df 89 4e 00 00 00
        3e 0c df 89 4e 00 00 00
        value (CPU 23):
        18 43 66 06 01 00 00 00  5b 69 ed 83 4e 00 00 00
        5b 69 ed 83 4e 00 00 00
        Found 1 element
        # bpftool map dump id 1268 | tail
        value (CPU 21):
        f2 6e db ca 00 00 00 00  92 67 4c ba 4e 00 00 00
        92 67 4c ba 4e 00 00 00
        value (CPU 22):
        dc 8e e1 4d 00 00 00 00  d9 32 7a c5 4e 00 00 00
        d9 32 7a c5 4e 00 00 00
        value (CPU 23):
        bd 2b 73 06 01 00 00 00  7c 73 87 bf 4e 00 00 00
        7c 73 87 bf 4e 00 00 00
        Found 1 element
        #
      
        # perf stat --bpf-counters -a -e cycles,instructions sleep 0.1
      
         Performance counter stats for 'system wide':
      
             119,410,122      cycles
             152,105,479      instructions              #    1.27  insn per cycle
      
             0.101395093 seconds time elapsed
      
        #
      
      See? We had the counters enabled all the time.
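      The 24-byte accum_readings values dumped by bpftool above can be decoded
      by hand. The sketch below assumes the value has the layout of struct
      bpf_perf_event_value (three u64 fields: counter, enabled, running) and
      that the dump comes from a little-endian machine; decode_reading() is an
      illustrative helper, not part of perf or bpftool.

```python
import struct


def decode_reading(hex_dump):
    """Decode one 24-byte per-CPU value from 'bpftool map dump' hex output.

    Returns (counter, enabled, running), assuming three little-endian
    64-bit unsigned integers in that order.
    """
    # Drop all whitespace (spaces and newlines) between the hex bytes.
    raw = bytes.fromhex("".join(hex_dump.split()))
    if len(raw) != 24:
        raise ValueError("expected 24 bytes, got %d" % len(raw))
    return struct.unpack("<QQQ", raw)


# First CPU 21 sample from the dump above.
counter, enabled, running = decode_reading(
    "8f f3 bc ca 00 00 00 00  80 fd 2a d1 4d 00 00 00\n"
    "80 fd 2a d1 4d 00 00 00"
)
```

      Under that assumed layout the second and third fields of every sample in
      the dump are equal, which matches the observation that the counters were
      never multiplexed out.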
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: kernel-team@fb.com
      Link: http://lore.kernel.org/lkml/20210316211837.910506-2-songliubraving@fb.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Fix various typos in comments · 4d39c89f
      Committed by Ingo Molnar
      Fix ~124 single-word typos and a few spelling errors in the perf tooling code,
      accumulated over the years.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210321113734.GA248990@gmail.com
      Link: http://lore.kernel.org/lkml/20210323160915.GA61903@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 17 Mar, 2021 3 commits
  5. 15 Mar, 2021 5 commits
    • perf stat: Improve readability of shadow stats · 6859bc0e
      Committed by Changbin Du
      Add the function convert_unit_double(), which selects an appropriate
      unit (K, M or G) for shadow stats.
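      The unit selection can be sketched as repeated division by 1000 until
      the value fits. This is a Python illustration of the idea only, assuming
      a threshold of 1000 per step; the real convert_unit_double() is a C
      function in perf and its signature is not reproduced here.

```python
def convert_unit_double(value):
    """Scale a rate and pick a unit prefix: '' (none), 'K', 'M' or 'G'.

    Returns (scaled_value, prefix). Values above 1000 G stay in G.
    """
    unit = ""
    for prefix in ("K", "M", "G"):
        if value <= 1000.0:
            break  # already small enough for the current unit
        value /= 1000.0
        unit = prefix
    return value, unit
```

      With this rule a rate of 98.057 events per second keeps no prefix, while
      one of 1338 per second is reported in K/sec.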
      
        $ sudo perf stat -a -- sleep 1
      
      Before: Unit 'M' is selected even when the number is very small.
      
       Performance counter stats for 'system wide':
      
                4,003.06 msec cpu-clock                 #    3.998 CPUs utilized
                  16,179      context-switches          #    0.004 M/sec
                     161      cpu-migrations            #    0.040 K/sec
                   4,699      page-faults               #    0.001 M/sec
           6,135,801,925      cycles                    #    1.533 GHz                      (83.21%)
           5,783,308,491      stalled-cycles-frontend   #   94.26% frontend cycles idle     (83.21%)
           4,543,694,050      stalled-cycles-backend    #   74.05% backend cycles idle      (66.49%)
           4,720,130,587      instructions              #    0.77  insn per cycle
                                                        #    1.23  stalled cycles per insn  (83.28%)
             753,848,078      branches                  #  188.318 M/sec                    (83.61%)
              37,457,747      branch-misses             #    4.97% of all branches          (83.48%)
      
             1.001283725 seconds time elapsed
      
      After:
      
        $ sudo perf stat -a -- sleep 2
      
       Performance counter stats for 'system wide':
      
                8,005.52 msec cpu-clock                 #    3.999 CPUs utilized
                  10,715      context-switches          #    1.338 K/sec
                     785      cpu-migrations            #   98.057 /sec
                     102      page-faults               #   12.741 /sec
           1,948,202,279      cycles                    #    0.243 GHz
           2,816,470,932      stalled-cycles-frontend   #  144.57% frontend cycles idle
           2,661,172,207      stalled-cycles-backend    #  136.60% backend cycles idle
             464,172,105      instructions              #    0.24  insn per cycle
                                                        #    6.07  stalled cycles per insn
              91,567,662      branches                  #   11.438 M/sec
               7,756,054      branch-misses             #    8.47% of all branches
      
             2.002040043 seconds time elapsed
      
      v2:
        o do not change 'sec' to 'cpu-sec'.
        o use convert_unit_double to implement convert_unit.
      Signed-off-by: Changbin Du <changbin.du@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210315143047.3867-1-changbin.du@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Elaborate use cases for the -n/--null command line option · 4a03af3e
      Committed by Arnaldo Carvalho de Melo
      The existing text was far too terse; pick up the intended usage from the
      cset that introduced this option.
      
      Twitter: https://twitter.com/_monoid/status/1371461130175004672?s=20
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events arm64: Add Fujitsu A64FX pmu event · 5497b23e
      Committed by Shunsuke Nakamura
      Add pmu events for A64FX.
      
      Documentation source:
      
        https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_PMU_Events_v1.2.pdf

      Signed-off-by: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com>
      Reviewed-by: John Garry <john.garry@huawei.com>
      Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210308105342.746940-3-nakamura.shun@fujitsu.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events arm64: Add more common and uarch events · 8efd1634
      Committed by Shunsuke Nakamura
      Add the following events.[1]
      
      Common architectural events:
        - L2I_TLB_REFILL
        - L2I_TLB
        - SIMD_INST_RETIRED
        - SVE_INST_RETIRED
      
      Common microarchitectural events:
        - UOP_SPEC
        - SVE_MATH_SPEC
        - FP_SPEC
        - FP_FMA_SPEC
        - FP_RECPE_SPEC
        - FP_CVT_SPEC
        - ASE_SVE_INT_SPEC
        - SVE_PRED_SPEC
        - SVE_MOVPRFX_SPEC
        - SVE_MOVPRFX_U_SPEC
        - ASE_SVE_LD_SPEC
        - ASE_SVE_ST_SPEC
        - PRF_SPEC
        - BASE_LD_REG_SPEC
        - BASE_ST_REG_SPEC
        - SVE_LDR_REG_SPEC
        - SVE_STR_REG_SPEC
        - SVE_LDR_PREG_SPEC
        - SVE_STR_PREG_SPEC
        - SVE_PRF_CONTIG_SPEC
        - ASE_SVE_LD_MULTI_SPEC
        - ASE_SVE_ST_MULTI_SPEC
        - SVE_LD_GATHER_SPEC
        - SVE_ST_SCATTER_SPEC
        - SVE_PRF_GATHER_SPEC
        - SVE_LDFF_SPEC
        - FP_SCALE_OPS_SPEC
        - FP_FIXED_OPS_SPEC
        - FP_HP_SCALE_OPS_SPEC
        - FP_HP_FIXED_OPS_SPEC
        - FP_SP_SCALE_OPS_SPEC
        - FP_SP_FIXED_OPS_SPEC
        - FP_DP_SCALE_OPS_SPEC
        - FP_DP_FIXED_OPS_SPEC
      
      Reference document is at the following:
      
        [1] https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_PMU_Events_v1.2.pdf

      Signed-off-by: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com>
      Reviewed-by: John Garry <john.garry@huawei.com>
      Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210308105342.746940-2-nakamura.shun@fujitsu.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evlist: Change the COMM when preparing the workload · a7672d1d
      Committed by Arnaldo Carvalho de Melo
      It was reported that --exclude-perf wasn't working, as tracepoints were
      appearing in 'perf script' output with the 'perf' COMM. Those events come
      from the window in evlist__prepare_workload() after the fork() and
      before the execvp() call for workloads specified on the command line.
      
      Example:
      
        # perf record -e kmem:kmalloc --filter 'bytes_alloc<650 && bytes_alloc>620' --exclude-perf -e kmem:kfree --exclude-perf -aR sleep 30
      
      Then:
      
        # perf script
                perf 15905 [009] 1498.356094: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
                perf 15905 [009] 1498.356116: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil)
                perf 15905 [009] 1498.356116: kmem:kfree: call_site=do_execveat_common+0x19d ptr=0xffff9cf750421c00
                perf 15905 [009] 1498.356138: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
                perf 15905 [009] 1498.356148: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil)
                perf 15905 [009] 1498.356148: kmem:kfree: call_site=do_execveat_common+0x19d ptr=0xffff9cf750421c00
                perf 15905 [009] 1498.356168: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
                perf 15905 [009] 1498.356176: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil)
        <SNIP>
                perf 15905 [009] 1498.356348: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
                perf 15905 [014] 1498.356386: kmem:kfree: call_site=security_compute_sid.part.0+0x3b2 ptr=(nil)
                perf 15905 [014] 1498.356423: kmem:kfree: call_site=load_elf_binary+0x207 ptr=0xffff9cf5b2a34220
                perf 15905 [014] 1498.356694: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf6d0b3b000
               sleep 15905 [014] 1498.356739: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
      
      Use prctl() to show that that is just the preparation of the workload:
      
        # perf script
           perf-exec 19036 [009] 2199.357582: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
           perf-exec 19036 [009] 2199.357604: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil)
           perf-exec 19036 [009] 2199.357604: kmem:kfree: call_site=do_execveat_common+0x19d ptr=0xffff9cf786459800
           perf-exec 19036 [009] 2199.357630: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
        <SNIP>
           perf-exec 19036 [000] 2199.358277: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf786fb9c00
           perf-exec 19036 [000] 2199.358278: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf786458200
           perf-exec 19036 [000] 2199.358279: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf786458600
               sleep 19036 [000] 2199.358316: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
               sleep 19036 [000] 2199.358323: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
               sleep 19036 [000] 2199.358330: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000
               sleep 19036 [000] 2199.358337: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000
               sleep 19036 [000] 2199.358339: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000
               sleep 19036 [000] 2199.358341: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000
      
      Reporter: zhanweiw <wingfancy@hotmail.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212213
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 09 Mar, 2021 4 commits
  7. 08 Mar, 2021 2 commits
    • perf symbols: Fix dso__fprintf_symbols_by_name() to return the number of printed chars · 210e4c89
      Committed by Arnaldo Carvalho de Melo
      The 'ret' variable was initialized to zero but was never updated with
      the fprintf() return value. Fix it.
      Reported-by: Yang Li <yang.lee@linux.alibaba.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Fixes: 90f18e63 ("perf symbols: List symbols in a dso in ascending name order")
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools include: Add __sum16 and __wsum definitions. · 2942a671
      Committed by Ian Rogers
      This adds definitions available in the uapi version.
      
      Explanation:
      
      In the kernel, linux/types.h includes the uapi version.
      In tools, uapi/linux/types.h and linux/types.h are distinct.
      For BPF programs, a definition of __wsum is needed by the generated
      bpf_helpers.h. The definition comes either from a generated vmlinux.h or
      from <linux/types.h>, which may be transitively included from bpf.h. The
      perf build prefers linux/types.h over uapi/linux/types.h for
      <linux/types.h>*. These definitions are necessary so that
      tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c compiles with the same
      include path that is used for perf.
      
      There is likely a wider conversation about exactly how types.h should be
      specified and the include order used by the perf build - it is somewhat
      confusing that tools/include/uapi/linux/bpf.h is using the non-uapi
      types.h.
      
      *see tools/perf/Makefile.config:
      ...
      INC_FLAGS += -I$(srctree)/tools/include/
      INC_FLAGS += -I$(srctree)/tools/arch/$(SRCARCH)/include/uapi
      INC_FLAGS += -I$(srctree)/tools/include/uapi
      ...
      The include directories are scanned from left-to-right:
      https://gcc.gnu.org/onlinedocs/gcc/Directory-Options.html
      As tools/include/linux/types.h appears before
      tools/include/uapi/linux/types.h then I say it is preferred.
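      The left-to-right search can be modelled with a few lines of Python. This
      is only a sketch of the first-match rule described above, not of the
      compiler's full lookup logic; resolve_include() is a hypothetical
      helper.

```python
import os


def resolve_include(header, include_dirs):
    """Return the first existing path for `header`, scanning the include
    directories in the order given (as with successive -I options)."""
    for directory in include_dirs:
        candidate = os.path.join(directory, header)
        if os.path.isfile(candidate):
            return candidate  # first match wins; later directories are ignored
    return None
```

      With tools/include/ listed before tools/include/uapi, a lookup of
      linux/types.h stops at tools/include/linux/types.h, which is why the
      definitions have to be present in that file.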
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: KP Singh <kpsingh@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20210307223024.4081067-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  8. 07 Mar, 2021 15 commits
    • perf cs-etm: Fix bitmap for option · 6fc5baf5
      Committed by Suzuki K Poulose
      When setting options with the macros ETM_OPT_CTXTID and ETM_OPT_TS, the
      code wrongly treats these two values (14 and 28 respectively) as bit
      masks, whereas both are actually bit offsets. This does not cause a
      visible failure, because the AND operation is always true for
      ETM_OPT_CTXTID / ETM_OPT_TS.
      
      This patch defines new independent macros (rather than using the
      "config" bits) for requesting the "contextid" and "timestamp" for
      cs_etm_set_option().
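      The difference between a bit offset and a bit mask can be shown with the
      two values quoted above. In this sketch the REQ_* flags and both helper
      functions are hypothetical names chosen for illustration; they are not
      the macros the patch actually introduces.

```python
# Bit positions inside the "config" word, as quoted in the commit message.
ETM_OPT_CTXTID = 14
ETM_OPT_TS = 28

# Hypothetical independent request flags: one distinct bit per request.
REQ_CTXTID = 1 << 0
REQ_TS = 1 << 1


def buggy_wants_ctxtid(option):
    """Misuses the bit offset 14 (0b01110) as if it were a mask."""
    return bool(option & ETM_OPT_CTXTID)


def fixed_wants_ctxtid(option):
    """Tests a dedicated flag bit, so unrelated requests do not match."""
    return bool(option & REQ_CTXTID)


# 28 & 14 == 12, so a timestamp-only request also looks like a
# contextid request when the offsets are misused as masks.
overlap = ETM_OPT_TS & ETM_OPT_CTXTID
```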
      Signed-off-by: Suzuki Poulouse <suzuki.poulose@arm.com>
      Reviewed-by: Mike Leach <mike.leach@linaro.org>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Daniel Kiss <daniel.kiss@arm.com>
      Cc: Denis Nikitin <denik@chromium.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-doc@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20210206150833.42120-5-leo.yan@linaro.org
      [ Extract the change as a separate patch for easier review ]
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6fc5baf5
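The mix-up described in 6fc5baf5 can be illustrated with a small sketch. The values 14 and 28 are the ones quoted in the commit message; the REQ_* flags and both helper names are hypothetical stand-ins for the new independent macros, not perf's actual definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit offsets quoted in the commit message. */
#define ETM_OPT_CTXTID 14
#define ETM_OPT_TS     28

/* Hypothetical request flags, independent of the "config" bit positions. */
#define REQ_CTXTID (1u << 0)
#define REQ_TS     (1u << 1)

/* Buggy pattern: the offset 28 (binary 11100) is used as if it were a mask. */
bool wants_ts_buggy(uint32_t option)
{
	return (option & ETM_OPT_TS) != 0;
}

/* Fixed pattern: test a dedicated single-bit request flag. */
bool wants_ts(uint32_t request)
{
	return (request & REQ_TS) != 0;
}
```

Passing only ETM_OPT_CTXTID (14, binary 01110) to wants_ts_buggy() still returns true, because 14 & 28 is 12. The two requests therefore cannot be told apart with the offsets, while wants_ts(REQ_CTXTID) returns false.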
    • M
      perf trace: Fix race in signal handling · 86a19008
      Michael Petlan committed
      Since a lot of stuff happens before the SIGINT signal handler is registered
      (scanning /proc/*, etc.), on bigger systems, such as Cavium Sabre CN99xx,
      it may happen that the first interrupt signal is lost and perf isn't
      correctly terminated.
      
      The reproduction code might look like the following:
      
          perf trace -a &
          PERF_PID=$!
          sleep 4
          kill -INT $PERF_PID
      
      The issue has been found on a CN99xx machine with RHEL-8 and the patch fixes
      it by registering the signal handlers earlier in the init stage.
      Suggested-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Michael Petlan <mpetlan@redhat.com>
      Tested-by: Michael Petlan <mpetlan@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/lkml/YEJnaMzH2ctp3PPx@kernel.org/
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      86a19008
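The ordering idea behind 86a19008 (register the handler before any slow start-up work) might be sketched as follows. This is a minimal illustration using ISO C signal(); the function names and the slow_scan callback are hypothetical and do not mirror the code in perf trace.

```c
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

static volatile sig_atomic_t done;

static void on_interrupt(int sig)
{
	(void)sig;
	done = 1;	/* only set a flag; the main loop checks it later */
}

/* Register the SIGINT handler; returns 0 on success, -1 on failure. */
int install_interrupt_handlers(void)
{
	if (signal(SIGINT, on_interrupt) == SIG_ERR)
		return -1;
	return 0;
}

bool interrupted(void)
{
	return done != 0;
}

/*
 * Hypothetical start-up sequence: handlers first, slow work second.
 * Returns 1 if an interrupt arrived during start-up, 0 if not, -1 on error.
 */
int tool_init(void (*slow_scan)(void))
{
	if (install_interrupt_handlers() < 0)
		return -1;
	if (slow_scan != NULL)
		slow_scan();	/* e.g. a long /proc walk; SIGINT is now recorded */
	return interrupted() ? 1 : 0;
}
```

With the handler installed first, a SIGINT delivered during the slow phase sets the flag instead of being handled before the tool is ready to react to it.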
    • A
      perf map: Tighten snprintf() string precision to pass gcc check on some 32-bit arches · 77d02bd0
      Arnaldo Carvalho de Melo committed
      Noticed on a debian:experimental mips and mipsel cross build
      environment:
      
        perfbuilder@ec265a086e9b:~$ mips-linux-gnu-gcc --version | head -1
        mips-linux-gnu-gcc (Debian 10.2.1-3) 10.2.1 20201224
        perfbuilder@ec265a086e9b:~$
      
          CC       /tmp/build/perf/util/map.o
        util/map.c: In function 'map__new':
        util/map.c:109:5: error: '%s' directive output may be truncated writing between 1 and 2147483645 bytes into a region of size 4096 [-Werror=format-truncation=]
          109 |    "%s/platforms/%s/arch-%s/usr/lib/%s",
              |     ^~
        In file included from /usr/mips-linux-gnu/include/stdio.h:867,
                         from util/symbol.h:11,
                         from util/map.c:2:
        /usr/mips-linux-gnu/include/bits/stdio2.h:67:10: note: '__builtin___snprintf_chk' output 32 or more bytes (assuming 4294967321) into a destination of size 4096
           67 |   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
              |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           68 |        __bos (__s), __fmt, __va_arg_pack ());
              |        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        cc1: all warnings being treated as errors
      
      Since we have the lengths for what lands in that place, use them to give
      the compiler more info and make it happy.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      77d02bd0
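The technique named in 77d02bd0, giving snprintf() an explicit precision so the compiler can bound the output of a '%s' directive, can be sketched as below. The function name and the path layout are illustrative assumptions and are not the code in util/map.c.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "<prefix>/lib/<name>", where <prefix> is the first prefix_len
 * bytes of path.  "%.*s" takes the maximum number of bytes to print as
 * an int argument, so the prefix contributes at most prefix_len bytes.
 * Returns what snprintf() returns: the length the full string would have.
 */
int build_lib_path(char *dst, size_t size, const char *path,
		   size_t prefix_len, const char *name)
{
	return snprintf(dst, size, "%.*s/lib/%s", (int)prefix_len, path, name);
}
```

For example, calling build_lib_path() with path "/opt/ndk/extra" and prefix_len 8 uses only "/opt/ndk" as the prefix.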
    • R
      perf report: Fix -F for branch & mem modes · 6740a4e7
      Ravi Bangoria committed
      perf report fails to add valid additional fields with -F when
      used with branch or mem modes. Fix it.
      
      Before patch:
      
        $ perf record -b
        $ perf report -b -F +srcline_from --stdio
        Error:
        Invalid --fields key: `srcline_from'
      
      After patch:
      
        $ perf report -b -F +srcline_from --stdio
        # Samples: 8K of event 'cycles'
        # Event count (approx.): 8784
        ...
      
      Committer notes:
      
      There was an inversion: when looking at branch stack dimensions (keys)
      it was checking if the sort mode was 'mem', not 'branch'.
      
      Fixes: aa6b3c99 ("perf report: Make -F more strict like -s")
      Reported-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/20210304062958.85465-1-ravi.bangoria@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6740a4e7
    • A
      perf tests x86: Move insn.h include to make sure it finds stddef.h · c1f272df
      Arnaldo Carvalho de Melo committed
      In some versions of Alpine Linux the perf build is broken since commit
      1d509f2a ("x86/insn: Support big endian cross-compiles"):
      
        In file included from /usr/include/linux/byteorder/little_endian.h:13,
                         from /usr/include/asm/byteorder.h:5,
                         from arch/x86/util/../../../../arch/x86/include/asm/insn.h:10,
                         from arch/x86/util/archinsn.c:2:
        /usr/include/linux/swab.h:161:8: error: unknown type name '__always_inline'
         static __always_inline __u16 __swab16p(const __u16 *p)
      
      So, in the places that include arch/x86/include/asm/insn.h, move that
      inclusion to after linux/stddef.h (which conditionally defines
      __always_inline) to work around this problem on Alpine Linux 3.9 to 3.11;
      3.12 onwards works.
      
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c1f272df
    • K
      perf test: Support the ins_lat check in the X86 specific test · 7d9d4c6e
      Kan Liang committed
      The ins_lat of PERF_SAMPLE_WEIGHT_STRUCT stands for the instruction
      latency, which is only available for X86. Add an X86 specific test for
      the ins_lat and PERF_SAMPLE_WEIGHT_STRUCT type.
      
      The test__x86_sample_parsing() uses the same approach as
      test__sample_parsing() to verify a sample type. Since the ins_lat and
      PERF_SAMPLE_WEIGHT_STRUCT are the only X86 specific sample types for now,
      test__x86_sample_parsing() only verifies the PERF_SAMPLE_WEIGHT_STRUCT
      type. Other sample types are still verified in the generic test.
      
        $ perf test 77 -v
        77: x86 Sample parsing                                              :
        --- start ---
        test child forked, pid 102370
        test child finished with 0
        ---- end ----
        x86 Sample parsing: Ok
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1614787285-104151-2-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7d9d4c6e
    • K
      perf test: Fix sample-parsing failure on non-x86 platforms · a8146d66
      Kan Liang committed
      Executing 'perf test 27' fails on s390:
      
        [root@t35lp46 perf]# ./perf test -Fv 27
        27: Sample parsing
        --- start ---
        ---- end ----
        Sample parsing: FAILED!
        [root@t35lp46 perf]#
      
      The commit fbefe9c2 ("perf tools: Support arch specific
      PERF_SAMPLE_WEIGHT_STRUCT processing") changes the ins_lat to a
      model-specific variable only for X86, but perf test still verifies the
      variable in the generic test.
      
      Remove the ins_lat check in the generic test. The following patch will
      add it in the X86 specific test.
      
      Fixes: fbefe9c2 ("perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processing")
      Reported-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1614787285-104151-1-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a8146d66
    • N
      perf archive: Fix filtering of empty build-ids · ec4d0a76
      Nicholas Fraser committed
      A non-existent build-id used to be treated as an all-zero SHA-1 hash.
      Build-ids are now variable width. A non-existent build-id is an empty
      string and "perf buildid-list" pads this with spaces. This is true even
      when using old perf.data files recorded from older versions of perf;
      "perf buildid-list" never reports an all-zero hash anymore.
      
      This fixes "perf-archive" to skip missing build-ids by skipping lines
      that start with a padding space rather than with zeroes.
      Signed-off-by: Nicholas Fraser <nfraser@codeweavers.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Huw Davies <huw@codeweavers.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ulrich Czekalla <uczekalla@codeweavers.com>
      Link: https://lore.kernel.org/r/442bffc7-ac5c-0975-b876-a549efce2413@codeweavers.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ec4d0a76
    • N
      perf daemon: Fix compile error with Asan · bd57a9f3
      Namhyung Kim committed
      I'm seeing a build failure when building with address sanitizer.  It seems
      we could write to name[100] if the var is longer.
      
        $ make EXTRA_CFLAGS=-fsanitize=address
        ...
          CC       builtin-daemon.o
        In function ‘get_session_name’,
          inlined from ‘session_config’ at builtin-daemon.c:164:6,
          inlined from ‘server_config’ at builtin-daemon.c:223:10:
        builtin-daemon.c:155:11: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
          155 |  *session = 0;
              |  ~~~~~~~~~^~~
        builtin-daemon.c: In function ‘server_config’:
        builtin-daemon.c:162:7: note: at offset 100 to object ‘name’ with size 100 declared here
          162 |  char name[100];
              |       ^~~~
      
      Fixes: c0666261 ("perf daemon: Add config file support")
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20210224071438.686677-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bd57a9f3
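The overflow reported in bd57a9f3 is an off-by-one: a copy loop may fill all 100 bytes of name[] and then write the terminator at offset 100. A minimal sketch of a bounded copy that always reserves room for the terminator follows. The function name, its signature and its return convention are assumptions made for illustration, not the code in builtin-daemon.c.

```c
#include <stddef.h>

/*
 * Copy the part of var before the first '.' into dst, which has room
 * for cap bytes.  At most cap - 1 characters are copied, so the NUL
 * terminator always fits.  Returns 0 if a '.' was reached, -1 otherwise
 * (no '.' present, or the name did not fit).
 */
int copy_session_name(const char *var, char *dst, size_t cap)
{
	size_t n = 0;

	if (cap == 0)
		return -1;

	while (var[n] != '.' && var[n] != '\0' && n < cap - 1) {
		dst[n] = var[n];
		n++;
	}
	dst[n] = '\0';

	return var[n] == '.' ? 0 : -1;
}
```

The key point is the `n < cap - 1` bound: with a 100-byte buffer the loop stops after 99 characters, so the final write lands at offset 99 at the latest.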
    • N
      perf stat: Fix use-after-free when -r option is used · 513068f2
      Namhyung Kim committed
      I got a segfault when using the -r option with event groups.  The option
      makes it run the workload multiple times and it will reuse the evlist
      and evsel for each run.
      
      While most resources are allocated and freed properly, the id hash
      in the evlist was not and it resulted in the bug.  You can see it with
      the address sanitizer like below:
      
        $ perf stat -r 100 -e '{cycles,instructions}' true
        =================================================================
        ==693052==ERROR: AddressSanitizer: heap-use-after-free on
            address 0x6080000003d0 at pc 0x558c57732835 bp 0x7fff1526adb0 sp 0x7fff1526ada8
        WRITE of size 8 at 0x6080000003d0 thread T0
          #0 0x558c57732834 in hlist_add_head /home/namhyung/project/linux/tools/include/linux/list.h:644
          #1 0x558c57732834 in perf_evlist__id_hash /home/namhyung/project/linux/tools/lib/perf/evlist.c:237
          #2 0x558c57732834 in perf_evlist__id_add /home/namhyung/project/linux/tools/lib/perf/evlist.c:244
          #3 0x558c57732834 in perf_evlist__id_add_fd /home/namhyung/project/linux/tools/lib/perf/evlist.c:285
          #4 0x558c5747733e in store_evsel_ids util/evsel.c:2765
          #5 0x558c5747733e in evsel__store_ids util/evsel.c:2782
          #6 0x558c5730b717 in __run_perf_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:895
          #7 0x558c5730b717 in run_perf_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:1014
          #8 0x558c5730b717 in cmd_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:2446
          #9 0x558c57427c24 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
          #10 0x558c572b1a48 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
          #11 0x558c572b1a48 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
          #12 0x558c572b1a48 in main /home/namhyung/project/linux/tools/perf/perf.c:539
          #13 0x7fcadb9f7d09 in __libc_start_main ../csu/libc-start.c:308
          #14 0x558c572b60f9 in _start (/home/namhyung/project/linux/tools/perf/perf+0x45d0f9)
      
      Actually the nodes in the hash table are struct perf_stream_id and
      they were freed in the previous run.  Fix it by resetting the hash.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20210225035148.778569-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      513068f2
    • N
      libperf: Add perf_evlist__reset_id_hash() · e2a99c9a
      Namhyung Kim committed
      Add the perf_evlist__reset_id_hash() function as an internal function so
      that it can be called by perf to reset the hash table.  This is
      necessary for 'perf stat' to run the workload multiple times.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20210225035148.778569-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e2a99c9a
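The two commits above (513068f2 and e2a99c9a) concern a hash table whose bucket heads outlive the nodes they point to when the workload is run repeatedly. The sketch below shows the general pattern with a simplified singly linked bucket list. All types and function names are hypothetical; this is not libperf's hlist-based implementation.

```c
#include <stddef.h>
#include <stdlib.h>

#define HLIST_SIZE 8	/* hypothetical bucket count */

struct id_node {
	unsigned long id;
	struct id_node *next;
};

struct id_hash {
	struct id_node *heads[HLIST_SIZE];
};

/* Link a node at the front of its bucket. */
void id_hash_add(struct id_hash *hash, struct id_node *node)
{
	size_t bucket = node->id % HLIST_SIZE;

	node->next = hash->heads[bucket];
	hash->heads[bucket] = node;
}

/* Count every node reachable from the bucket heads. */
size_t id_hash_count(const struct id_hash *hash)
{
	const struct id_node *node;
	size_t i, count = 0;

	for (i = 0; i < HLIST_SIZE; i++)
		for (node = hash->heads[i]; node != NULL; node = node->next)
			count++;
	return count;
}

/* Forget all nodes without freeing them; the owner frees them. */
void id_hash_reset(struct id_hash *hash)
{
	size_t i;

	for (i = 0; i < HLIST_SIZE; i++)
		hash->heads[i] = NULL;
}

/* One "run": allocate nodes, hash them, count them, tear down. */
size_t run_once(struct id_hash *hash, size_t nr)
{
	struct id_node *nodes = calloc(nr, sizeof(*nodes));
	size_t i, count;

	if (nodes == NULL)
		return 0;
	for (i = 0; i < nr; i++) {
		nodes[i].id = i;
		id_hash_add(hash, &nodes[i]);
	}
	count = id_hash_count(hash);
	free(nodes);
	/* Without this reset the heads would still point at freed nodes. */
	id_hash_reset(hash);
	return count;
}
```

If the reset were omitted, the second call to run_once() would link new nodes onto dangling pointers from the first run, which is the kind of access the address sanitizer report above flags.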
    • J
      perf stat: Fix wrong skipping for per-die aggregation · 034f7ee1
      Jin Yao committed
      Uncore becomes die-scope on Xeon Cascade Lake-AP, and perf already
      supports --per-die aggregation.
      
      One issue was found in check_per_pkg() for uncore events running on an AP
      system. On Cascade Lake-AP, we have:
      
      S0-D0
      S0-D1
      S1-D0
      S1-D1
      
      But in check_per_pkg(), S0-D1 and S1-D1 are skipped because the mask
      bits for S0 and S1 have been set for S0-D0 and S1-D0. It doesn't check
      die_id. So the counts for S0-D1 and S1-D1 are set to zero.  That's not
      correct.
      
        root@lkp-csl-2ap4 ~# ./perf stat -a -I 1000 -e llc_misses.mem_read --per-die -- sleep 5
           1.001460963 S0-D0           1            1317376 Bytes llc_misses.mem_read
           1.001460963 S0-D1           1             998016 Bytes llc_misses.mem_read
           1.001460963 S1-D0           1             970496 Bytes llc_misses.mem_read
           1.001460963 S1-D1           1            1291264 Bytes llc_misses.mem_read
           2.003488021 S0-D0           1            1082048 Bytes llc_misses.mem_read
           2.003488021 S0-D1           1            1919040 Bytes llc_misses.mem_read
           2.003488021 S1-D0           1             890752 Bytes llc_misses.mem_read
           2.003488021 S1-D1           1            2380800 Bytes llc_misses.mem_read
           3.005613270 S0-D0           1            1126080 Bytes llc_misses.mem_read
           3.005613270 S0-D1           1            2898176 Bytes llc_misses.mem_read
           3.005613270 S1-D0           1             870912 Bytes llc_misses.mem_read
           3.005613270 S1-D1           1            3388608 Bytes llc_misses.mem_read
           4.007627598 S0-D0           1            1124608 Bytes llc_misses.mem_read
           4.007627598 S0-D1           1            3884416 Bytes llc_misses.mem_read
           4.007627598 S1-D0           1             921088 Bytes llc_misses.mem_read
           4.007627598 S1-D1           1            4451840 Bytes llc_misses.mem_read
           5.001479927 S0-D0           1             963328 Bytes llc_misses.mem_read
           5.001479927 S0-D1           1            4831936 Bytes llc_misses.mem_read
           5.001479927 S1-D0           1             895104 Bytes llc_misses.mem_read
           5.001479927 S1-D1           1            5496640 Bytes llc_misses.mem_read
      
      From the above output, we can see that S0-D1 and S1-D1 don't report the
      interval values; they keep growing. That's because check_per_pkg()
      wrongly decides to use zero counts for S0-D1 and S1-D1.
      
      So in check_per_pkg(), we should use hashmap(socket,die) to decide whether
      the CPU counts need to be skipped. Only considering the socket is not enough.
      
      Now with this patch,
      
        root@lkp-csl-2ap4 ~# ./perf stat -a -I 1000 -e llc_misses.mem_read --per-die -- sleep 5
           1.001586691 S0-D0           1            1229440 Bytes llc_misses.mem_read
           1.001586691 S0-D1           1             976832 Bytes llc_misses.mem_read
           1.001586691 S1-D0           1             938304 Bytes llc_misses.mem_read
           1.001586691 S1-D1           1            1227328 Bytes llc_misses.mem_read
           2.003776312 S0-D0           1            1586752 Bytes llc_misses.mem_read
           2.003776312 S0-D1           1             875392 Bytes llc_misses.mem_read
           2.003776312 S1-D0           1             855616 Bytes llc_misses.mem_read
           2.003776312 S1-D1           1             949376 Bytes llc_misses.mem_read
           3.006512788 S0-D0           1            1338880 Bytes llc_misses.mem_read
           3.006512788 S0-D1           1             920064 Bytes llc_misses.mem_read
           3.006512788 S1-D0           1             877184 Bytes llc_misses.mem_read
           3.006512788 S1-D1           1            1020736 Bytes llc_misses.mem_read
           4.008895291 S0-D0           1             926592 Bytes llc_misses.mem_read
           4.008895291 S0-D1           1             906368 Bytes llc_misses.mem_read
           4.008895291 S1-D0           1             892224 Bytes llc_misses.mem_read
           4.008895291 S1-D1           1             987712 Bytes llc_misses.mem_read
           5.001590993 S0-D0           1             962624 Bytes llc_misses.mem_read
           5.001590993 S0-D1           1             912512 Bytes llc_misses.mem_read
           5.001590993 S1-D0           1             891200 Bytes llc_misses.mem_read
           5.001590993 S1-D1           1             978432 Bytes llc_misses.mem_read
      
      On a system without dies, die_id is 0, so the key is effectively
      hashmap(socket,0) and the original behavior is not changed.
      Reported-by: Ying Huang <ying.huang@intel.com>
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ying Huang <ying.huang@intel.com>
      Link: http://lore.kernel.org/lkml/20210128013417.25597-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      034f7ee1
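The difference between keying the "already counted" check on the socket alone and on the (socket, die) pair, as described in 034f7ee1, can be sketched as follows. The linear-scan set, the key packing and all names are illustrative assumptions; the commit message only says that a hashmap keyed on (socket, die) is used.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_KEYS 64	/* hypothetical capacity, enough for the sketch */

struct unit {
	int socket;
	int die;
};

struct seen_set {
	uint64_t keys[MAX_KEYS];
	size_t nr;
};

/* Returns true if key was seen before; otherwise records it if room is left. */
static bool seen_before(struct seen_set *set, uint64_t key)
{
	size_t i;

	for (i = 0; i < set->nr; i++)
		if (set->keys[i] == key)
			return true;
	if (set->nr < MAX_KEYS)
		set->keys[set->nr++] = key;
	return false;
}

/* Count how many units are kept (not skipped as duplicates). */
size_t count_kept(const struct unit *units, size_t nr, bool use_die)
{
	struct seen_set set = { .nr = 0 };
	size_t i, kept = 0;

	for (i = 0; i < nr; i++) {
		uint64_t key = (uint32_t)units[i].socket;

		if (use_die)	/* pack the die id into the upper 32 bits */
			key |= (uint64_t)(uint32_t)units[i].die << 32;
		if (!seen_before(&set, key))
			kept++;
	}
	return kept;
}
```

For the four units S0-D0, S0-D1, S1-D0 and S1-D1, the socket-only key keeps two of them and skips both D1 entries, while the combined key keeps all four.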
    • A
      tools headers UAPI: Sync KVM's kvm.h and vmx.h headers with the kernel sources · 33dc525f
      Arnaldo Carvalho de Melo committed
      To pick the changes in:
      
        fe6b6bc8 ("KVM: VMX: Enable bus lock VM exit")
      
      That makes 'perf kvm-stat' aware of this new BUS_LOCK exit reason, thus
      addressing the following perf build warning:
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h'
        diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h
      
      Cc: Chenyi Qiang <chenyi.qiang@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      33dc525f
    • A
      tools headers cpufeatures: Sync with the kernel sources · 1a9bcadd
      Arnaldo Carvalho de Melo committed
      To pick the changes from:
      
        3b9c723e ("KVM: SVM: Add support for SVM instruction address check change")
        b85a0425 ("Enumerate AVX Vector Neural Network instructions")
        fb35d30f ("x86/cpufeatures: Assign dedicated feature word for CPUID_0x8000001F[EAX]")
      
      This only causes these perf files to be rebuilt:
      
        CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
        CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o
      
      And addresses this perf build warning:
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
        diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
      
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Kyung Min Park <kyung.min.park@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Wei Huang <wei.huang2@amd.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      1a9bcadd
    • A
      tools headers UAPI: Update tools' copy of linux/coresight-pmu.h · 6c0afc57
      Arnaldo Carvalho de Melo committed
      To get the changes in these commits:
      
        88f11864 ("coresight: etm-perf: Support PID tracing for kernel at EL2")
        53abf3fe ("coresight: etm-perf: Clarify comment on perf options")
      
      This will possibly be used in patches lined up for v5.13.
      
      And silence this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/linux/coresight-pmu.h' differs from latest version at 'include/linux/coresight-pmu.h'
        diff -u tools/include/linux/coresight-pmu.h include/linux/coresight-pmu.h
      
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6c0afc57