1. 21 Apr, 2022 1 commit
  2. 13 Jan, 2022 4 commits
    • perf cpumap: Give CPUs their own type · 6d18804b
      Ian Rogers committed
      A common problem is confusing CPU map indices with the CPU; wrapping
      the CPU in a struct avoids this. This approach is similar to
      atomic_t.
      
      Committer notes:
      
      To make it build with BUILD_BPF_SKEL=1 these files needed the
      conversions to 'struct perf_cpu' usage:
      
        tools/perf/util/bpf_counter.c
        tools/perf/util/bpf_counter_cgroup.c
        tools/perf/util/bpf_ftrace.c
      
      Also perf_env__get_cpu() was removed back in "perf cpumap: Switch
      cpu_map__build_map to cpu function".
      
      Additionally these needed to be fixed for the ARM builds to complete:
      
        tools/perf/arch/arm/util/cs-etm.c
        tools/perf/arch/arm64/util/pmu.c
      Suggested-by: John Garry <john.garry@huawei.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-49-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
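The idea of giving CPUs their own type can be illustrated outside C as well. The following is a minimal Python analogue, not the perf implementation: `PerfCpu`, `CpuMap`, `cpu_at` and `index_of` are hypothetical names chosen for this sketch. It only shows how a wrapper type keeps a CPU number from being passed where a map index is expected.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PerfCpu:
    """Hypothetical wrapper: a logical CPU number, distinct from a map index."""
    cpu: int


class CpuMap:
    """Hypothetical CPU map: position (index) -> wrapped CPU."""

    def __init__(self, cpus: List[int]) -> None:
        self._cpus = [PerfCpu(c) for c in cpus]

    def cpu_at(self, idx: int) -> PerfCpu:
        # Takes a map index and returns the wrapped CPU stored there.
        if not isinstance(idx, int):
            raise TypeError(f"expected a map index (int), got {idx!r}")
        return self._cpus[idx]

    def index_of(self, cpu: PerfCpu) -> int:
        # Takes a wrapped CPU and returns its map index, or -1 if absent.
        if not isinstance(cpu, PerfCpu):
            raise TypeError(f"expected PerfCpu, got {cpu!r}")
        try:
            return self._cpus.index(cpu)
        except ValueError:
            return -1


cpu_map = CpuMap([4, 5, 6, 7])   # a map covering CPUs 4-7
third = cpu_map.cpu_at(2)        # index 2 holds CPU 6
print(third, cpu_map.index_of(third))
```

Passing a bare integer to `index_of` raises `TypeError`, which is the runtime counterpart of the compile-time check a C struct wrapper provides.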
    • perf stat: Swap variable name cpu to index · 5b1af93d
      Ian Rogers committed
      The use of CPU is error prone; switch to cpu_map_idx.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-43-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evsel: Rename variable cpu to index · 6f844b1f
      Ian Rogers committed
      Make naming less error prone.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-40-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Switch to cpu version of cpu_map__get() · 88031a0d
      Ian Rogers committed
      Avoid possible bugs where the wrong index is passed with the cpu_map.
      Reviewed-by: James Clark <james.clark@arm.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-6-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 20 Apr, 2021 1 commit
  4. 24 Mar, 2021 1 commit
    • perf stat: Align CSV output for summary mode · 0bdad978
      Jin Yao committed
      The 'perf stat' subcommand supports the request for a summary of the
      interval counter readings.  But the summary lines break the CSV output
      so it's hard for scripts to parse the result.
      
      Before:
      
        # perf stat -x, -I1000 --interval-count 1 --summary
             1.001323097,8013.48,msec,cpu-clock,8013483384,100.00,8.013,CPUs utilized
             1.001323097,270,,context-switches,8013513297,100.00,0.034,K/sec
             1.001323097,13,,cpu-migrations,8013530032,100.00,0.002,K/sec
             1.001323097,184,,page-faults,8013546992,100.00,0.023,K/sec
             1.001323097,20574191,,cycles,8013551506,100.00,0.003,GHz
             1.001323097,10562267,,instructions,8013564958,100.00,0.51,insn per cycle
             1.001323097,2019244,,branches,8013575673,100.00,0.252,M/sec
             1.001323097,106152,,branch-misses,8013585776,100.00,5.26,of all branches
        8013.48,msec,cpu-clock,8013483384,100.00,7.984,CPUs utilized
        270,,context-switches,8013513297,100.00,0.034,K/sec
        13,,cpu-migrations,8013530032,100.00,0.002,K/sec
        184,,page-faults,8013546992,100.00,0.023,K/sec
        20574191,,cycles,8013551506,100.00,0.003,GHz
        10562267,,instructions,8013564958,100.00,0.51,insn per cycle
        2019244,,branches,8013575673,100.00,0.252,M/sec
        106152,,branch-misses,8013585776,100.00,5.26,of all branches
      
      The summary line loses the timestamp column, which breaks the CSV
      output.
      
      We add a column at the original 'timestamp' position and it just says
      'summary' for the summary line.
      
      After:
      
        # perf stat -x, -I1000 --interval-count 1 --summary
             1.001196053,8012.72,msec,cpu-clock,8012722903,100.00,8.013,CPUs utilized
             1.001196053,218,,context-switches,8012753271,100.00,0.027,K/sec
             1.001196053,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
             1.001196053,0,,page-faults,8012786257,100.00,0.000,K/sec
             1.001196053,15004518,,cycles,8012790637,100.00,0.002,GHz
             1.001196053,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
             1.001196053,1590259,,branches,8012814766,100.00,0.198,M/sec
             1.001196053,82601,,branch-misses,8012824365,100.00,5.19,of all branches
                 summary,8012.72,msec,cpu-clock,8012722903,100.00,7.986,CPUs utilized
                 summary,218,,context-switches,8012753271,100.00,0.027,K/sec
                 summary,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
                 summary,0,,page-faults,8012786257,100.00,0.000,K/sec
                 summary,15004518,,cycles,8012790637,100.00,0.002,GHz
                 summary,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
                 summary,1590259,,branches,8012814766,100.00,0.198,M/sec
                 summary,82601,,branch-misses,8012824365,100.00,5.19,of all branches
      
      Now it's easy for scripts to analyse the summary lines.
      
      Of course, we also take care not to break existing scripts, which
      can continue to use the broken CSV format via a new '--no-csv-summary'
      option.
      
        # perf stat -x, -I1000 --interval-count 1 --summary --no-csv-summary
             1.001213261,8012.67,msec,cpu-clock,8012672327,100.00,8.013,CPUs utilized
             1.001213261,197,,context-switches,8012703742,100.00,24.586,/sec
             1.001213261,9,,cpu-migrations,8012720902,100.00,1.123,/sec
             1.001213261,644,,page-faults,8012738266,100.00,80.373,/sec
             1.001213261,18350698,,cycles,8012744109,100.00,0.002,GHz
             1.001213261,12745021,,instructions,8012759001,100.00,0.69,insn per cycle
             1.001213261,2458033,,branches,8012770864,100.00,306.768,K/sec
             1.001213261,102107,,branch-misses,8012781751,100.00,4.15,of all branches
        8012.67,msec,cpu-clock,8012672327,100.00,7.985,CPUs utilized
        197,,context-switches,8012703742,100.00,24.586,/sec
        9,,cpu-migrations,8012720902,100.00,1.123,/sec
        644,,page-faults,8012738266,100.00,80.373,/sec
        18350698,,cycles,8012744109,100.00,0.002,GHz
        12745021,,instructions,8012759001,100.00,0.69,insn per cycle
        2458033,,branches,8012770864,100.00,306.768,K/sec
        102107,,branch-misses,8012781751,100.00,4.15,of all branches
      
      This option can be enabled in perf config by setting the variable
      'stat.no-csv-summary'.
      
        # perf config stat.no-csv-summary=true
      
        # perf config -l
        stat.no-csv-summary=true
      
        # perf stat -x, -I1000 --interval-count 1 --summary
             1.001330198,8013.28,msec,cpu-clock,8013279201,100.00,8.013,CPUs utilized
             1.001330198,205,,context-switches,8013308394,100.00,25.583,/sec
             1.001330198,10,,cpu-migrations,8013324681,100.00,1.248,/sec
             1.001330198,0,,page-faults,8013340926,100.00,0.000,/sec
             1.001330198,8027742,,cycles,8013344503,100.00,0.001,GHz
             1.001330198,2871717,,instructions,8013356501,100.00,0.36,insn per cycle
             1.001330198,553564,,branches,8013366204,100.00,69.081,K/sec
             1.001330198,54021,,branch-misses,8013375952,100.00,9.76,of all branches
        8013.28,msec,cpu-clock,8013279201,100.00,7.985,CPUs utilized
        205,,context-switches,8013308394,100.00,25.583,/sec
        10,,cpu-migrations,8013324681,100.00,1.248,/sec
        0,,page-faults,8013340926,100.00,0.000,/sec
        8027742,,cycles,8013344503,100.00,0.001,GHz
        2871717,,instructions,8013356501,100.00,0.36,insn per cycle
        553564,,branches,8013366204,100.00,69.081,K/sec
        54021,,branch-misses,8013375952,100.00,9.76,of all branches
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210319070156.20394-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
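With the aligned format, a consumer script can treat every line as having the same number of columns and key on the first one. The sketch below is one possible way to do that in Python. The column meanings (timestamp or `summary`, value, unit, event name) are inferred from the sample output in the commit message, and `split_rows` is a hypothetical helper name.

```python
import csv
import io

# Sample lines copied from the "After" output of the commit message.
SAMPLE = """\
     1.001196053,8012.72,msec,cpu-clock,8012722903,100.00,8.013,CPUs utilized
     1.001196053,15004518,,cycles,8012790637,100.00,0.002,GHz
         summary,8012.72,msec,cpu-clock,8012722903,100.00,7.986,CPUs utilized
         summary,15004518,,cycles,8012790637,100.00,0.002,GHz
"""


def split_rows(text):
    """Return (interval_rows, summary_rows), keyed on the first column."""
    intervals, summary = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        tag = row[0].strip()  # timestamp, or the literal word "summary"
        record = {"value": float(row[1]), "unit": row[2], "event": row[3]}
        if tag == "summary":
            summary.append(record)
        else:
            record["time"] = float(tag)
            intervals.append(record)
    return intervals, summary


intervals, summary = split_rows(SAMPLE)
for rec in summary:
    print(rec["event"], rec["value"], rec["unit"])
```

Because the summary rows keep the same column count as the interval rows, no special-case column offsets are needed.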
  5. 09 Mar, 2021 1 commit
  6. 09 Feb, 2021 1 commit
    • perf stat: Support L2 Topdown events · 63e39aa6
      Kan Liang committed
      The TMA method level 2 metrics are supported starting with the Intel
      Sapphire Rapids server, which exposes four L2 Topdown metrics events to
      user space. There are eight L2 events in total. The other four L2 Topdown
      metrics events are calculated from the corresponding L1 and the exposed
      L2 events.
      
      Now, --topdown prints the complete top-down metrics that are supported
      by the CPU. For the Intel Sapphire Rapids server, there are 4 L1 events
      and 8 L2 events displayed in one line.
      
      Add a new option, --td-level, to display the top-down statistics that
      are equal to or lower than the input level.
      
      The L2 event is marked only when both its L1 parent event and itself
      cross the threshold.
      
      Here is an example:
      
        $ perf stat --topdown --td-level=2 --no-metric-only sleep 1
        Topdown accuracy may decrease when measuring long periods.
        Please print the result regularly, e.g. -I1000
      
        Performance counter stats for 'sleep 1':
      
           16,734,390   slots
            2,100,001   topdown-retiring       # 12.6% retiring
            2,034,376   topdown-bad-spec       # 12.3% bad speculation
            4,003,128   topdown-fe-bound       # 24.1% frontend bound
              328,125   topdown-heavy-ops      #  2.0% heavy operations    #  10.6% light operations
            1,968,751   topdown-br-mispredict  # 11.9% branch mispredict   #  0.4% machine clears
            2,953,127   topdown-fetch-lat      # 17.8% fetch latency       #  6.3% fetch bandwidth
            5,906,255   topdown-mem-bound      # 35.6% memory bound        #  15.4% core bound
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/1612296553-21962-9-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
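The statement that four L2 metrics are calculated from their L1 parent and the exposed L2 event can be checked against the sample output. The sketch below assumes each derived metric is simply the L1 parent minus its exposed L2 sibling. That pairing is read off the percentages printed in the commit message, not taken from the perf source, and `derive_l2` is a hypothetical helper. The backend-bound pair is left out because its L1 value does not appear in the excerpt.

```python
# Percentages copied from the sample output in the commit message.
L1 = {"retiring": 12.6, "bad speculation": 12.3, "frontend bound": 24.1}
L2_EXPOSED = {
    "heavy operations": 2.0,
    "branch mispredict": 11.9,
    "fetch latency": 17.8,
}

# (derived L2 metric, L1 parent, exposed L2 sibling); an assumed pairing.
DERIVED = [
    ("light operations", "retiring", "heavy operations"),
    ("machine clears", "bad speculation", "branch mispredict"),
    ("fetch bandwidth", "frontend bound", "fetch latency"),
]


def derive_l2(l1, l2_exposed):
    """Compute each derived L2 metric as parent minus exposed sibling."""
    return {
        name: round(l1[parent] - l2_exposed[sibling], 1)
        for name, parent, sibling in DERIVED
    }


derived = derive_l2(L1, L2_EXPOSED)
for name, pct in derived.items():
    print(f"{pct:5.1f}% {name}")
```

The results match the right-hand column of the sample output (10.6%, 0.4% and 6.3%).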
  7. 24 Dec, 2020 2 commits
  8. 01 Dec, 2020 1 commit
  9. 30 Nov, 2020 1 commit
  10. 04 Nov, 2020 1 commit
  11. 28 Sep, 2020 1 commit
    • perf stat: Add --for-each-cgroup option · d1c5a0e8
      Namhyung Kim committed
      The --for-each-cgroup option is syntactic sugar for monitoring a large
      number of cgroups easily.  The current command line requires listing all
      the events and cgroups even if users want to monitor the same events for
      each cgroup. This patch addresses that usage by copying the given events
      for each cgroup on the user's behalf.
      
      For instance, if they want to monitor 6 events for 200 cgroups each they
      have to write 1200 event names (with -e) AND 1200 cgroup names (with -G)
      on the command line.  But with this change, they can just specify 6
      events and 200 cgroups with a new option.
      
      A simpler example: it measures 3 events for 2 cgroups ('A' and 'B'),
      so a total of 6 events are counted, as shown below.
      
        $ perf stat -a -e cpu-clock,cycles,instructions --for-each-cgroup A,B sleep 1
      
         Performance counter stats for 'system wide':
      
                    988.18 msec cpu-clock                 A #    0.987 CPUs utilized
             3,153,761,702      cycles                    A #    3.200 GHz                      (100.00%)
             8,067,769,847      instructions              A #    2.57  insn per cycle           (100.00%)
                    982.71 msec cpu-clock                 B #    0.982 CPUs utilized
             3,136,093,298      cycles                    B #    3.182 GHz                      (99.99%)
             8,109,619,327      instructions              B #    2.58  insn per cycle           (99.99%)
      
               1.001228054 seconds time elapsed
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20200924124455.336326-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
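The expansion that --for-each-cgroup performs on the user's behalf can be pictured as a cross product of events and cgroups. The Python sketch below builds the equivalent long-form `-e`/`-G` lists that the commit message says users previously had to type. `expand_for_each_cgroup` is a hypothetical helper, and the cgroup-major ordering is an assumption based on the sample output (all of A's events, then all of B's).

```python
def expand_for_each_cgroup(events, cgroups):
    """Repeat every event once per cgroup, pairing each copy with its cgroup."""
    expanded_events, expanded_cgroups = [], []
    for cgroup in cgroups:
        for event in events:
            expanded_events.append(event)
            expanded_cgroups.append(cgroup)
    return expanded_events, expanded_cgroups


events, cgroups = expand_for_each_cgroup(
    ["cpu-clock", "cycles", "instructions"], ["A", "B"]
)

# Long-form command line equivalent to "--for-each-cgroup A,B".
long_form = [
    "perf", "stat", "-a",
    "-e", ",".join(events),
    "-G", ",".join(cgroups),
    "sleep", "1",
]
print(" ".join(long_form))
```

For the 6-event, 200-cgroup case from the commit message, the same function yields two lists of 1200 entries each.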
  12. 18 Sep, 2020 1 commit
  13. 05 Sep, 2020 1 commit
  14. 04 Sep, 2020 1 commit
    • perf stat: Turn off summary for interval mode by default · ee6a9614
      Jin Yao committed
      There's a risk that outputting interval mode summaries by default breaks
      CSV consumers. It already broke pmu-tools/toplev.
      
      So now we turn off the summary by default but we create a new option
      '--summary' to enable the summary. This is active even when not using
      CSV mode.
      
      Before:
      
        root@kbl-ppc:~# perf stat -I1000 --interval-count 2
        #           time             counts unit events
             1.000265904           8,005.73 msec cpu-clock                 #    8.006 CPUs utilized
             1.000265904                601      context-switches          #    0.075 K/sec
             1.000265904                 10      cpu-migrations            #    0.001 K/sec
             1.000265904                  0      page-faults               #    0.000 K/sec
             1.000265904         66,746,521      cycles                    #    0.008 GHz
             1.000265904         71,874,398      instructions              #    1.08  insn per cycle
             1.000265904         13,356,781      branches                  #    1.668 M/sec
             1.000265904            298,756      branch-misses             #    2.24% of all branches
             2.001857667           8,012.52 msec cpu-clock                 #    8.013 CPUs utilized
             2.001857667                164      context-switches          #    0.020 K/sec
             2.001857667                 10      cpu-migrations            #    0.001 K/sec
             2.001857667                  2      page-faults               #    0.000 K/sec
             2.001857667          5,822,188      cycles                    #    0.001 GHz
             2.001857667          2,186,170      instructions              #    0.38  insn per cycle
             2.001857667            442,378      branches                  #    0.055 M/sec
             2.001857667             44,750      branch-misses             #   10.12% of all branches
      
         Performance counter stats for 'system wide':
      
                 16,018.25 msec cpu-clock                 #    7.993 CPUs utilized
                       765      context-switches          #    0.048 K/sec
                        20      cpu-migrations            #    0.001 K/sec
                         2      page-faults               #    0.000 K/sec
                72,568,709      cycles                    #    0.005 GHz
                74,060,568      instructions              #    1.02  insn per cycle
                13,799,159      branches                  #    0.861 M/sec
                   343,506      branch-misses             #    2.49% of all branches
      
               2.004118489 seconds time elapsed
      
      After:
      
        root@kbl-ppc:~# perf stat -I1000 --interval-count 2
        #           time             counts unit events
             1.001336393           8,013.28 msec cpu-clock                 #    8.013 CPUs utilized
             1.001336393                 82      context-switches          #    0.010 K/sec
             1.001336393                  8      cpu-migrations            #    0.001 K/sec
             1.001336393                  0      page-faults               #    0.000 K/sec
             1.001336393          4,199,121      cycles                    #    0.001 GHz
             1.001336393          1,373,991      instructions              #    0.33  insn per cycle
             1.001336393            270,681      branches                  #    0.034 M/sec
             1.001336393             31,659      branch-misses             #   11.70% of all branches
             2.003905006           8,020.52 msec cpu-clock                 #    8.021 CPUs utilized
             2.003905006                184      context-switches          #    0.023 K/sec
             2.003905006                  8      cpu-migrations            #    0.001 K/sec
             2.003905006                  2      page-faults               #    0.000 K/sec
             2.003905006          5,446,190      cycles                    #    0.001 GHz
             2.003905006          2,312,547      instructions              #    0.42  insn per cycle
             2.003905006            451,691      branches                  #    0.056 M/sec
             2.003905006             37,925      branch-misses             #    8.40% of all branches
      
        root@kbl-ppc:~# perf stat -I1000 --interval-count 2 --summary
        #           time             counts unit events
             1.001313128           8,013.20 msec cpu-clock                 #    8.013 CPUs utilized
             1.001313128                 83      context-switches          #    0.010 K/sec
             1.001313128                  8      cpu-migrations            #    0.001 K/sec
             1.001313128                  0      page-faults               #    0.000 K/sec
             1.001313128          4,470,950      cycles                    #    0.001 GHz
             1.001313128          1,440,045      instructions              #    0.32  insn per cycle
             1.001313128            283,222      branches                  #    0.035 M/sec
             1.001313128             33,576      branch-misses             #   11.86% of all branches
             2.003857385           8,020.34 msec cpu-clock                 #    8.020 CPUs utilized
             2.003857385                154      context-switches          #    0.019 K/sec
             2.003857385                  8      cpu-migrations            #    0.001 K/sec
             2.003857385                  2      page-faults               #    0.000 K/sec
             2.003857385          4,515,676      cycles                    #    0.001 GHz
             2.003857385          2,180,449      instructions              #    0.48  insn per cycle
             2.003857385            435,254      branches                  #    0.054 M/sec
             2.003857385             31,179      branch-misses             #    7.16% of all branches
      
         Performance counter stats for 'system wide':
      
                 16,033.53 msec cpu-clock                 #    7.992 CPUs utilized
                       237      context-switches          #    0.015 K/sec
                        16      cpu-migrations            #    0.001 K/sec
                         2      page-faults               #    0.000 K/sec
                 8,986,626      cycles                    #    0.001 GHz
                 3,620,494      instructions              #    0.40  insn per cycle
                   718,476      branches                  #    0.045 M/sec
                    64,755      branch-misses             #    9.01% of all branches
      
               2.006124542 seconds time elapsed
      
      Fixes: c7e5b328 ("perf stat: Report summary for interval mode")
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200903010113.32232-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  15. 04 Aug, 2020 1 commit
  16. 22 Jul, 2020 1 commit
  17. 23 Jun, 2020 1 commit
  18. 28 May, 2020 5 commits
    • perf metricgroup: Add options to not group or merge · 05530a79
      Ian Rogers committed
      Add --metric-no-group that causes all events within metrics to not be
      grouped. This can allow the event to get more time when multiplexed, but
      may also lower accuracy.
      Add --metric-no-merge option. By default events in different metrics may
      be shared if the group of events for one metric is the same or larger
      than that of the second. Sharing may increase or lower accuracy and so
      is now configurable.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kim Phillips <kim.phillips@amd.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: bpf@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20200520182011.32236-7-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
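The sharing rule described for --metric-no-merge can be read as a subset test: a metric may reuse an existing event group when that group contains every event the metric needs. The sketch below illustrates only that reading. `can_share`, `plan_groups` and the metric names `M1`/`M2` are hypothetical, and the actual matching logic in perf may differ.

```python
def can_share(existing_group, wanted_events):
    """True if the existing group is the same as or larger than what is wanted."""
    return set(wanted_events) <= set(existing_group)


def plan_groups(metrics, merge=True):
    """Return the event groups to open for a list of (metric, events) pairs.

    With merge=False (the --metric-no-merge idea), every metric gets its own
    group even when an earlier group already covers its events.
    """
    groups = []
    for _name, events in metrics:
        if merge and any(can_share(group, events) for group in groups):
            continue  # reuse an earlier, covering group
        groups.append(list(events))
    return groups


# Hypothetical metrics: M2's events are a subset of M1's.
metrics = [
    ("M1", ["cycles", "instructions", "branches"]),
    ("M2", ["cycles", "instructions"]),
]
print(plan_groups(metrics))               # one shared group
print(plan_groups(metrics, merge=False))  # two separate groups
```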
    • perf config: Add stat.big-num support · d778a778
      Paul A. Clarke committed
      Add support for new "stat.big-num" boolean option.
      
      This allows a user to set a default for "--no-big-num" for "perf stat"
      commands.
      
      --
        $ perf config stat.big-num
        $ perf stat --event cycles /bin/true
      
         Performance counter stats for '/bin/true':
      
                   778,849      cycles
        [...]
        $ perf config stat.big-num=false
        $ perf config stat.big-num
        stat.big-num=false
        $ perf stat --event cycles /bin/true
      
         Performance counter stats for '/bin/true':
      
                    769622      cycles
        [...]
      --
      
      There is an interaction with "--field-separator" that must be
      accommodated, such that specifying "--big-num --field-separator={x}"
      still reports an invalid combination of options.
      
      Documentation for perf-config and perf-stat updated.
      Signed-off-by: Paul Clarke <pc@us.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: http://lore.kernel.org/lkml/1589991815-17951-1-git-send-email-pc@us.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
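The interaction between the config default, the command-line flag and the field separator can be sketched as a small precedence function. This is an illustration under assumptions, not the perf code: `resolve_big_num` and `format_count` are hypothetical names. The rule that a field separator turns grouping off when the flag is absent is assumed here, while the rejected combination is the one stated in the commit message.

```python
def resolve_big_num(cli_big_num, config_big_num, field_separator):
    """Decide whether counts are printed with thousands grouping.

    cli_big_num:     True/False if --big-num/--no-big-num was given, else None
    config_big_num:  True/False if stat.big-num is set in perf config, else None
    field_separator: the --field-separator value, or None when not in CSV mode
    """
    if cli_big_num is True and field_separator is not None:
        # The commit message says this combination is still rejected.
        raise ValueError("--big-num cannot be combined with --field-separator")
    if field_separator is not None:
        return False  # assumption: CSV output never groups digits
    if cli_big_num is not None:
        return cli_big_num  # explicit flag beats the config default
    if config_big_num is not None:
        return config_big_num
    return True  # grouping is on when nothing is configured


def format_count(value, big_num):
    """Format a counter value with or without thousands separators."""
    return f"{value:,}" if big_num else str(value)


print(format_count(778849, resolve_big_num(None, None, None)))
print(format_count(769622, resolve_big_num(None, False, None)))
```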
    • perf stat: Report summary for interval mode · c7e5b328
      Jin Yao committed
      Currently 'perf stat' supports printing counts at a regular interval (-I),
      but it's not very easy for the user to get the overall statistics.
      
      The patch uses 'evsel->prev_raw_counts' to get counts for summary.  Copy
      the counts to 'evsel->counts' after printing the interval results.
      Next, we just follow the non-interval processing.
      
      Let's see some examples,
      
       root@kbl-ppc:~# perf stat -e cycles -I1000 --interval-count 2
       #           time             counts unit events
            1.000412064          2,281,114      cycles
            2.001383658          2,547,880      cycles
      
        Performance counter stats for 'system wide':
      
                4,828,994      cycles
      
              2.002860349 seconds time elapsed
      
       root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
       #           time             counts unit events
            1.000389902          1,536,093      cycles
            1.000389902            420,226      instructions              #    0.27  insn per cycle
            2.001433453          2,213,952      cycles
            2.001433453            735,465      instructions              #    0.33  insn per cycle
      
        Performance counter stats for 'system wide':
      
                3,750,045      cycles
                1,155,691      instructions              #    0.31  insn per cycle
      
              2.003023361 seconds time elapsed
      
       root@kbl-ppc:~# perf stat -M CPI,IPC -I1000 --interval-count 2
       #           time             counts unit events
            1.000435121            905,303      inst_retired.any          #      2.9 CPI
            1.000435121          2,663,333      cycles
            1.000435121            914,702      inst_retired.any          #      0.3 IPC
            1.000435121          2,676,559      cpu_clk_unhalted.thread
            2.001615941          1,951,092      inst_retired.any          #      1.8 CPI
            2.001615941          3,551,357      cycles
            2.001615941          1,950,837      inst_retired.any          #      0.5 IPC
            2.001615941          3,551,044      cpu_clk_unhalted.thread
      
        Performance counter stats for 'system wide':
      
                2,856,395      inst_retired.any          #      2.2 CPI
                6,214,690      cycles
                2,865,539      inst_retired.any          #      0.5 IPC
                6,227,603      cpu_clk_unhalted.thread
      
              2.003403078 seconds time elapsed
      
      Committer testing:
      
      Before:
      
        # perf stat -e cycles -I1000 --interval-count 2
        #           time             counts unit events
             1.000618627         26,877,408      cycles
             2.001417968        233,672,829      cycles
        #
      
      After:
      
        # perf stat -e cycles -I1000 --interval-count 2
        #           time             counts unit events
             1.001531815      5,341,388,792      cycles
             2.002936530        100,073,912      cycles
      
         Performance counter stats for 'system wide':
      
             5,441,462,704      cycles
      
               2.004893794 seconds time elapsed
      
        #
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200520042737.24160-6-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
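The relationship between interval lines and the final summary can be modelled with a running "previous raw count". The sketch below is a simplified model, not the perf data structures: `report` is a hypothetical function, and the cumulative readings are reconstructed by summing the interval counts of the first example in the commit message.

```python
def report(cumulative_readings):
    """Print-style model of interval mode.

    Each interval shows the delta since the previous raw reading. The last
    raw reading kept in prev_raw is then reused as the overall summary.
    """
    prev_raw = 0
    interval_counts = []
    for raw in cumulative_readings:
        interval_counts.append(raw - prev_raw)
        prev_raw = raw  # updated on every interval
    return interval_counts, prev_raw


# Cumulative cycle counts reconstructed from the first example:
# 2,281,114 after interval 1 and 2,281,114 + 2,547,880 after interval 2.
interval_counts, total_cycles = report([2_281_114, 4_828_994])
print(interval_counts, total_cycles)

# Summary metrics follow the non-interval path on the totals, for instance
# the "insn per cycle" of the second example.
ipc = round(1_155_691 / 3_750_045, 2)
print(ipc)
```

The totals agree with the summary blocks in the examples: 4,828,994 cycles for the first run and 0.31 insn per cycle for the second.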
    • perf stat: Save aggr value to first member of prev_raw_counts · 905365f4
      Jin Yao committed
      To collect the overall statistics for interval mode, we copy the counts
      from evsel->prev_raw_counts to evsel->counts.
      
      For AGGR_GLOBAL mode, perf_stat_process_counter creates the aggr
      values from the per-CPU values. Because the per-CPU values are 0, the
      calculated aggr values would always be 0.
      
      This patch uses a trick that saves the previous aggr value to the first
      member of perf_counts, then aggr calculation in process_counter_values
      can work correctly for AGGR_GLOBAL.
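
A minimal sketch of the trick described above, assuming a plain list stands in for the per-CPU slots of perf_counts. The names `aggregate`, `per_cpu` and `saved_aggr`, and the value 1000, are hypothetical and only illustrate why storing the previous aggregate in the first slot lets an unchanged summation produce the right total.

```python
def aggregate(per_cpu_counts):
    """Stand-in for the AGGR_GLOBAL step: sum the per-CPU values."""
    return sum(per_cpu_counts)


# Hypothetical state when the summary is prepared: per-CPU slots are all 0,
# while the previous aggregate value is known separately.
per_cpu = [0, 0, 0, 0]
saved_aggr = 1000  # illustrative number, not taken from a real run

# Summing the zeroed slots loses the total.
naive_total = aggregate(per_cpu)

# The trick: store the previous aggregate in the first slot, so the same
# summation yields the saved total again.
per_cpu[0] = saved_aggr
fixed_total = aggregate(per_cpu)

print(naive_total, fixed_total)
```

The summation itself is untouched; only its input is prepared differently, which is presumably why the commit message calls it a trick.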
      
       v6:
       ---
       Add comments in perf_evlist__save_aggr_prev_raw_counts.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200520042737.24160-5-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf stat: Copy counts from prev_raw_counts to evsel->counts · 297767ac
      Jin Yao committed
      It would be useful to support the overall statistics for perf-stat
      interval mode. For example, report the summary at the end of "perf-stat
      -I" output.
      
      But since perf-stat supports many aggregation modes, such as
      --per-thread, --per-socket, -M, etc., we need a solution that doesn't
      add much complexity.
      
      The idea is to use 'evsel->prev_raw_counts', which is updated in each
      interval and therefore holds the latest counts. Before reporting the
      summary, we copy the counts from evsel->prev_raw_counts to
      evsel->counts and then follow the non-interval processing path.
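
The idea can be sketched as follows, assuming each raw reading is cumulative and the per-interval value is the difference from the previous reading. The `Evsel` class and the helpers `run_intervals` and `prepare_summary` are hypothetical stand-ins, not the perf data structures.

```python
class Evsel:
    """Hypothetical stand-in for an event selector with two count slots."""

    def __init__(self, name):
        self.name = name
        self.counts = 0           # value the printing path reads
        self.prev_raw_counts = 0  # latest raw (cumulative) reading


def run_intervals(evsel, raw_readings):
    """Compute per-interval deltas and remember the latest raw reading."""
    deltas = []
    for raw in raw_readings:
        evsel.counts = raw - evsel.prev_raw_counts
        deltas.append(evsel.counts)
        evsel.prev_raw_counts = raw
    return deltas


def prepare_summary(evsel):
    """Copy the latest raw reading into counts before the final report."""
    evsel.counts = evsel.prev_raw_counts
    return evsel.counts


cycles = Evsel("cycles")
deltas = run_intervals(cycles, [100, 250, 400])  # assumed cumulative readings
total = prepare_summary(cycles)
print(deltas, total)
```

After the copy, the regular non-interval printing code can be reused as-is, because it only looks at `counts`.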
      
       v5:
       ---
       Don't save the previous aggr value to the member of [cpu0,thread0]
       in perf_counts. Originally that was a trick because the
       perf_stat_process_counter would create aggr values from per cpu
       values. But we don't need to do that all the time. We will
       handle it in next patch.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200520042737.24160-4-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  19. 04 Mar, 2020 1 commit
    • J
      perf stat: Show percore counts in per CPU output · 1af62ce6
      Jin Yao committed
      We already support the event modifier "percore", which sums up the event
      counts for all hardware threads in a core and shows the counts per core.
      
      For example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A -- sleep 1
      
        Performance counter stats for 'system wide':
      
       S0-D0-C0                395,072      cpu/event=cpu-cycles,percore/
       S0-D0-C1                851,248      cpu/event=cpu-cycles,percore/
       S0-D0-C2                954,226      cpu/event=cpu-cycles,percore/
       S0-D0-C3              1,233,659      cpu/event=cpu-cycles,percore/
      
      This patch provides a new option "--percore-show-thread". Used together
      with the event modifier "percore", it sums up the event counts for
      all hardware threads in a core but shows the counts per hardware thread.
      
      This is essentially a replacement for the any bit (which is gone in
      Icelake). Per core counts are useful for some formulas, e.g. CoreIPC.
      The original percore version was inconvenient to post process. This
      variant matches the output of the any bit.
      
      With this patch, for example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread  -- sleep 1
      
        Performance counter stats for 'system wide':
      
       CPU0               2,453,061      cpu/event=cpu-cycles,percore/
       CPU1               1,823,921      cpu/event=cpu-cycles,percore/
       CPU2               1,383,166      cpu/event=cpu-cycles,percore/
       CPU3               1,102,652      cpu/event=cpu-cycles,percore/
       CPU4               2,453,061      cpu/event=cpu-cycles,percore/
       CPU5               1,823,921      cpu/event=cpu-cycles,percore/
       CPU6               1,383,166      cpu/event=cpu-cycles,percore/
       CPU7               1,102,652      cpu/event=cpu-cycles,percore/
      
      We can see counts are duplicated in CPU pairs (CPU0/CPU4, CPU1/CPU5,
      CPU2/CPU6, CPU3/CPU7).
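
The duplication across sibling CPUs can be reproduced with a small sketch. It assumes a hypothetical topology in which CPU n and CPU n+4 share a core, as the pairs above suggest; `percore_show_thread`, `cpu_to_core` and the sample counts are illustrative names and numbers, not perf internals.

```python
from collections import defaultdict


def percore_show_thread(per_cpu_counts, cpu_to_core):
    """Sum counts per core, then report that core total for every online CPU."""
    core_totals = defaultdict(int)
    for cpu, count in per_cpu_counts.items():
        core_totals[cpu_to_core[cpu]] += count
    return {cpu: core_totals[cpu_to_core[cpu]] for cpu in sorted(per_cpu_counts)}


# Assumed topology: 4 cores, 2 hardware threads each (CPU n and CPU n+4).
cpu_to_core = {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, 5: 1, 6: 2, 7: 3}
per_cpu = {0: 10, 1: 20, 2: 30, 3: 40, 4: 1, 5: 2, 6: 3, 7: 4}

shown = percore_show_thread(per_cpu, cpu_to_core)

# With CPU5 offline it contributes nothing and gets no output row.
online = {cpu: n for cpu, n in per_cpu.items() if cpu != 5}
shown_offline = percore_show_thread(online, cpu_to_core)
print(shown, shown_offline)
```

Keying the input by CPU number rather than by position keeps the lookup valid when a CPU is missing from the middle of the range.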
      
      The interval mode also works. For example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread  -I 1000
       #           time CPU                    counts unit events
            1.000425421 CPU0                 925,032      cpu/event=cpu-cycles,percore/
            1.000425421 CPU1                 430,202      cpu/event=cpu-cycles,percore/
            1.000425421 CPU2                 436,843      cpu/event=cpu-cycles,percore/
            1.000425421 CPU3               1,192,504      cpu/event=cpu-cycles,percore/
            1.000425421 CPU4                 925,032      cpu/event=cpu-cycles,percore/
            1.000425421 CPU5                 430,202      cpu/event=cpu-cycles,percore/
            1.000425421 CPU6                 436,843      cpu/event=cpu-cycles,percore/
            1.000425421 CPU7               1,192,504      cpu/event=cpu-cycles,percore/
      
      If we offline CPU5, the result is:
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1
      
        Performance counter stats for 'system wide':
      
       CPU0               2,752,148      cpu/event=cpu-cycles,percore/
       CPU1               1,009,312      cpu/event=cpu-cycles,percore/
       CPU2               2,784,072      cpu/event=cpu-cycles,percore/
       CPU3               2,427,922      cpu/event=cpu-cycles,percore/
       CPU4               2,752,148      cpu/event=cpu-cycles,percore/
       CPU6               2,784,072      cpu/event=cpu-cycles,percore/
       CPU7               2,427,922      cpu/event=cpu-cycles,percore/
      
              1.001416041 seconds time elapsed
      
       v4:
       ---
       Ravi Bangoria reports an issue in v3. Once we offline a CPU,
       the output is not correct. The issue is we should use the cpu
       idx in print_percore_thread rather than using the cpu value.
      
       v3:
       ---
       1. Fix the interval mode output error
       2. Use cpu value (not cpu index) in config->aggr_get_id().
       3. Refine the code according to Jiri's comments.
      
       v2:
       ---
       Add the explanation in change log. This is essentially a replacement
       for the any bit. No code change.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200214080452.26402-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  20. 29 Nov, 2019 1 commit
  21. 07 Nov, 2019 1 commit
    • J
      perf stat: Add --per-node aggregation support · 86895b48
      Jiri Olsa committed
      Add a new --per-node option to aggregate counts per NUMA
      node for system-wide mode measurements.
      
      You can specify --per-node in live mode:
      
        # perf stat  -a -I 1000 -e cycles --per-node
        #           time node   cpus             counts unit events
             1.000542550 N0       20          6,202,097      cycles
             1.000542550 N1       20            639,559      cycles
             2.002040063 N0       20          7,412,495      cycles
             2.002040063 N1       20          2,185,577      cycles
             3.003451699 N0       20          6,508,917      cycles
             3.003451699 N1       20            765,607      cycles
        ...
      
      Or in the record/report stat session:
      
        # perf stat record -a -I 1000 -e cycles
        #           time             counts unit events
             1.000536937         10,008,468      cycles
             2.002090152          9,578,539      cycles
             3.003625233          7,647,869      cycles
             4.005135036          7,032,086      cycles
        ^C     4.340902364          3,923,893      cycles
      
        # perf stat report --per-node
        #           time node   cpus             counts unit events
             1.000536937 N0       20          9,355,086      cycles
             1.000536937 N1       20            653,382      cycles
             2.002090152 N0       20          7,712,838      cycles
             2.002090152 N1       20          1,865,701      cycles
             3.003625233 N0       20          6,604,441      cycles
             3.003625233 N1       20          1,043,428      cycles
             4.005135036 N0       20          6,350,522      cycles
             4.005135036 N1       20            681,564      cycles
             4.340902364 N0       20          3,403,188      cycles
             4.340902364 N1       20            520,705      cycles
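
Per-node aggregation as shown in the output above can be sketched as a group-by over a CPU-to-node map. The function `aggregate_per_node`, the four-CPU topology and the counts are assumptions made for illustration; they are not the perf implementation.

```python
def aggregate_per_node(per_cpu_counts, cpu_to_node):
    """Return {node: (number_of_cpus, summed_count)} for the given CPUs."""
    totals = {}
    for cpu, count in per_cpu_counts.items():
        node = cpu_to_node[cpu]
        n_cpus, total = totals.get(node, (0, 0))
        totals[node] = (n_cpus + 1, total + count)
    return totals


# Assumed topology: CPUs 0-1 on node 0, CPUs 2-3 on node 1.
cpu_to_node = {0: 0, 1: 0, 2: 1, 3: 1}
per_cpu = {0: 500, 1: 700, 2: 30, 3: 40}

per_node = aggregate_per_node(per_cpu, cpu_to_node)
for node, (n_cpus, total) in sorted(per_node.items()):
    # Mirrors the "node cpus counts" columns of the output above.
    print(f"N{node} {n_cpus:>4} {total:>12,}")
```

The per-node totals add up to the single system-wide count, which is consistent with the record/report example where each recorded interval splits into an N0 and an N1 row.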
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20190904073415.723-4-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  22. 15 Oct, 2019 1 commit
  23. 20 Sep, 2019 2 commits
    • S
      perf stat: Reset previous counts on repeat with interval · b63fd11c
      Srikar Dronamraju committed
      When using 'perf stat' with the repeat and interval options, it shows
      wrong values for events.
      
      The wrong values will be shown for the first interval on the second and
      subsequent repetitions.
      
      Without the fix:
      
        # perf stat -r 3 -I 2000 -e faults -e sched:sched_switch -a sleep 5
      
           2.000282489                 53      faults
           2.000282489                513      sched:sched_switch
           4.005478208              3,721      faults
           4.005478208              2,666      sched:sched_switch
           5.025470933                395      faults
           5.025470933              1,307      sched:sched_switch
           2.009602825 1,84,46,74,40,73,70,95,47,520      faults 		<------
           2.009602825 1,84,46,74,40,73,70,95,49,568      sched:sched_switch  <------
           4.019612206              4,730      faults
           4.019612206              2,746      sched:sched_switch
           5.039615484              3,953      faults
           5.039615484              1,496      sched:sched_switch
           2.000274620 1,84,46,74,40,73,70,95,47,520      faults		<------
           2.000274620 1,84,46,74,40,73,70,95,47,520      sched:sched_switch	<------
           4.000480342              4,282      faults
           4.000480342              2,303      sched:sched_switch
           5.000916811              1,322      faults
           5.000916811              1,064      sched:sched_switch
        #
      
      prev_raw_counts is allocated when using intervals. This is used when
      calculating the difference in the counts of events when using interval.
      
      The current counts are stored in prev_raw_counts to calculate the
      differences in the next iteration.
      
      On the first interval of the second and subsequent repetitions,
      prev_raw_counts would be the values stored in the last interval of the
      previous repetitions, while the current counts will only be for the
      first interval of the current repetition.
      
      Hence events can show up as very large numbers.
      
      Fix this by resetting prev_raw_counts whenever perf stat repeats the
      command.
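
The huge values in the broken output are what an unsigned 64-bit subtraction produces when the stale previous count is larger than the current one. A minimal sketch, assuming the delta is computed as `cur - prev` in 64-bit unsigned arithmetic; the helper `interval_delta` and the two sample counts are hypothetical.

```python
U64_MASK = (1 << 64) - 1


def interval_delta(cur, prev):
    """Emulate an unsigned 64-bit subtraction of two counter readings."""
    return (cur - prev) & U64_MASK


# Hypothetical readings: the stale value left over from the previous
# repetition is 4096 larger than the first reading of the new repetition.
stale_prev = 10_000
first_reading = 5_904

# Wraps around to 2**64 - 4096 = 18446744073709547520, the same number
# that appears (with Indian-style digit grouping) in the broken output.
bogus = interval_delta(first_reading, stale_prev)

# After resetting the previous value, the delta is just the reading itself.
fixed = interval_delta(first_reading, 0)
print(bogus, fixed)
```

The second bogus value in the broken output, ending in 49,568, is 2**64 - 2048, which fits the same wraparound pattern.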
      
      With the fix:
      
        # perf stat -r 3 -I 2000 -e faults -e sched:sched_switch -a sleep 5
      
           2.019349347              2,597      faults
           2.019349347              2,753      sched:sched_switch
           4.019577372              3,098      faults
           4.019577372              2,532      sched:sched_switch
           5.019415481              1,879      faults
           5.019415481              1,356      sched:sched_switch
           2.000178813              8,468      faults
           2.000178813              2,254      sched:sched_switch
           4.000404621              7,440      faults
           4.000404621              1,266      sched:sched_switch
           5.040196079              2,458      faults
           5.040196079                556      sched:sched_switch
           2.000191939              6,870      faults
           2.000191939              1,170      sched:sched_switch
           4.000414103                541      faults
           4.000414103                902      sched:sched_switch
           5.000809863                450      faults
           5.000809863                364      sched:sched_switch
        #
      
      Committer notes:
      
      This was broken since the cset introducing the --interval feature, i.e.
      --repeat + --interval wasn't tested at that point, add the Fixes tag so
      that automatic scripts can pick this up.
      
      Fixes: 13370a9b ("perf stat: Add interval printing")
      Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: stable@vger.kernel.org # v3.9+
      Link: http://lore.kernel.org/lkml/20190904094738.9558-2-srikar@linux.vnet.ibm.com
      [ Fixed up conflicts with libperf, i.e. some perf_{evsel,evlist} lost the 'perf' prefix ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf stat: Move perf_stat_synthesize_config() to event.h · b251892d
      Arnaldo Carvalho de Melo committed
      Move it next to the other synthesizers and rename it to
      perf_event__synthesize_stat_events().
      
      This allows us to stop including event.h in util/stat.h.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-q5ebhrp44txboobs86htu5r9@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  24. 26 Aug, 2019 2 commits
  25. 30 Jul, 2019 4 commits
  26. 11 Jun, 2019 1 commit
    • K
      perf stat: Support per-die aggregation · db5742b6
      Kan Liang committed
      It is useful to aggregate counts per die. E.g. Uncore becomes die-scope
      on Xeon Cascade Lake-AP.
      
      Introduce a new option "--per-die" to support per-die aggregation.
      
      The global id for each core has been changed to socket + die id + core
      id. The global id for each die is socket + die id.
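
One way to picture the composed ids is to pack the fields into a single integer. The bit widths and the helper names `die_gid`, `core_gid` and `core_label` below are assumptions chosen for illustration, not the layout perf uses.

```python
def die_gid(socket, die):
    """Global die id: socket plus die id (assumed 8 bits for the die)."""
    return (socket << 8) | (die & 0xFF)


def core_gid(socket, die, core):
    """Global core id: socket plus die id plus core id (assumed 16-bit core)."""
    return (socket << 24) | ((die & 0xFF) << 16) | (core & 0xFFFF)


def core_label(socket, die, core):
    """Label in the "S0-D0-C0" style used by per-core output."""
    return f"S{socket}-D{die}-C{core}"


# The same core id on two dies of one socket gets distinct global ids.
a = core_gid(0, 0, 3)
b = core_gid(0, 1, 3)
print(a != b, core_label(0, 1, 3), die_gid(0, 1))
```

Including the die in the key matters because core ids may repeat across dies within one socket, so socket plus core alone would merge unrelated cores.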
      
      Add die information for per-core aggregation. The output of per-core
      aggregation changes from "S0-C0" to "S0-D0-C0". Any scripts
      that rely on the output format of per-core aggregation will probably
      break.
      
      For 'perf stat record/report', there is no die information when
      processing the old perf.data. The per-die result will be the same as
      per-socket.
      
      Committer notes:
      
      Renamed 'die' variable to 'die_id' to fix the build in some systems:
      
          CC       /tmp/build/perf/builtin-script.o
        cc1: warnings being treated as errors
        builtin-stat.c: In function 'perf_env__get_die':
        builtin-stat.c:963: error: declaration of 'die' shadows a global declaration
        util/util.h:19: error: shadowed declaration is here
        mv: cannot stat `/tmp/build/perf/.builtin-stat.o.tmp': No such file or directory
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/n/tip-bsnhx7vgsuu6ei307mw60mbj@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  27. 19 Sep, 2018 1 commit