1. 10 Aug 2022, 1 commit
    • perf stat: Add JSON output option · df936cad
      Claire Jensen committed
      CSV output is tricky to format, and changes to the column layout can
      easily break parsers. The new JSON-formatted output identifies every
      field by a consistent, informative key name, which makes the output
      easy to parse.
      
      CSV output example:
      
        1.20,msec,task-clock:u,1204272,100.00,0.697,CPUs utilized
        0,,context-switches:u,1204272,100.00,0.000,/sec
        0,,cpu-migrations:u,1204272,100.00,0.000,/sec
        70,,page-faults:u,1204272,100.00,58.126,K/sec
      
      JSON output example:
      
        {"counter-value" : "3805.723968", "unit" : "msec", "event" :
        "cpu-clock", "event-runtime" : 3805731510100.00, "pcnt-running"
        : 100.00, "metric-value" : 4.007571, "metric-unit" : "CPUs utilized"}
        {"counter-value" : "6166.000000", "unit" : "", "event" :
        "context-switches", "event-runtime" : 3805723045100.00, "pcnt-running"
        : 100.00, "metric-value" : 1.620191, "metric-unit" : "K/sec"}
        {"counter-value" : "466.000000", "unit" : "", "event" :
        "cpu-migrations", "event-runtime" : 3805727613100.00, "pcnt-running"
        : 100.00, "metric-value" : 122.447136, "metric-unit" : "/sec"}
        {"counter-value" : "208.000000", "unit" : "", "event" :
        "page-faults", "event-runtime" : 3805726799100.00, "pcnt-running"
        : 100.00, "metric-value" : 54.654516, "metric-unit" : "/sec"}
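      
      The CSV columns line up one-to-one with the JSON keys, which the
      following minimal sketch makes explicit. It assumes exactly seven
      columns in the order of the examples above, with no commas or quotes
      inside any column. csv_line_to_json is a hypothetical helper, not
      perf's implementation. Unlike perf, it copies each value verbatim
      instead of reformatting counter-value to six decimal places.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* JSON keys, in the order the CSV columns appear in the examples above. */
static const char *const field_names[] = {
    "counter-value", "unit", "event", "event-runtime",
    "pcnt-running", "metric-value", "metric-unit",
};
#define NR_FIELDS (sizeof(field_names) / sizeof(field_names[0]))

/* Columns emitted as JSON strings; the others are emitted as bare numbers. */
static int field_is_string(size_t i)
{
    return i == 0 || i == 1 || i == 2 || i == 6;
}

/*
 * Hypothetical helper: convert one CSV line into one JSON object.
 * Assumes exactly NR_FIELDS columns and no commas or quotes inside a column.
 * Returns 0 on success, -1 on a malformed line or a too-small buffer.
 */
static int csv_line_to_json(const char *csv, char *out, size_t out_len)
{
    size_t used = 0;
    const char *p = csv;

    for (size_t i = 0; i < NR_FIELDS; i++) {
        const char *end = strchr(p, ',');
        int n;

        if (i + 1 < NR_FIELDS) {
            if (!end)
                return -1;              /* too few columns */
        } else {
            if (end)
                return -1;              /* too many columns */
            end = p + strlen(p);        /* last column runs to the end */
        }
        n = snprintf(out + used, out_len - used,
                     field_is_string(i) ? "%s\"%s\" : \"%.*s\""
                                        : "%s\"%s\" : %.*s",
                     i ? ", " : "{", field_names[i], (int)(end - p), p);
        if (n < 0 || (size_t)n >= out_len - used)
            return -1;                  /* output would be truncated */
        used += (size_t)n;
        p = end + 1;
    }
    if (used + 2 > out_len)
        return -1;                      /* no room for "}" and the NUL */
    out[used++] = '}';
    out[used] = '\0';
    return 0;
}
```

      An empty CSV column, such as the unit of context-switches, becomes an
      empty JSON string ("unit" : ""). It therefore still has a named key
      rather than a position that a parser must count.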
      
      Also added documentation for JSON option.
      
      There is some tidy-up of the CSV code, including a fix for a potential
      memory overrun in the os.nfields setup. To facilitate this, an AGGR_MAX
      value is added.
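      
      A sentinel such as AGGR_MAX is a common way to size per-mode tables so
      that indexing by mode cannot run past the end. The sketch below assumes
      an illustrative subset of aggregation modes and made-up field counts.
      Only the AGGR_MAX name comes from the commit. The table and
      nfields_for are hypothetical.

```c
#include <assert.h>

/* Illustrative subset of aggregation modes, not perf's full list. */
enum aggr_mode {
    AGGR_NONE,
    AGGR_GLOBAL,
    AGGR_SOCKET,
    AGGR_CORE,
    AGGR_THREAD,
    AGGR_MAX,       /* sentinel: number of modes, never a valid mode */
};

/*
 * Hypothetical number of leading CSV fields per mode. Sizing the array with
 * AGGR_MAX keeps it in step with the enum: a mode added before the sentinel
 * grows the array, and any entry left out is zero-initialized.
 */
static const int aggr_nfields[AGGR_MAX] = {
    [AGGR_NONE]   = 1,
    [AGGR_GLOBAL] = 0,
    [AGGR_SOCKET] = 2,
    [AGGR_CORE]   = 2,
    [AGGR_THREAD] = 1,
};

/* Look up the field count, rejecting out-of-range modes instead of overrunning. */
static int nfields_for(enum aggr_mode mode)
{
    if ((unsigned int)mode >= AGGR_MAX)
        return -1;
    return aggr_nfields[mode];
}
```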
      
      Committer notes:
      
      Fixed up to use PRIu64, not %lu, to format u64 values.
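      
      The fix matters because uint64_t is not unsigned long on every
      target. The following sketch uses a hypothetical helper, format_u64,
      to show the portable form.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: print a u64 counter portably. On LP64 Linux "%lu"
 * happens to work because uint64_t is unsigned long, but on 32-bit targets
 * uint64_t is typically unsigned long long and "%lu" is undefined behaviour.
 * PRIu64 (from <inttypes.h>) expands to the correct conversion everywhere.
 */
static int format_u64(char *buf, size_t len, uint64_t value)
{
    return snprintf(buf, len, "%" PRIu64, value);
}
```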
      
      Committer testing:
      
        ⬢[acme@toolbox perf]$ perf stat -j sleep 1
        {"counter-value" : "0.731750", "unit" : "msec", "event" : "task-clock:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000731, "metric-unit" : "CPUs utilized"}
        {"counter-value" : "0.000000", "unit" : "", "event" : "context-switches:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000000, "metric-unit" : "/sec"}
        {"counter-value" : "0.000000", "unit" : "", "event" : "cpu-migrations:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000000, "metric-unit" : "/sec"}
        {"counter-value" : "75.000000", "unit" : "", "event" : "page-faults:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 102.494021, "metric-unit" : "K/sec"}
        {"counter-value" : "578765.000000", "unit" : "", "event" : "cycles:u", "event-runtime" : 379366, "pcnt-running" : 49.00, "metric-value" : 0.790933, "metric-unit" : "GHz"}
        {"counter-value" : "1298.000000", "unit" : "", "event" : "stalled-cycles-frontend:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 0.224271, "metric-unit" : "frontend cycles idle"}
        {"counter-value" : "21984.000000", "unit" : "", "event" : "stalled-cycles-backend:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 3.798433, "metric-unit" : "backend cycles idle"}
        {"counter-value" : "468197.000000", "unit" : "", "event" : "instructions:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 0.808959, "metric-unit" : "insn per cycle"}
        {"metric-value" : 0.046955, "metric-unit" : "stalled cycles per insn"}
        {"counter-value" : "103335.000000", "unit" : "", "event" : "branches:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 141.216262, "metric-unit" : "M/sec"}
        {"counter-value" : "2381.000000", "unit" : "", "event" : "branch-misses:u", "event-runtime" : 388654, "pcnt-running" : 50.00, "metric-value" : 2.304156, "metric-unit" : "of all branches"}
        ⬢[acme@toolbox perf]$
      Signed-off-by: Claire Jensen <cjense@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alyssa Ross <hi@alyssa.is>
      Cc: Claire Jensen <clairej735@gmail.com>
      Cc: Florian Fischer <florian.fischer@muhq.space>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Link: https://lore.kernel.org/r/20220805200105.2020995-2-irogers@google.com
      Signed-off-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 30 Jul 2022, 1 commit
    • perf stat: Add topdown metrics in the default perf stat on the hybrid machine · 9a0b3626
      Zhengjun Xing committed
      Topdown metrics are missing from the default perf stat output on hybrid
      machines; add them to the default perf stat for hybrid systems.
      
      Currently, perf stat supports the Topdown perf metrics by default only
      for the p-core PMU; support for the e-core PMU will be implemented
      separately later. The refactoring adds two x86-specific functions. The
      event name column is widened by 7 characters so that all metrics after
      the "#" are aligned again.
      
      The perf metrics topdown feature is supported on the cpu_core PMU of ADL
      (Alder Lake). The dedicated perf metrics counter and fixed counter 3 are
      used for the topdown events, so adding the topdown metrics doesn't
      trigger multiplexing.
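      
      The percentages in the After output below can be reproduced from the
      raw counts. This sketch rests on two assumptions. First, each count is
      normalized by the sum of the four level-1 counts. Dividing by
      cpu_core/slots instead gives the same rounded values for this run.
      Second, each unmeasured level-2 category, such as light operations, is
      its level-1 parent minus the measured sibling. The struct and function
      names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Raw topdown counts for one PMU (field names are illustrative). */
struct topdown_counts {
    uint64_t retiring, bad_spec, fe_bound, be_bound;          /* level 1 */
    uint64_t heavy_ops, br_mispredict, fetch_lat, mem_bound;  /* level 2 */
};

/* Fractions in [0, 1]; the second field of each level-2 pair is derived. */
struct topdown_fracs {
    double retiring, bad_spec, fe_bound, be_bound;
    double heavy_ops, light_ops;
    double br_mispredict, machine_clears;
    double fetch_lat, fetch_bw;
    double mem_bound, core_bound;
};

/*
 * Assumptions: every count is normalized by the sum of the four level-1
 * counts, and each derived level-2 category is its level-1 parent minus the
 * measured sibling (e.g. light ops = retiring - heavy ops).
 * Returns -1 if all level-1 counts are zero.
 */
static int topdown_compute(const struct topdown_counts *c, struct topdown_fracs *f)
{
    double total = (double)c->retiring + (double)c->bad_spec +
                   (double)c->fe_bound + (double)c->be_bound;

    if (total == 0.0)
        return -1;

    f->retiring = c->retiring / total;
    f->bad_spec = c->bad_spec / total;
    f->fe_bound = c->fe_bound / total;
    f->be_bound = c->be_bound / total;

    f->heavy_ops = c->heavy_ops / total;
    f->light_ops = f->retiring - f->heavy_ops;
    f->br_mispredict = c->br_mispredict / total;
    f->machine_clears = f->bad_spec - f->br_mispredict;
    f->fetch_lat = c->fetch_lat / total;
    f->fetch_bw = f->fe_bound - f->fetch_lat;
    f->mem_bound = c->mem_bound / total;
    f->core_bound = f->be_bound - f->mem_bound;
    return 0;
}
```

      Fed the counts from the After output (retiring 3,288,625 through
      mem-bound 1,879,034), this yields 22.3% retiring, 17.9% light
      operations, 0.5% machine clears, 10.7% fetch bandwidth and 19.0% core
      bound. These match the printed values to one decimal place.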
      
      Before:
      
       # ./perf  stat  -a true
      
       Performance counter stats for 'system wide':
      
                   53.70 msec cpu-clock                 #   25.736 CPUs utilized
                      80      context-switches          #    1.490 K/sec
                      24      cpu-migrations            #  446.951 /sec
                      52      page-faults               #  968.394 /sec
               2,788,555      cpu_core/cycles/          #   51.931 M/sec
                 851,129      cpu_atom/cycles/          #   15.851 M/sec
               2,974,030      cpu_core/instructions/    #   55.385 M/sec
                 416,919      cpu_atom/instructions/    #    7.764 M/sec
                 586,136      cpu_core/branches/        #   10.916 M/sec
                  79,872      cpu_atom/branches/        #    1.487 M/sec
                  14,220      cpu_core/branch-misses/   #  264.819 K/sec
                   7,691      cpu_atom/branch-misses/   #  143.229 K/sec
      
             0.002086438 seconds time elapsed
      
      After:
      
       # ./perf stat  -a true
      
       Performance counter stats for 'system wide':
      
                   61.39 msec cpu-clock                        #   24.874 CPUs utilized
                      76      context-switches                 #    1.238 K/sec
                      24      cpu-migrations                   #  390.968 /sec
                      52      page-faults                      #  847.097 /sec
               2,753,695      cpu_core/cycles/                 #   44.859 M/sec
                 903,899      cpu_atom/cycles/                 #   14.725 M/sec
               2,927,529      cpu_core/instructions/           #   47.690 M/sec
                 428,498      cpu_atom/instructions/           #    6.980 M/sec
                 581,299      cpu_core/branches/               #    9.470 M/sec
                  83,409      cpu_atom/branches/               #    1.359 M/sec
                  13,641      cpu_core/branch-misses/          #  222.216 K/sec
                   8,008      cpu_atom/branch-misses/          #  130.453 K/sec
              14,761,308      cpu_core/slots/                  #  240.466 M/sec
               3,288,625      cpu_core/topdown-retiring/       #     22.3% retiring
               1,323,323      cpu_core/topdown-bad-spec/       #      9.0% bad speculation
               5,477,470      cpu_core/topdown-fe-bound/       #     37.1% frontend bound
               4,679,199      cpu_core/topdown-be-bound/       #     31.7% backend bound
                 646,194      cpu_core/topdown-heavy-ops/      #      4.4% heavy operations       #     17.9% light operations
               1,244,999      cpu_core/topdown-br-mispredict/  #      8.4% branch mispredict      #      0.5% machine clears
               3,891,800      cpu_core/topdown-fetch-lat/      #     26.4% fetch latency          #     10.7% fetch bandwidth
               1,879,034      cpu_core/topdown-mem-bound/      #     12.7% memory bound           #     19.0% Core bound
      
             0.002467839 seconds time elapsed
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220721065706.2886112-6-zhengjun.xing@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 23 May 2022, 1 commit
    • perf stat: Make use of index clearer with perf_counts · 0b9462d0
      Ian Rogers committed
      Further disambiguate accesses to perf_counts, making it clear that
      they use a cpu map index rather than a CPU number.
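      
      The distinction matters because per-CPU counts are stored densely by
      position in the cpu map, not by CPU number. An uncore map such as
      {0, 36} has only two slots. The sketch below uses hypothetical types.
      Wrapping the CPU number in its own struct turns an accidental swap of
      index and CPU into a compile error.

```c
#include <assert.h>
#include <stdint.h>

/* A CPU number wrapped in a struct so it cannot be confused with an index. */
struct cpu {
    int cpu;
};

/* Hypothetical cpu map: map[idx] is the CPU stored at position idx. */
struct cpu_map {
    int nr;
    const struct cpu *map;
};

/* Translate a CPU number into its index in the map, or -1 if it is absent. */
static int cpu_map_idx(const struct cpu_map *m, struct cpu c)
{
    for (int idx = 0; idx < m->nr; idx++)
        if (m->map[idx].cpu == c.cpu)
            return idx;
    return -1;
}

/* Counts are stored densely by map index, so a CPU must be translated first. */
static uint64_t count_for_cpu(const struct cpu_map *m, const uint64_t *counts,
                              struct cpu c)
{
    int idx = cpu_map_idx(m, c);

    return idx < 0 ? 0 : counts[idx];
}
```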
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Dave Marchevsky <davemarchevsky@fb.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: KP Singh <kpsingh@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Lv Ruyi <lv.ruyi@zte.com.cn>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Monnet <quentin@isovalent.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Link: https://lore.kernel.org/r/20220519032005.1273691-6-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 09 May 2022, 1 commit
    • Revert "perf stat: Support metrics with hybrid events" · 17b3867d
      Ian Rogers committed
      This reverts commit 60344f1a.
      
      Hybrid metrics place a PMU at the end of the parse string, which is
      also where tool events are placed. Because the resulting behavior of
      the parse string isn't clear, revert the change for now.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Florian Fischer <florian.fischer@muhq.space>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kim Phillips <kim.phillips@amd.com>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Link: https://lore.kernel.org/r/20220507053410.3798748-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 03 May 2022, 1 commit
    • perf stat: Avoid printing cpus with no counters · 570c44a0
      Ian Rogers committed
      perf_evlist's user_requested_cpus can contain CPUs that are not present
      in any evsel's cpus, for example with uncore counters. Avoid printing
      the prefix and trailing \n until the first valid counter is encountered.
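      
      The lazy-prefix approach can be sketched as follows. All names here
      are hypothetical. A plain valid flag stands in for whether an evsel has
      a counter on the CPU, and the line layout is illustrative rather than
      perf's exact format.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text at buf[*used]; returns false if it would not fit. */
static bool append(char *buf, size_t len, size_t *used, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf + *used, len - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= len - *used)
        return false;
    *used += (size_t)n;
    return true;
}

/*
 * Hypothetical sketch: write "CPU<n> <count> ...\n" for one CPU, but print the
 * prefix only when the first valid counter is found and the trailing newline
 * only if something was printed. CPUs with no valid counters produce "".
 * Returns the number of bytes written, or -1 if buf is too small.
 */
static int print_cpu_line(char *buf, size_t len, int cpu,
                          const unsigned long long *counts,
                          const bool *valid, int nr)
{
    size_t used = 0;
    bool printed_prefix = false;

    if (len == 0)
        return -1;
    buf[0] = '\0';
    for (int i = 0; i < nr; i++) {
        if (!valid[i])
            continue;           /* no counter for this event on this CPU */
        if (!printed_prefix) {
            if (!append(buf, len, &used, "CPU%d", cpu))
                return -1;
            printed_prefix = true;
        }
        if (!append(buf, len, &used, " %llu", counts[i]))
            return -1;
    }
    if (printed_prefix && !append(buf, len, &used, "\n"))
        return -1;
    return (int)used;
}
```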
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: KP Singh <kpsingh@kernel.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yonghong Song <yhs@fb.com>
      Link: http://lore.kernel.org/lkml/20220503041757.2365696-4-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 23 Apr 2022, 2 commits
    • perf stat: Merge event counts from all hybrid PMUs · 2c8e6451
      Zhengjun Xing committed
      For hybrid events, perf stat by default aggregates and reports the event
      counts per PMU.
      
        # ./perf stat -e cycles -a  sleep 1
      
         Performance counter stats for 'system wide':
      
            14,066,877,268      cpu_core/cycles/
             6,814,443,147      cpu_atom/cycles/
      
               1.002760625 seconds time elapsed
      
      Sometimes it is also useful to aggregate the event counts from all PMUs.
      Create a new option, '--hybrid-merge', to enable that behavior and report
      the counts without PMUs.
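      
      A sketch of merging per-PMU counts by bare event name follows. It
      assumes names of the form pmu/event/ as printed above. merged_count is
      hypothetical and ignores details such as counter scaling. Applied to
      the first run, it would give 14,066,877,268 + 6,814,443,147 =
      20,881,320,415 cycles. The 20,732,982,512 shown for --hybrid-merge
      comes from a separate run, so the two figures are not expected to
      match.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One per-PMU reading, e.g. { "cpu_core/cycles/", 14066877268 }. */
struct reading {
    const char *name;
    uint64_t count;
};

/*
 * Return a pointer to the bare event name inside "pmu/event/" and store its
 * length in *len. Names without a PMU prefix are returned unchanged.
 */
static const char *bare_event(const char *name, size_t *len)
{
    const char *start = strchr(name, '/');
    const char *end;

    if (!start) {
        *len = strlen(name);
        return name;
    }
    start++;
    end = strchr(start, '/');
    *len = end ? (size_t)(end - start) : strlen(start);
    return start;
}

/* Sum the counts of every reading whose bare event name equals event. */
static uint64_t merged_count(const struct reading *r, size_t nr, const char *event)
{
    uint64_t sum = 0;
    size_t want = strlen(event);

    for (size_t i = 0; i < nr; i++) {
        size_t len;
        const char *ev = bare_event(r[i].name, &len);

        if (len == want && strncmp(ev, event, len) == 0)
            sum += r[i].count;
    }
    return sum;
}
```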
      
        # ./perf stat -e cycles -a --hybrid-merge  sleep 1
      
         Performance counter stats for 'system wide':
      
            20,732,982,512      cycles
      
               1.002776793 seconds time elapsed
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220422065635.767648-2-zhengjun.xing@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Support metrics with hybrid events · 60344f1a
      Zhengjun Xing committed
      A metric such as 'Kernel_Utilization' may come from different PMUs and
      consist of different events.
      
      For core,
      Kernel_Utilization = cpu_clk_unhalted.thread:k / cpu_clk_unhalted.thread
      
      For atom,
      Kernel_Utilization = cpu_clk_unhalted.core:k / cpu_clk_unhalted.core
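      
      Both formulas reduce to the same ratio of kernel-mode cycles to all
      cycles, so one helper covers either PMU. This sketch (hypothetical
      name) shows only the arithmetic, not perf's metric expression
      evaluation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Kernel_Utilization = cycles counted in kernel mode / all cycles.
 * The event names differ per PMU (cpu_clk_unhalted.thread on cpu_core,
 * cpu_clk_unhalted.core on cpu_atom), but the ratio is the same.
 * Returns -1.0 when the denominator is zero.
 */
static double kernel_utilization(uint64_t kernel_cycles, uint64_t all_cycles)
{
    if (all_cycles == 0)
        return -1.0;
    return (double)kernel_cycles / (double)all_cycles;
}
```

      With the counts from this commit's After output, it gives
      17,925,405 / 17,981,895 ≈ 0.997 for cpu_atom and
      41,246,425 / 41,819,129 ≈ 0.986 for cpu_core. These print as 1.00 and
      0.99 with two decimals.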
      
      The metric group string for core is:
      '{cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread:k/k,cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread/}:W'
      It's internally expanded to:
      '{cpu_clk_unhalted.thread_p/metric-id=cpu_clk_unhalted.thread_p:k/k,cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread/}:W#cpu_core'
      
      The metric group string for atom is:
      '{cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core:k/k,cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core/}:W'
      It's internally expanded to:
      '{cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core:k/k,cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core/}:W#cpu_atom'
      
      That means the group "{cpu_clk_unhalted.thread:k,cpu_clk_unhalted.thread}:W"
      is from the cpu_core PMU and the group "{cpu_clk_unhalted.core:k,cpu_clk_unhalted.core}"
      is from the cpu_atom PMU. Next, check whether the events in each group are
      valid on that PMU. If any event is not valid on that PMU, the associated
      group is removed internally.
      
      In this example, cpu_clk_unhalted.thread is valid on cpu_core and
      cpu_clk_unhalted.core is valid on cpu_atom, so both groups pass the
      check.
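      
      The per-PMU validity check can be sketched as follows. The event lists
      are invented for illustration and do not describe the real Alder Lake
      event tables. pmu_events and group_valid_on are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-PMU event list (NULL-terminated). */
struct pmu_events {
    const char *pmu;
    const char *const *events;
};

static bool pmu_has_event(const struct pmu_events *p, const char *event)
{
    for (const char *const *e = p->events; *e; e++)
        if (strcmp(*e, event) == 0)
            return true;
    return false;
}

/* A group is kept for a PMU only if every one of its events is valid there. */
static bool group_valid_on(const struct pmu_events *p,
                           const char *const *group, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        if (!pmu_has_event(p, group[i]))
            return false;
    return true;
}
```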
      
      Before:
      
        # ./perf stat -M Kernel_Utilization -a sleep 1
      WARNING: events in group from different hybrid PMUs!
      WARNING: grouped events cpus do not match, disabling group:
        anon group { CPU_CLK_UNHALTED.THREAD_P:k, CPU_CLK_UNHALTED.THREAD_P:k, CPU_CLK_UNHALTED.THREAD, CPU_CLK_UNHALTED.THREAD }
      
       Performance counter stats for 'system wide':
      
              17,639,501      cpu_atom/CPU_CLK_UNHALTED.CORE/ #     1.00 Kernel_Utilization
              17,578,757      cpu_atom/CPU_CLK_UNHALTED.CORE:k/
           1,005,350,226 ns   duration_time
              43,012,352      cpu_core/CPU_CLK_UNHALTED.THREAD_P:k/ #     0.99 Kernel_Utilization
              17,608,010      cpu_atom/CPU_CLK_UNHALTED.THREAD_P:k/
              43,608,755      cpu_core/CPU_CLK_UNHALTED.THREAD/
              17,630,838      cpu_atom/CPU_CLK_UNHALTED.THREAD/
           1,005,350,226 ns   duration_time
      
             1.005350226 seconds time elapsed
      
      After:
      
        # ./perf stat -M Kernel_Utilization -a sleep 1
      
       Performance counter stats for 'system wide':
      
              17,981,895      CPU_CLK_UNHALTED.CORE [cpu_atom] #     1.00 Kernel_Utilization
              17,925,405      CPU_CLK_UNHALTED.CORE:k [cpu_atom]
           1,004,811,366 ns   duration_time
              41,246,425      CPU_CLK_UNHALTED.THREAD_P:k [cpu_core] #     0.99 Kernel_Utilization
              41,819,129      CPU_CLK_UNHALTED.THREAD [cpu_core]
           1,004,811,366 ns   duration_time
      
             1.004811366 seconds time elapsed
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220422065635.767648-1-zhengjun.xing@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 02 Apr 2022, 1 commit
    • perf evlist: Rename cpus to user_requested_cpus · 0df6ade7
      Ian Rogers committed
      evlist contains cpus and all_cpus. all_cpus is the union of the cpu maps
      of all evsels.
      
      For non-task targets, cpus is set to be cpus requested from the command
      line, defaulting to all online cpus if no cpus are specified.
      
      For an uncore event, all_cpus may be just CPU 0 or every online CPU.
      
      This causes all_cpus to have fewer values than the cpus variable, which
      is confusing given the 'all' in the name.
      
      To try to make the behavior clearer, rename cpus to user_requested_cpus
      and add comments on the two struct variables.
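      
      The relationship between the two sets can be sketched with 64-bit
      masks standing in for perf's cpu maps. This is an assumption for
      brevity, and the names are hypothetical. all_cpus is the union of the
      evsel maps, so with only an uncore event opened on CPU 0 it holds one
      CPU, while user_requested_cpus holds every online CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Bit n set = CPU n is in the set (a 64-CPU stand-in for a cpu map). */
typedef uint64_t cpu_bits_t;

/* Number of CPUs in a set (clears the lowest set bit each iteration). */
static int cpu_bits_weight(cpu_bits_t m)
{
    int n = 0;

    for (; m; m &= m - 1)
        n++;
    return n;
}

/* all_cpus: the union of the CPU sets of every evsel in the evlist. */
static cpu_bits_t evlist_all_cpus(const cpu_bits_t *evsel_cpus, int nr_evsels)
{
    cpu_bits_t all = 0;

    for (int i = 0; i < nr_evsels; i++)
        all |= evsel_cpus[i];
    return all;
}
```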
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: KP Singh <kpsingh@kernel.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20220328232648.2127340-3-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  8. 06 Feb 2022, 1 commit
    • perf stat: Fix display of grouped aliased events · b2b1aa73
      Ian Rogers committed
      An event may have a number of uncore aliases that, when added to the
      evlist, are consecutive.
      
      If there are multiple uncore events in a group then
      parse_events__set_leader_for_uncore_aliase will reorder the evlist so
      that events on the same PMU are adjacent.
      
      The collect_all_aliases function assumes that aliases are in blocks so
      that only the first counter is printed and all others are marked merged.
      
      The reordering for groups breaks the assumption and so all counts are
      printed.
      
      This change removes the assumption from collect_all_aliases
      that the events are in blocks and instead processes the entire evlist.
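      
      The block-free approach can be sketched with a hypothetical
      struct counter and collect_aliases, not perf's actual
      collect_all_aliases. For each lead counter, the rest of the list is
      scanned for aliases with the same name and CPU, so interleaving caused
      by group reordering does not matter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One uncore alias reading (hypothetical, simplified from an evsel). */
struct counter {
    const char *name;
    int cpu;
    uint64_t val;
    bool merged;    /* set once the value has been folded into a lead counter */
};

/*
 * Sum the lead counter with every later, not yet merged alias (same name,
 * same CPU) anywhere in the rest of the list, marking each one merged.
 * Nothing assumes the aliases sit in one contiguous block.
 */
static uint64_t collect_aliases(struct counter *c, size_t nr, size_t lead)
{
    uint64_t total = c[lead].val;

    for (size_t i = lead + 1; i < nr; i++) {
        if (c[i].merged || c[i].cpu != c[lead].cpu ||
            strcmp(c[i].name, c[lead].name) != 0)
            continue;
        total += c[i].val;
        c[i].merged = true;
    }
    return total;
}
```

      For instance, given the interleaved CPU0 readings 256,866 (OCCUPANCY),
      967 (INSERTS), 285,161 (OCCUPANCY) and 955 (INSERTS), the scan yields
      542,027 and 1,922.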
      
      Before:
      
        ```
        $ perf stat -e '{UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE,UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE},duration_time' -a -A -- sleep 1
      
         Performance counter stats for 'system wide':
      
        CPU0                  256,866      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 494,413      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                      967      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,738      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  285,161      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 429,920      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                      955      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,443      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  310,753      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 416,657      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,231      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,573      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  416,067      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 405,966      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,481      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,447      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  312,911      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 408,154      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,086      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,380      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  333,994      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 370,349      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,287      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,335      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  188,107      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 302,423      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                      701      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,070      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  307,221      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 383,642      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,036      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,158      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  318,479      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 821,545      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,028      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   2,550      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  227,618      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 372,272      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                      903      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,456      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  376,783      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 419,827      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,406      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,453      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  286,583      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 429,956      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                      999      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,436      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  313,867      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 370,159      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,114      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,291      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  342,083      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 409,111      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,399      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,684      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  365,828      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 376,037      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,378      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,411      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  382,456      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 621,743      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,232      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,955      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  342,316      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 385,067      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,176      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,268      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  373,588      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 386,163      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,394      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,464      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  381,206      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 546,891      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,266      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,712      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  221,176      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 392,069      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                      831      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,456      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  355,401      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 705,595      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,235      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   2,216      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  371,436      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 428,103      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,306      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,442      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  384,352      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 504,200      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,468      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,860      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  228,856      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 287,976      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                      832      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,060      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  215,121      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 334,162      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                      681      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,026      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  296,179      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 436,083      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,084      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,525      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  262,296      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 416,573      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                      986      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,533      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  285,852      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 359,842      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,073      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,326      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  303,379      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 367,222      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,008      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,156      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  273,487      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 425,449      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                      932      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,367      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  297,596      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 414,793      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,140      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,601      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  342,365      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 360,422      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,291      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,342      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  327,196      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 580,858      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,122      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   2,014      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  296,564      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 452,817      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,087      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,694      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  375,002      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 389,393      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,478      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   1,540      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0                  365,213      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36                 594,685      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                    1,401      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                   2,222      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0            1,000,749,060 ns   duration_time
      
               1.000749060 seconds time elapsed
        ```
      
      After:
      
        ```
         Performance counter stats for 'system wide':
      
        CPU0               20,547,434      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU36              45,202,862      UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
        CPU0                   82,001      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU36                 159,688      UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
        CPU0            1,000,464,828 ns   duration_time
      
               1.000464828 seconds time elapsed
        ```
      
      Fixes: 3cdc5c2c ("perf parse-events: Handle uncore event aliases in small groups properly")
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
      Cc: Asaf Yaffe <asaf.yaffe@intel.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
      Link: https://lore.kernel.org/r/20220205010941.1065469-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  9. 13 Jan 2022, 10 commits
    • perf cpumap: Give CPUs their own type · 6d18804b
      Ian Rogers committed
      A common problem is confusing CPU map indices with the CPUs
      themselves. Wrapping the CPU in a struct avoids this, an approach
      similar to atomic_t.
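      A minimal sketch of the idea in standalone C, assuming a simplified
      CPU map: struct cpu_id, struct cpu_map_model and the two accessors
      are hypothetical stand-ins, not perf's actual definitions. Because
      the CPU is a struct and the index is a plain int, passing one where
      the other is expected fails to compile.

```c
#include <assert.h>

/* Hypothetical: a CPU number wrapped in its own type, so it cannot be
 * silently confused with a plain int map index. */
struct cpu_id {
	int cpu;
};

/* Hypothetical CPU map: cpus[idx] is the CPU at map index idx. */
struct cpu_map_model {
	int nr;
	const struct cpu_id *cpus;
};

/* Map index -> CPU. */
static struct cpu_id cpu_map_model__cpu(const struct cpu_map_model *map, int idx)
{
	assert(idx >= 0 && idx < map->nr);
	return map->cpus[idx];
}

/* CPU -> map index, or -1 if the CPU is not in the map. */
static int cpu_map_model__idx(const struct cpu_map_model *map, struct cpu_id cpu)
{
	for (int idx = 0; idx < map->nr; idx++)
		if (map->cpus[idx].cpu == cpu.cpu)
			return idx;
	return -1;
}

/* cpu_map_model__idx(map, 18) would not compile: 18 is an int,
 * not a struct cpu_id. */
```

      As with atomic_t, the wrapper typically adds no run-time cost; the
      benefit is in the type checking.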
      
      Committer notes:
      
      To make it build with BUILD_BPF_SKEL=1 these files needed the
      conversions to 'struct perf_cpu' usage:
      
        tools/perf/util/bpf_counter.c
        tools/perf/util/bpf_counter_cgroup.c
        tools/perf/util/bpf_ftrace.c
      
      Also perf_env__get_cpu() was removed back in "perf cpumap: Switch
      cpu_map__build_map to cpu function".
      
      Additionally these needed to be fixed for the ARM builds to complete:
      
        tools/perf/arch/arm/util/cs-etm.c
        tools/perf/arch/arm64/util/pmu.c
      Suggested-by: John Garry <john.garry@huawei.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-49-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Correct first_shadow_cpu to return index · ce37ab3e
      Ian Rogers committed
      perf_stat__update_shadow_stats() and perf_stat__print_shadow_stats() use
      a cpu map index rather than a CPU, but first_shadow_cpu is returning the
      wrong value for this. Change first_shadow_cpu to
      first_shadow_cpu_map_idx to make things agree.
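      To illustrate why the index matters, here is a hypothetical sketch:
      struct counter_cpus, socket_of() and first_map_idx_on_socket() are
      invented names, and the topology of consecutive CPUs per socket is an
      assumption. Per-counter arrays are sized by the number of map
      entries, so the lookup must return the map index; returning CPU 18
      for a two-entry map would index past the end of such an array.

```c
#include <assert.h>

/* Hypothetical per-counter CPU list: cpus[idx] is the CPU at map index idx. */
struct counter_cpus {
	int nr;
	const int *cpus;
};

/* Hypothetical topology: consecutive CPUs, cpus_per_socket per socket. */
static int socket_of(int cpu, int cpus_per_socket)
{
	return cpu / cpus_per_socket;
}

/* Return the map index (not the CPU number) of the first CPU of 'map'
 * on 'socket', or -1 if there is none. Arrays of length map->nr can
 * safely be indexed with the result. */
static int first_map_idx_on_socket(const struct counter_cpus *map, int socket,
				   int cpus_per_socket)
{
	assert(map && cpus_per_socket > 0);
	for (int idx = 0; idx < map->nr; idx++)
		if (socket_of(map->cpus[idx], cpus_per_socket) == socket)
			return idx;
	return -1;
}
```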
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-48-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Use perf_cpu_map__for_each_cpu() · 7ea82fbe
      Ian Rogers committed
      Correct print_counter(), where an index was being used as a CPU.
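      A sketch of an iterator in the spirit of perf_cpu_map__for_each_cpu(),
      assuming a simplified map type; the struct and macro names below are
      hypothetical. It yields both the CPU and its index, so per-CPU arrays
      are indexed by idx while the CPU identity comes from cpu.

```c
#include <assert.h>

struct cpu_id {
	int cpu;
};

struct cpu_map_model {
	int nr;
	const struct cpu_id *cpus;
};

/* Hypothetical iterator: sets both idx (map index) and cpu (the CPU at
 * that index) on every iteration. The && short-circuits, so cpus[nr] is
 * never read. */
#define cpu_map_model__for_each_cpu(cpu, idx, map)                 \
	for ((idx) = 0;                                                \
	     (idx) < (map)->nr && ((cpu) = (map)->cpus[(idx)], 1);     \
	     (idx)++)

/* Sum per-index counts and report the highest CPU number seen. */
static long sum_counts(const struct cpu_map_model *map, const long *counts,
		       int *max_cpu)
{
	struct cpu_id cpu;
	int idx;
	long total = 0;

	assert(map && counts && max_cpu);
	*max_cpu = -1;
	cpu_map_model__for_each_cpu(cpu, idx, map) {
		total += counts[idx];        /* arrays are indexed by idx */
		if (cpu.cpu > *max_cpu)      /* CPU identity comes from cpu */
			*max_cpu = cpu.cpu;
	}
	return total;
}
```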
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-32-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Rename aggr_data cpu to imply it's an index · ab90caa7
      Ian Rogers committed
      An attempt to make CPU maps less error prone.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-31-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat-display: Avoid use of core for CPU · 7365f105
      Ian Rogers committed
      Correct use of cpumap index in print_no_aggr_metric().
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-26-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cpumap: Rename empty functions · 51b826fa
      Ian Rogers committed
      Remove cpu_map from name as a cpu_map isn't used. Pass a const pointer
      rather than by value to avoid unnecessary copying.
      Reviewed-by: James Clark <james.clark@arm.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-15-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cpumap: Simplify equal function name · 3ac23d19
      Ian Rogers committed
      Rename cpu_map__compare_aggr_cpu_id() to aggr_cpu_id__equal(), the
      cpu_map part of the name is misleading. Equal better describes the
      function than compare.
      
      Pass a const pointer rather than the struct by value, given the
      number of fields in struct aggr_cpu_id.
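      A hypothetical sketch of an "equal" function taking const pointers;
      the field list of struct aggr_id_model is illustrative and is not
      claimed to match perf's struct aggr_cpu_id exactly. An equality
      predicate only answers yes or no, which is why "equal" describes it
      better than "compare", and the const pointers avoid copying the
      struct on every call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical aggregation id with several fields (illustrative only). */
struct aggr_id_model {
	int node;
	int socket;
	int die;
	int core;
	int cpu;
};

/* Returns true when every field matches; no ordering is implied.
 * Const pointers: nothing is copied and nothing is modified. */
static bool aggr_id_model__equal(const struct aggr_id_model *a,
				 const struct aggr_id_model *b)
{
	assert(a && b);
	return a->node == b->node && a->socket == b->socket &&
	       a->die == b->die && a->core == b->core && a->cpu == b->cpu;
}
```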
      Reviewed-by: James Clark <james.clark@arm.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-14-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Switch to cpu version of cpu_map__get() · 88031a0d
      Ian Rogers committed
      Avoid possible bugs where the wrong index is passed with the cpu_map.
      Reviewed-by: James Clark <james.clark@arm.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-6-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Switch aggregation to use for_each loop · a023283f
      Ian Rogers committed
      Tidy up the use of cpu and index to hopefully make the code less error
      prone. Avoid unused warnings with (void) which will be removed in a
      later patch.
      Reviewed-by: James Clark <james.clark@arm.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-5-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Correct aggregation CPU map · 01843ca0
      Ian Rogers committed
      Switch the perf_cpu_map in aggr_update_shadow from the evlist's cpu
      map to the counter's cpu map, so that the index is appropriate. This
      addresses a problem with uncore counters that have a cpumap like:
      
        $ cat /sys/devices/uncore_imc_0/cpumask
        0,18
      
      Their counts were aggregated based on the index of those values in
      the cpumap (0 and 1) rather than on the actual CPU (0 and 18). Fixing
      this corrects metric calculations in per-socket mode for counters
      without a full cpumask.
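      A simplified model of the bug and the fix, assuming a hypothetical
      topology where CPUs 0-17 are on socket 0 and CPUs 18-35 on socket 1,
      so that the 0,18 cpumask above spans both sockets. Treating the map
      index as a CPU puts both counts on socket 0 and leaves socket 1 at
      zero, consistent with the 0.00 DRAM_BW_Use reported for S1 in the
      "Before" output below. aggr_per_socket() is an invented name.

```c
#include <assert.h>

/* Hypothetical topology: 18 consecutive CPUs per socket, two sockets. */
#define CPUS_PER_SOCKET 18
#define MAX_SOCKETS 2

static int cpu_to_socket(int cpu)
{
	return cpu / CPUS_PER_SOCKET;
}

/* Add each per-index count into its socket bucket.
 * use_cpu != 0: look up the socket from the real CPU, cpus[idx] (fixed).
 * use_cpu == 0: mistakenly treat the map index as a CPU (the old bug). */
static void aggr_per_socket(const int *cpus, const long *counts, int nr,
			    int use_cpu, long sockets[MAX_SOCKETS])
{
	for (int s = 0; s < MAX_SOCKETS; s++)
		sockets[s] = 0;
	for (int idx = 0; idx < nr; idx++) {
		int cpu = use_cpu ? cpus[idx] : idx;
		int s = cpu_to_socket(cpu);

		assert(s >= 0 && s < MAX_SOCKETS);
		sockets[s] += counts[idx];
	}
}
```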
      
      On a SkylakeX with a tweaked DRAM_BW_Use metric, to remove unnecessary
      scaling, this gives:
      
      Before:
      $ ./perf stat --per-socket -M DRAM_BW_Use -I 1000
           1.001102293 S0        1              27.01 MiB  uncore_imc/cas_count_write/ #   103.00 DRAM_BW_Use
           1.001102293 S0        1              30.22 MiB  uncore_imc/cas_count_read/
           1.001102293 S0        1      1,001,102,293 ns   duration_time
           1.001102293 S1        1              20.10 MiB  uncore_imc/cas_count_write/ #     0.00 DRAM_BW_Use
           1.001102293 S1        1              32.74 MiB  uncore_imc/cas_count_read/
           1.001102293 S1        0      <not counted> ns   duration_time
           2.003517973 S0        1              83.04 MiB  uncore_imc/cas_count_write/ #   920.00 DRAM_BW_Use
           2.003517973 S0        1             145.95 MiB  uncore_imc/cas_count_read/
           2.003517973 S0        1      1,002,415,680 ns   duration_time
           2.003517973 S1        1             302.45 MiB  uncore_imc/cas_count_write/ #     0.00 DRAM_BW_Use
           2.003517973 S1        1             290.99 MiB  uncore_imc/cas_count_read/
           2.003517973 S1        0      <not counted> ns   duration_time
      
      After:
      $ perf stat --per-socket -M DRAM_BW_Use -I 1000
           1.001080840 S0        1              24.96 MiB  uncore_imc/cas_count_write/ #    54.00 DRAM_BW_Use
           1.001080840 S0        1              33.64 MiB  uncore_imc/cas_count_read/
           1.001080840 S0        1      1,001,080,840 ns   duration_time
           1.001080840 S1        1              42.43 MiB  uncore_imc/cas_count_write/ #    84.00 DRAM_BW_Use
           1.001080840 S1        1              47.05 MiB  uncore_imc/cas_count_read/
           1.001080840 S1        0      <not counted> ns   duration_time
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: John Garry <john.garry@huawei.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-4-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  10. 14 Jul 2021, 1 commit
    • perf stat: Merge uncore events by default for hybrid platform · e0a7ef2a
      Jin Yao committed
      On a hybrid platform, by default 'perf stat' aggregates and reports the
      event counts per PMU. For example,
      
        # perf stat -e cycles -a true
      
         Performance counter stats for 'system wide':
      
                 1,400,445      cpu_core/cycles/
                   680,881      cpu_atom/cycles/
      
               0.001770773 seconds time elapsed
      
      For uncore events, however, this is not suitable: uncore PMUs have
      nothing to do with hybrid. So for uncore events, the counts from all
      PMUs are aggregated and reported without the PMU names.
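      A hypothetical sketch of the merge step: counts for the same event
      are summed across uncore PMU instances. struct pmu_count and
      merge_event() are invented for illustration. With the "Before"
      numbers below, 2,058 + 2,028 = 4,086; the "After" run is a separate
      measurement, so its 3,996 is not expected to match.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record for one counter as reported by one PMU instance. */
struct pmu_count {
	const char *pmu;    /* e.g. "uncore_arb_0" */
	const char *event;  /* e.g. "event=0x81,umask=0x1" */
	long value;
};

/* Sum the values of every instance of 'event', regardless of which
 * uncore PMU instance reported it. */
static long merge_event(const struct pmu_count *c, int n, const char *event)
{
	long total = 0;

	assert(c && event);
	for (int i = 0; i < n; i++)
		if (strcmp(c[i].event, event) == 0)
			total += c[i].value;
	return total;
}
```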
      
      Before:
      
        # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ -a true
      
         Performance counter stats for 'system wide':
      
                     2,058      uncore_arb_0/event=0x81,umask=0x1/
                     2,028      uncore_arb_1/event=0x81,umask=0x1/
                         0      uncore_arb_0/event=0x84,umask=0x1/
                         0      uncore_arb_1/event=0x84,umask=0x1/
      
               0.000614498 seconds time elapsed
      
      After:
      
        # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ -a true
      
         Performance counter stats for 'system wide':
      
                     3,996      arb/event=0x81,umask=0x1/
                         0      arb/event=0x84,umask=0x1/
      
               0.000630046 seconds time elapsed
      
      The '--no-merge' option still works for uncore events.
      
        # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ --no-merge true
      
         Performance counter stats for 'system wide':
      
                     1,952      uncore_arb_0/event=0x81,umask=0x1/
                     1,921      uncore_arb_1/event=0x81,umask=0x1/
                         0      uncore_arb_0/event=0x84,umask=0x1/
                         0      uncore_arb_1/event=0x84,umask=0x1/
      
               0.000575536 seconds time elapsed
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210707055652.962-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  11. 07 Jul 2021, 1 commit
    • perf stat: Disable the NMI watchdog message on hybrid · 493be70a
      Jin Yao committed
      If we run a single workload that only runs on the big cores, there is
      always an ugly message about disabling the NMI watchdog, because the
      atom events are not counted.
      
      Before:
      
        # ./perf stat true
      
         Performance counter stats for 'true':
      
                      0.43 msec task-clock                #    0.396 CPUs utilized
                         0      context-switches          #    0.000 /sec
                         0      cpu-migrations            #    0.000 /sec
                        45      page-faults               #  103.918 K/sec
                   639,634      cpu_core/cycles/          #    1.477 G/sec
             <not counted>      cpu_atom/cycles/                                              (0.00%)
                   643,498      cpu_core/instructions/    #    1.486 G/sec
             <not counted>      cpu_atom/instructions/                                        (0.00%)
                   123,715      cpu_core/branches/        #  285.694 M/sec
             <not counted>      cpu_atom/branches/                                            (0.00%)
                     4,094      cpu_core/branch-misses/   #    9.454 M/sec
             <not counted>      cpu_atom/branch-misses/                                       (0.00%)
      
               0.001092407 seconds time elapsed
      
               0.001144000 seconds user
               0.000000000 seconds sys
      
        Some events weren't counted. Try disabling the NMI watchdog:
                echo 0 > /proc/sys/kernel/nmi_watchdog
                perf stat ...
                echo 1 > /proc/sys/kernel/nmi_watchdog
      
        # ./perf stat -e '{cpu_atom/cycles/,msr/tsc/}' true
      
         Performance counter stats for 'true':
      
             <not counted>      cpu_atom/cycles/                                              (0.00%)
             <not counted>      msr/tsc/                                                      (0.00%)
      
               0.001904106 seconds time elapsed
      
               0.001947000 seconds user
               0.000000000 seconds sys
      
        Some events weren't counted. Try disabling the NMI watchdog:
                echo 0 > /proc/sys/kernel/nmi_watchdog
                perf stat ...
                echo 1 > /proc/sys/kernel/nmi_watchdog
        The events in group usually have to be from the same PMU. Try reorganizing the group.
      
      Now we disable the NMI watchdog message on hybrid platforms, since
      otherwise there are too many false positives.
      
      After:
      
        # ./perf stat true
      
         Performance counter stats for 'true':
      
                      0.79 msec task-clock                #    0.419 CPUs utilized
                         0      context-switches          #    0.000 /sec
                         0      cpu-migrations            #    0.000 /sec
                        48      page-faults               #   60.889 K/sec
                   777,692      cpu_core/cycles/          #  986.519 M/sec
             <not counted>      cpu_atom/cycles/                                              (0.00%)
                   669,147      cpu_core/instructions/    #  848.828 M/sec
             <not counted>      cpu_atom/instructions/                                        (0.00%)
                   128,635      cpu_core/branches/        #  163.176 M/sec
             <not counted>      cpu_atom/branches/                                            (0.00%)
                     4,089      cpu_core/branch-misses/   #    5.187 M/sec
             <not counted>      cpu_atom/branch-misses/                                       (0.00%)
      
               0.001880649 seconds time elapsed
      
               0.001935000 seconds user
               0.000000000 seconds sys
      
        # ./perf stat -e '{cpu_atom/cycles/,msr/tsc/}' true
      
         Performance counter stats for 'true':
      
             <not counted>      cpu_atom/cycles/                                              (0.00%)
             <not counted>      msr/tsc/                                                      (0.00%)
      
               0.000963319 seconds time elapsed
      
               0.000999000 seconds user
               0.000000000 seconds sys
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210610034557.29766-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  12. 04 Jun 2021, 1 commit
  13. 12 May 2021, 1 commit
  14. 29 Apr 2021, 2 commits
    • perf stat: Filter out unmatched aggregation for hybrid event · 92637cc7
      Jin Yao committed
      perf-stat supports several aggregation modes, such as --per-core and
      --per-socket. A hybrid event, however, may only be available on some
      of the CPUs. So for --per-core we need to filter out the cores where
      it is unavailable, for --per-socket the sockets where it is
      unavailable, and so on.
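      A hypothetical sketch of the filtering test: an aggregation unit (a
      core, a socket, and so on) is kept only if at least one of its CPUs
      is in the event's cpumask; units with no such CPU are the ones that
      printed as <not counted> in the "Before" output below.
      unit_has_event_cpu() is an invented name, not perf's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if any CPU of the aggregation unit appears in the
 * event's cpumask; such units are printed, the others are skipped. */
static bool unit_has_event_cpu(const int *unit_cpus, int unit_nr,
			       const int *event_cpus, int event_nr)
{
	assert(unit_nr >= 0 && event_nr >= 0);
	for (int i = 0; i < unit_nr; i++)
		for (int j = 0; j < event_nr; j++)
			if (unit_cpus[i] == event_cpus[j])
				return true;
	return false;
}
```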
      
      Before:
      
        # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1
      
         Performance counter stats for 'system wide':
      
        S0-D0-C0           2            479,530      cpu_core/cycles/
        S0-D0-C4           2            175,007      cpu_core/cycles/
        S0-D0-C8           2            166,240      cpu_core/cycles/
        S0-D0-C12          2            704,673      cpu_core/cycles/
        S0-D0-C16          2            865,835      cpu_core/cycles/
        S0-D0-C20          2          2,958,461      cpu_core/cycles/
        S0-D0-C24          2            163,988      cpu_core/cycles/
        S0-D0-C28          2            164,729      cpu_core/cycles/
        S0-D0-C32          0      <not counted>      cpu_core/cycles/
        S0-D0-C33          0      <not counted>      cpu_core/cycles/
        S0-D0-C34          0      <not counted>      cpu_core/cycles/
        S0-D0-C35          0      <not counted>      cpu_core/cycles/
        S0-D0-C36          0      <not counted>      cpu_core/cycles/
        S0-D0-C37          0      <not counted>      cpu_core/cycles/
        S0-D0-C38          0      <not counted>      cpu_core/cycles/
        S0-D0-C39          0      <not counted>      cpu_core/cycles/
      
               1.003597211 seconds time elapsed
      
      After:
      
        # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1
      
         Performance counter stats for 'system wide':
      
        S0-D0-C0           2            210,428      cpu_core/cycles/
        S0-D0-C4           2            444,830      cpu_core/cycles/
        S0-D0-C8           2            435,241      cpu_core/cycles/
        S0-D0-C12          2            423,976      cpu_core/cycles/
        S0-D0-C16          2            859,350      cpu_core/cycles/
        S0-D0-C20          2          1,559,589      cpu_core/cycles/
        S0-D0-C24          2            163,924      cpu_core/cycles/
        S0-D0-C28          2            376,610      cpu_core/cycles/
      
               1.003621290 seconds time elapsed
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Co-developed-by: Jiri Olsa <jolsa@redhat.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-16-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf stat: Uniquify hybrid event name · 12279429
      Jin Yao committed
      It is useful to let the user know which PMU an event belongs to.
      perf-stat already supports the '--no-merge' option, which prints the
      PMU name after the event name, such as:
      
      "cycles [cpu_core]"
      
      This is now enabled by default on hybrid platforms, but with the format
      changed to:
      
      "cpu_core/cycles/"
      
      If the user configures a name, we still use the user-specified name.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-8-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  15. 20 Apr, 2021 1 commit
  16. 24 Mar, 2021 1 commit
    • J
      perf stat: Align CSV output for summary mode · 0bdad978
      Jin Yao committed
      The 'perf stat' subcommand supports the request for a summary of the
      interval counter readings.  But the summary lines break the CSV output
      so it's hard for scripts to parse the result.
      
      Before:
      
        # perf stat -x, -I1000 --interval-count 1 --summary
             1.001323097,8013.48,msec,cpu-clock,8013483384,100.00,8.013,CPUs utilized
             1.001323097,270,,context-switches,8013513297,100.00,0.034,K/sec
             1.001323097,13,,cpu-migrations,8013530032,100.00,0.002,K/sec
             1.001323097,184,,page-faults,8013546992,100.00,0.023,K/sec
             1.001323097,20574191,,cycles,8013551506,100.00,0.003,GHz
             1.001323097,10562267,,instructions,8013564958,100.00,0.51,insn per cycle
             1.001323097,2019244,,branches,8013575673,100.00,0.252,M/sec
             1.001323097,106152,,branch-misses,8013585776,100.00,5.26,of all branches
        8013.48,msec,cpu-clock,8013483384,100.00,7.984,CPUs utilized
        270,,context-switches,8013513297,100.00,0.034,K/sec
        13,,cpu-migrations,8013530032,100.00,0.002,K/sec
        184,,page-faults,8013546992,100.00,0.023,K/sec
        20574191,,cycles,8013551506,100.00,0.003,GHz
        10562267,,instructions,8013564958,100.00,0.51,insn per cycle
        2019244,,branches,8013575673,100.00,0.252,M/sec
        106152,,branch-misses,8013585776,100.00,5.26,of all branches
      
      The summary line loses the timestamp column, which breaks the CSV
      output.
      
      We add a column at the original 'timestamp' position that simply says
      'summary' on summary lines.
      
      After:
      
        # perf stat -x, -I1000 --interval-count 1 --summary
             1.001196053,8012.72,msec,cpu-clock,8012722903,100.00,8.013,CPUs utilized
             1.001196053,218,,context-switches,8012753271,100.00,0.027,K/sec
             1.001196053,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
             1.001196053,0,,page-faults,8012786257,100.00,0.000,K/sec
             1.001196053,15004518,,cycles,8012790637,100.00,0.002,GHz
             1.001196053,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
             1.001196053,1590259,,branches,8012814766,100.00,0.198,M/sec
             1.001196053,82601,,branch-misses,8012824365,100.00,5.19,of all branches
                 summary,8012.72,msec,cpu-clock,8012722903,100.00,7.986,CPUs utilized
                 summary,218,,context-switches,8012753271,100.00,0.027,K/sec
                 summary,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
                 summary,0,,page-faults,8012786257,100.00,0.000,K/sec
                 summary,15004518,,cycles,8012790637,100.00,0.002,GHz
                 summary,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
                 summary,1590259,,branches,8012814766,100.00,0.198,M/sec
                 summary,82601,,branch-misses,8012824365,100.00,5.19,of all branches
      
      Now it's easy for scripts to analyse the summary lines.
      
      Of course, we also take care not to break existing scripts: they can
      keep using the old (broken) CSV format via the new '--no-csv-summary'
      option.
      
        # perf stat -x, -I1000 --interval-count 1 --summary --no-csv-summary
             1.001213261,8012.67,msec,cpu-clock,8012672327,100.00,8.013,CPUs utilized
             1.001213261,197,,context-switches,8012703742,100.00,24.586,/sec
             1.001213261,9,,cpu-migrations,8012720902,100.00,1.123,/sec
             1.001213261,644,,page-faults,8012738266,100.00,80.373,/sec
             1.001213261,18350698,,cycles,8012744109,100.00,0.002,GHz
             1.001213261,12745021,,instructions,8012759001,100.00,0.69,insn per cycle
             1.001213261,2458033,,branches,8012770864,100.00,306.768,K/sec
             1.001213261,102107,,branch-misses,8012781751,100.00,4.15,of all branches
        8012.67,msec,cpu-clock,8012672327,100.00,7.985,CPUs utilized
        197,,context-switches,8012703742,100.00,24.586,/sec
        9,,cpu-migrations,8012720902,100.00,1.123,/sec
        644,,page-faults,8012738266,100.00,80.373,/sec
        18350698,,cycles,8012744109,100.00,0.002,GHz
        12745021,,instructions,8012759001,100.00,0.69,insn per cycle
        2458033,,branches,8012770864,100.00,306.768,K/sec
        102107,,branch-misses,8012781751,100.00,4.15,of all branches
      
      This option can be enabled in perf config by setting the variable
      'stat.no-csv-summary'.
      
        # perf config stat.no-csv-summary=true
      
        # perf config -l
        stat.no-csv-summary=true
      
        # perf stat -x, -I1000 --interval-count 1 --summary
             1.001330198,8013.28,msec,cpu-clock,8013279201,100.00,8.013,CPUs utilized
             1.001330198,205,,context-switches,8013308394,100.00,25.583,/sec
             1.001330198,10,,cpu-migrations,8013324681,100.00,1.248,/sec
             1.001330198,0,,page-faults,8013340926,100.00,0.000,/sec
             1.001330198,8027742,,cycles,8013344503,100.00,0.001,GHz
             1.001330198,2871717,,instructions,8013356501,100.00,0.36,insn per cycle
             1.001330198,553564,,branches,8013366204,100.00,69.081,K/sec
             1.001330198,54021,,branch-misses,8013375952,100.00,9.76,of all branches
        8013.28,msec,cpu-clock,8013279201,100.00,7.985,CPUs utilized
        205,,context-switches,8013308394,100.00,25.583,/sec
        10,,cpu-migrations,8013324681,100.00,1.248,/sec
        0,,page-faults,8013340926,100.00,0.000,/sec
        8027742,,cycles,8013344503,100.00,0.001,GHz
        2871717,,instructions,8013356501,100.00,0.36,insn per cycle
        553564,,branches,8013366204,100.00,69.081,K/sec
        54021,,branch-misses,8013375952,100.00,9.76,of all branches
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210319070156.20394-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  17. 07 Mar, 2021 1 commit
  18. 21 Jan, 2021 1 commit
    • S
      perf stat: Enable counting events for BPF programs · fa853c4b
      Song Liu committed
      Introduce the 'perf stat -b' option, which counts events for BPF programs, like:
      
        [root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
           1.487903822            115,200      ref-cycles
           1.487903822             86,012      cycles
           2.489147029             80,560      ref-cycles
           2.489147029             73,784      cycles
           3.490341825             60,720      ref-cycles
           3.490341825             37,797      cycles
           4.491540887             37,120      ref-cycles
           4.491540887             31,963      cycles
      
      The example above counts 'cycles' and 'ref-cycles' of the BPF program
      with id 254.  This is similar to the bpftool-prog-profile command, but
      more flexible.
      
      'perf stat -b' creates a per-cpu perf_event and loads fentry/fexit BPF
      programs (monitor-progs) attached to the target BPF program
      (target-prog). The monitor-progs read the perf_event before and after
      the target-prog runs, and aggregate the difference in a BPF map. User
      space then reads the data from these maps.
      
      A new 'struct bpf_counter' is introduced to provide a common interface
      that uses BPF programs/maps to count perf events.
      
      Committer notes:
      
      Removed all but bpf_counter.h includes from evsel.h, not needed at all.
      
      Also, BPF map lookups for PERCPU_ARRAYs need a value receive buffer
      with libbpf_num_possible_cpus() entries, not evsel__nr_cpus(evsel)
      entries: the former uses /sys/devices/system/cpu/possible while the
      latter uses /sys/devices/system/cpu/online, which may be smaller than
      the 'possible' number, making the bpf map lookup overwrite memory and
      cause hard-to-debug memory corruption.
      
      We still need to use evsel__nr_cpus(evsel) when accessing the
      perf_counts array, though, so as not to overwrite another area of
      memory :-)
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: https://lore.kernel.org/lkml/20210120163031.GU12699@kernel.org/
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-team@fb.com
      Link: http://lore.kernel.org/lkml/20201229214214.3413833-4-songliubraving@fb.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  19. 24 Dec, 2020 7 commits
  20. 01 Dec, 2020 1 commit
  21. 28 Nov, 2020 1 commit
    • N
      perf stat: Use proper cpu for shadow stats · c0ee1d5a
      Namhyung Kim committed
      Currently perf stat shows some metrics (like IPC) for defined events.
      But when no aggregation mode is used (-A option), it shows incorrect
      values since it uses a value from a different CPU.
      
      Before:
      
        $ perf stat -aA -e cycles,instructions sleep 1
      
         Performance counter stats for 'system wide':
      
        CPU0      116,057,380      cycles
        CPU1       86,084,722      cycles
        CPU2       99,423,125      cycles
        CPU3       98,272,994      cycles
        CPU0       53,369,217      instructions      #    0.46  insn per cycle
        CPU1       33,378,058      instructions      #    0.29  insn per cycle
        CPU2       58,150,086      instructions      #    0.50  insn per cycle
        CPU3       40,029,703      instructions      #    0.34  insn per cycle
      
             1.001816971 seconds time elapsed
      
      So the IPC for CPU1 should be about 0.39 (= 33,378,058 / 86,084,722 ≈ 0.388)
      but it was 0.29 (= 33,378,058 / 116,057,380) and so on.
      
      After:
      
        $ perf stat -aA -e cycles,instructions sleep 1
      
         Performance counter stats for 'system wide':
      
        CPU0      109,621,384      cycles
        CPU1      159,026,454      cycles
        CPU2       99,460,366      cycles
        CPU3      124,144,142      cycles
        CPU0       44,396,706      instructions      #    0.41  insn per cycle
        CPU1      120,195,425      instructions      #    0.76  insn per cycle
        CPU2       44,763,978      instructions      #    0.45  insn per cycle
        CPU3       69,049,079      instructions      #    0.56  insn per cycle
      
             1.001910444 seconds time elapsed
      
      Fixes: 44d49a60 ("perf stat: Support metrics in --per-core/socket mode")
      Reported-by: Sam Xi <xyzsam@google.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20201127041404.390276-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  22. 10 Sep, 2020 1 commit
  23. 01 Sep, 2020 1 commit
    • T
      perf stat: Fix out of bounds array access in the print_counters() evlist method · 313146a8
      Thomas Richter committed
      Fix a compile error on F32 with gcc version 10.1 on s390 in file
      util/stat-display.c.  The error does not show up with make DEBUG=y.  In
      fact the issue only shows up when using both compiler options -O6 and
      -D_FORTIFY_SOURCE=2 (which are omitted with DEBUG=y).
      
      This is the offending call chain:
      
      print_counter_aggr()
        printout(config, -1, 0, ...)  with 2nd parm id set to -1
          aggr_printout(config, x, id --> -1, ...) which leads to this code:
              case AGGR_NONE:
                      if (evsel->percore && !config->percore_show_thread) {
                              ....
                      } else {
                              fprintf(config->output, "CPU%*d%s",
                                      config->csv_output ? 0 : -7,
                                      evsel__cpus(evsel)->map[id],
                                                              ^^ id is -1 !!!!
                                      config->csv_sep);
                      }
      
      This is a compiler inlining issue which is detected on s390 but not on
      other platforms.
      
      Output before:
      
       # make util/stat-display.o
          .....
      
        util/stat-display.c: In function ‘perf_evlist__print_counters’:
        util/stat-display.c:121:4: error: array subscript -1 is below array
            bounds of ‘int[]’ [-Werror=array-bounds]
        121 |    fprintf(config->output, "CPU%*d%s",
            |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        122 |     config->csv_output ? 0 : -7,
            |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        123 |     evsel__cpus(evsel)->map[id],
            |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        124 |     config->csv_sep);
            |     ~~~~~~~~~~~~~~~~
        In file included from util/evsel.h:13,
                       from util/evlist.h:13,
                       from util/stat-display.c:9:
        /root/linux/tools/lib/perf/include/internal/cpumap.h:10:7:
        note: while referencing ‘map’
         10 |  int  map[];
            |       ^~~
        cc1: all warnings being treated as errors
        mv: cannot stat 'util/.stat-display.o.tmp': No such file or directory
        make[3]: *** [/root/linux/tools/build/Makefile.build:97: util/stat-display.o]
        Error 1
        make[2]: *** [Makefile.perf:716: util/stat-display.o] Error 2
        make[1]: *** [Makefile.perf:231: sub-make] Error 2
        make: *** [Makefile:110: util/stat-display.o] Error 2
        [root@t35lp46 perf]#
      
      Output after:
      
        # make util/stat-display.o
          .....
        CC       util/stat-display.o
        [root@t35lp46 perf]#
      
      Committer notes:
      
      Removed the removal of {} enclosing the multiline else block, as pointed
      out by Jiri Olsa.
      Suggested-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200825063304.77733-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>