1. 13 January 2022, 9 commits
    • perf stat: Correct first_shadow_cpu to return index · ce37ab3e
      Committed by Ian Rogers
      perf_stat__update_shadow_stats() and perf_stat__print_shadow_stats() use
      a cpu map index rather than a CPU, but first_shadow_cpu is returning the
      wrong value for this. Change first_shadow_cpu to
      first_shadow_cpu_map_idx to make things agree.
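
      (As a rough, stand-alone illustration of the index-vs-CPU distinction this
      series is about, using toy types rather than the real libperf perf_cpu_map
      API: an uncore PMU whose cpumask is "0,18" has map index 1 pointing at
      CPU 18, not CPU 1.)

        #include <stdio.h>

        /* Toy stand-in for a cpu map; the index is not the CPU number. */
        struct toy_cpu_map { int nr; int map[2]; };

        static int toy_cpu_map__cpu(const struct toy_cpu_map *m, int idx)
        {
            return (idx >= 0 && idx < m->nr) ? m->map[idx] : -1;
        }

        int main(void)
        {
            struct toy_cpu_map uncore = { .nr = 2, .map = { 0, 18 } };

            for (int idx = 0; idx < uncore.nr; idx++)
                printf("cpu map idx %d -> CPU %d\n", idx, toy_cpu_map__cpu(&uncore, idx));
            return 0;
        }
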
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-48-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Use perf_cpu_map__for_each_cpu() · 7ea82fbe
      Committed by Ian Rogers
      Correct print_counter(), where an index was being used as a cpu.
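
      (A toy re-creation of the for-each pattern named in the title, binding
      both the map index and the CPU so callers cannot mix them up; the real
      libperf macro and types differ in detail and have changed over time.)

        #include <stdio.h>

        struct toy_cpu_map { int nr; int map[4]; };

        /* Loop that yields the index and the CPU value together. */
        #define toy_for_each_cpu(cpu, idx, m)                                 \
            for ((idx) = 0, (cpu) = (m)->map[0];                              \
                 (idx) < (m)->nr;                                             \
                 (idx)++, (cpu) = (idx) < (m)->nr ? (m)->map[(idx)] : -1)

        int main(void)
        {
            struct toy_cpu_map cpus = { .nr = 3, .map = { 0, 18, 36 } };
            int cpu, idx;

            toy_for_each_cpu(cpu, idx, &cpus)
                printf("idx %d -> CPU %d\n", idx, cpu);
            return 0;
        }
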
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-32-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Rename aggr_data cpu to imply it's an index · ab90caa7
      Committed by Ian Rogers
      Trying to make cpu maps less error prone.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-31-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat-display: Avoid use of core for CPU · 7365f105
      Committed by Ian Rogers
      Correct use of cpumap index in print_no_aggr_metric().
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-26-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cpumap: Rename empty functions · 51b826fa
      Committed by Ian Rogers
      Remove cpu_map from name as a cpu_map isn't used. Pass a const pointer
      rather than by value to avoid unnecessary copying.
      Reviewed-by: James Clark <james.clark@arm.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-15-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cpumap: Simplify equal function name · 3ac23d19
      Committed by Ian Rogers
      Rename cpu_map__compare_aggr_cpu_id() to aggr_cpu_id__equal(); the
      cpu_map part of the name is misleading, and 'equal' describes the
      function better than 'compare'.
      
      Switch to a const pointer rather than passing the struct by value, given
      the number of members in aggr_cpu_id().
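
      (A rough sketch of the resulting shape, with simplified placeholder
      members rather than the real aggr_cpu_id layout: an equality helper that
      takes const pointers.)

        #include <stdbool.h>
        #include <stdio.h>

        /* Simplified stand-in; the real struct has more members. */
        struct toy_aggr_cpu_id { int socket, die, core, cpu; };

        static bool toy_aggr_cpu_id__equal(const struct toy_aggr_cpu_id *a,
                                           const struct toy_aggr_cpu_id *b)
        {
            return a->socket == b->socket && a->die == b->die &&
                   a->core == b->core && a->cpu == b->cpu;
        }

        int main(void)
        {
            struct toy_aggr_cpu_id x = { 0, 0, 4, 8 }, y = { 0, 0, 4, 8 };

            printf("%s\n", toy_aggr_cpu_id__equal(&x, &y) ? "equal" : "different");
            return 0;
        }
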
      Reviewed-by: James Clark <james.clark@arm.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-14-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Switch to cpu version of cpu_map__get() · 88031a0d
      Committed by Ian Rogers
      Avoid possible bugs where the wrong index is passed with the cpu_map.
      Reviewed-by: James Clark <james.clark@arm.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-6-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Switch aggregation to use for_each loop · a023283f
      Committed by Ian Rogers
      Tidy up the use of cpu and index to hopefully make the code less error
      prone. Avoid unused-variable warnings with (void) casts, which will be
      removed in a later patch.
      Reviewed-by: James Clark <james.clark@arm.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-5-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Correct aggregation CPU map · 01843ca0
      Committed by Ian Rogers
      Switch the perf_cpu_map in aggr_update_shadow from
      the evlist to the counter's cpu map, so the index is appropriate. This
      addresses a problem where uncore counts, with a cpumap like:
      $ cat /sys/devices/uncore_imc_0/cpumask
      0,18
      were aggregated based on the index of those values in the cpumap (0 and
      1) rather than on the actual CPUs (0 and 18). This corrects metric
      calculations in per-socket mode for counters without a full cpumask.
      
      On a SkylakeX with a tweaked DRAM_BW_Use metric, to remove unnecessary
      scaling, this gives:
      
      Before:
      $ /perf stat --per-socket -M DRAM_BW_Use -I 1000
           1.001102293 S0        1              27.01 MiB  uncore_imc/cas_count_write/ #   103.00 DRAM_BW_Use
           1.001102293 S0        1              30.22 MiB  uncore_imc/cas_count_read/
           1.001102293 S0        1      1,001,102,293 ns   duration_time
           1.001102293 S1        1              20.10 MiB  uncore_imc/cas_count_write/ #     0.00 DRAM_BW_Use
           1.001102293 S1        1              32.74 MiB  uncore_imc/cas_count_read/
           1.001102293 S1        0      <not counted> ns   duration_time
           2.003517973 S0        1              83.04 MiB  uncore_imc/cas_count_write/ #   920.00 DRAM_BW_Use
           2.003517973 S0        1             145.95 MiB  uncore_imc/cas_count_read/
           2.003517973 S0        1      1,002,415,680 ns   duration_time
           2.003517973 S1        1             302.45 MiB  uncore_imc/cas_count_write/ #     0.00 DRAM_BW_Use
           2.003517973 S1        1             290.99 MiB  uncore_imc/cas_count_read/
           2.003517973 S1        0      <not counted> ns   duration_time
      
      After:
      $ perf stat --per-socket -M DRAM_BW_Use -I 1000
           1.001080840 S0        1              24.96 MiB  uncore_imc/cas_count_write/ #    54.00 DRAM_BW_Use
           1.001080840 S0        1              33.64 MiB  uncore_imc/cas_count_read/
           1.001080840 S0        1      1,001,080,840 ns   duration_time
           1.001080840 S1        1              42.43 MiB  uncore_imc/cas_count_write/ #    84.00 DRAM_BW_Use
           1.001080840 S1        1              47.05 MiB  uncore_imc/cas_count_read/
           1.001080840 S1        0      <not counted> ns   duration_time
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: John Garry <john.garry@huawei.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-4-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 14 July 2021, 1 commit
    • perf stat: Merge uncore events by default for hybrid platform · e0a7ef2a
      Committed by Jin Yao
      On a hybrid platform, by default 'perf stat' aggregates and reports the
      event counts per PMU. For example,
      
        # perf stat -e cycles -a true
      
         Performance counter stats for 'system wide':
      
                 1,400,445      cpu_core/cycles/
                   680,881      cpu_atom/cycles/
      
               0.001770773 seconds time elapsed
      
      But for uncore events that is not a suitable method, since uncore has
      nothing to do with hybrid. So for uncore events, we aggregate the event
      counts from all PMUs and report the counts without a PMU name.
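
      (Conceptually the merge is just a sum over the per-instance uncore PMUs;
      the sketch below is illustrative only, reusing the figures from the
      "Before" output further down, and is not the actual perf implementation.)

        #include <stdio.h>

        struct toy_count { const char *pmu; unsigned long long count; };

        int main(void)
        {
            /* Per-instance counts for arb/event=0x81,umask=0x1/ (illustrative). */
            struct toy_count parts[] = {
                { "uncore_arb_0", 2058 },
                { "uncore_arb_1", 2028 },
            };
            unsigned long long total = 0;

            for (unsigned int i = 0; i < sizeof(parts) / sizeof(parts[0]); i++)
                total += parts[i].count;

            /* One merged line, reported without a PMU name. */
            printf("%llu  arb/event=0x81,umask=0x1/\n", total);
            return 0;
        }
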
      
      Before:
      
        # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ -a true
      
         Performance counter stats for 'system wide':
      
                     2,058      uncore_arb_0/event=0x81,umask=0x1/
                     2,028      uncore_arb_1/event=0x81,umask=0x1/
                         0      uncore_arb_0/event=0x84,umask=0x1/
                         0      uncore_arb_1/event=0x84,umask=0x1/
      
               0.000614498 seconds time elapsed
      
      After:
      
        # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ -a true
      
         Performance counter stats for 'system wide':
      
                     3,996      arb/event=0x81,umask=0x1/
                         0      arb/event=0x84,umask=0x1/
      
               0.000630046 seconds time elapsed
      
      Of course, we also keep the '--no-merge' working for uncore events.
      
        # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ --no-merge true
      
         Performance counter stats for 'system wide':
      
                     1,952      uncore_arb_0/event=0x81,umask=0x1/
                     1,921      uncore_arb_1/event=0x81,umask=0x1/
                         0      uncore_arb_0/event=0x84,umask=0x1/
                         0      uncore_arb_1/event=0x84,umask=0x1/
      
               0.000575536 seconds time elapsed
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210707055652.962-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 07 July 2021, 1 commit
    • perf stat: Disable the NMI watchdog message on hybrid · 493be70a
      Committed by Jin Yao
      If we run a single workload that only runs on the big core, there is
      always an ugly message about disabling the NMI watchdog, because the atom
      events are not counted.
      
      Before:
      
        # ./perf stat true
      
         Performance counter stats for 'true':
      
                      0.43 msec task-clock                #    0.396 CPUs utilized
                         0      context-switches          #    0.000 /sec
                         0      cpu-migrations            #    0.000 /sec
                        45      page-faults               #  103.918 K/sec
                   639,634      cpu_core/cycles/          #    1.477 G/sec
             <not counted>      cpu_atom/cycles/                                              (0.00%)
                   643,498      cpu_core/instructions/    #    1.486 G/sec
             <not counted>      cpu_atom/instructions/                                        (0.00%)
                   123,715      cpu_core/branches/        #  285.694 M/sec
             <not counted>      cpu_atom/branches/                                            (0.00%)
                     4,094      cpu_core/branch-misses/   #    9.454 M/sec
             <not counted>      cpu_atom/branch-misses/                                       (0.00%)
      
               0.001092407 seconds time elapsed
      
               0.001144000 seconds user
               0.000000000 seconds sys
      
        Some events weren't counted. Try disabling the NMI watchdog:
                echo 0 > /proc/sys/kernel/nmi_watchdog
                perf stat ...
                echo 1 > /proc/sys/kernel/nmi_watchdog
      
        # ./perf stat -e '{cpu_atom/cycles/,msr/tsc/}' true
      
         Performance counter stats for 'true':
      
             <not counted>      cpu_atom/cycles/                                              (0.00%)
             <not counted>      msr/tsc/                                                      (0.00%)
      
               0.001904106 seconds time elapsed
      
               0.001947000 seconds user
               0.000000000 seconds sys
      
        Some events weren't counted. Try disabling the NMI watchdog:
                echo 0 > /proc/sys/kernel/nmi_watchdog
                perf stat ...
                echo 1 > /proc/sys/kernel/nmi_watchdog
        The events in group usually have to be from the same PMU. Try reorganizing the group.
      
      Now we disable the NMI watchdog message on hybrid, otherwise there
      are too many false positives.
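
      (A minimal sketch of the approach, with stand-in helpers; the real change
      lives in perf's stat printing path, where hybrid detection is done by a
      helper such as perf_pmu__has_hybrid().)

        #include <stdbool.h>
        #include <stdio.h>

        /* Stand-in for perf's hybrid detection helper. */
        static bool system_is_hybrid(void)
        {
            return true;
        }

        /* Only print the NMI watchdog hint when the system is not hybrid. */
        static void print_nmi_watchdog_hint(bool some_events_not_counted)
        {
            if (!some_events_not_counted || system_is_hybrid())
                return;
            printf("Some events weren't counted. Try disabling the NMI watchdog:\n"
                   "\techo 0 > /proc/sys/kernel/nmi_watchdog\n"
                   "\tperf stat ...\n"
                   "\techo 1 > /proc/sys/kernel/nmi_watchdog\n");
        }

        int main(void)
        {
            print_nmi_watchdog_hint(true);  /* prints nothing on hybrid */
            return 0;
        }
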
      
      After:
      
        # ./perf stat true
      
         Performance counter stats for 'true':
      
                      0.79 msec task-clock                #    0.419 CPUs utilized
                         0      context-switches          #    0.000 /sec
                         0      cpu-migrations            #    0.000 /sec
                        48      page-faults               #   60.889 K/sec
                   777,692      cpu_core/cycles/          #  986.519 M/sec
             <not counted>      cpu_atom/cycles/                                              (0.00%)
                   669,147      cpu_core/instructions/    #  848.828 M/sec
             <not counted>      cpu_atom/instructions/                                        (0.00%)
                   128,635      cpu_core/branches/        #  163.176 M/sec
             <not counted>      cpu_atom/branches/                                            (0.00%)
                     4,089      cpu_core/branch-misses/   #    5.187 M/sec
             <not counted>      cpu_atom/branch-misses/                                       (0.00%)
      
               0.001880649 seconds time elapsed
      
               0.001935000 seconds user
               0.000000000 seconds sys
      
        # ./perf stat -e '{cpu_atom/cycles/,msr/tsc/}' true
      
         Performance counter stats for 'true':
      
             <not counted>      cpu_atom/cycles/                                              (0.00%)
             <not counted>      msr/tsc/                                                      (0.00%)
      
               0.000963319 seconds time elapsed
      
               0.000999000 seconds user
               0.000000000 seconds sys
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210610034557.29766-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 04 June 2021, 1 commit
  5. 12 May 2021, 1 commit
  6. 29 April 2021, 2 commits
    • perf stat: Filter out unmatched aggregation for hybrid event · 92637cc7
      Committed by Jin Yao
      perf stat supports several aggregation modes, such as --per-core,
      --per-socket, etc. But a hybrid event may only be available on a subset
      of the CPUs. So for --per-core we need to filter out the unavailable
      cores, for --per-socket filter out the unavailable sockets, and so on.
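
      (A toy sketch of the filtering idea, not the actual perf code: a row is
      printed only when the event has at least one CPU in that aggregation
      unit, which is what removes the empty core lines shown below.)

        #include <stdio.h>

        struct toy_aggr { const char *id; int nr_cpus; long long count; };

        int main(void)
        {
            struct toy_aggr rows[] = {
                { "S0-D0-C0",  2, 479530 },
                { "S0-D0-C32", 0, 0 },       /* no cpu_core CPUs in this core */
            };

            for (unsigned int i = 0; i < sizeof(rows) / sizeof(rows[0]); i++) {
                if (!rows[i].nr_cpus)
                    continue;                /* filtered out */
                printf("%-10s %2d %12lld cpu_core/cycles/\n",
                       rows[i].id, rows[i].nr_cpus, rows[i].count);
            }
            return 0;
        }
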
      
      Before:
      
        # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1
      
         Performance counter stats for 'system wide':
      
        S0-D0-C0           2            479,530      cpu_core/cycles/
        S0-D0-C4           2            175,007      cpu_core/cycles/
        S0-D0-C8           2            166,240      cpu_core/cycles/
        S0-D0-C12          2            704,673      cpu_core/cycles/
        S0-D0-C16          2            865,835      cpu_core/cycles/
        S0-D0-C20          2          2,958,461      cpu_core/cycles/
        S0-D0-C24          2            163,988      cpu_core/cycles/
        S0-D0-C28          2            164,729      cpu_core/cycles/
        S0-D0-C32          0      <not counted>      cpu_core/cycles/
        S0-D0-C33          0      <not counted>      cpu_core/cycles/
        S0-D0-C34          0      <not counted>      cpu_core/cycles/
        S0-D0-C35          0      <not counted>      cpu_core/cycles/
        S0-D0-C36          0      <not counted>      cpu_core/cycles/
        S0-D0-C37          0      <not counted>      cpu_core/cycles/
        S0-D0-C38          0      <not counted>      cpu_core/cycles/
        S0-D0-C39          0      <not counted>      cpu_core/cycles/
      
               1.003597211 seconds time elapsed
      
      After:
      
        # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1
      
         Performance counter stats for 'system wide':
      
        S0-D0-C0           2            210,428      cpu_core/cycles/
        S0-D0-C4           2            444,830      cpu_core/cycles/
        S0-D0-C8           2            435,241      cpu_core/cycles/
        S0-D0-C12          2            423,976      cpu_core/cycles/
        S0-D0-C16          2            859,350      cpu_core/cycles/
        S0-D0-C20          2          1,559,589      cpu_core/cycles/
        S0-D0-C24          2            163,924      cpu_core/cycles/
        S0-D0-C28          2            376,610      cpu_core/cycles/
      
               1.003621290 seconds time elapsed
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Co-developed-by: Jiri Olsa <jolsa@redhat.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-16-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Uniquify hybrid event name · 12279429
      Committed by Jin Yao
      It would be useful to let the user know which pmu the event belongs to.
      perf stat has supported the '--no-merge' option, which can print the pmu
      name after the event name, such as:
      
      "cycles [cpu_core]"
      
      Now this option is enabled by default on hybrid platforms, but the format
      is changed to:
      
      "cpu_core/cycles/"
      
      If the user configures the name, we still use the user-specified name.
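
      (A toy sketch of composing the "pmu/event/" display name; the function
      and buffer handling here are illustrative, not perf's actual name
      uniquification code.)

        #include <stdio.h>

        static void toy_uniquify_name(char *buf, size_t len,
                                      const char *pmu_name, const char *event_name)
        {
            snprintf(buf, len, "%s/%s/", pmu_name, event_name);
        }

        int main(void)
        {
            char buf[64];

            toy_uniquify_name(buf, sizeof(buf), "cpu_core", "cycles");
            printf("%s\n", buf);    /* cpu_core/cycles/ */
            return 0;
        }
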
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-8-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 20 April 2021, 1 commit
  8. 24 March 2021, 1 commit
    • perf stat: Align CSV output for summary mode · 0bdad978
      Committed by Jin Yao
      The 'perf stat' subcommand supports the request for a summary of the
      interval counter readings.  But the summary lines break the CSV output
      so it's hard for scripts to parse the result.
      
      Before:
      
        # perf stat -x, -I1000 --interval-count 1 --summary
             1.001323097,8013.48,msec,cpu-clock,8013483384,100.00,8.013,CPUs utilized
             1.001323097,270,,context-switches,8013513297,100.00,0.034,K/sec
             1.001323097,13,,cpu-migrations,8013530032,100.00,0.002,K/sec
             1.001323097,184,,page-faults,8013546992,100.00,0.023,K/sec
             1.001323097,20574191,,cycles,8013551506,100.00,0.003,GHz
             1.001323097,10562267,,instructions,8013564958,100.00,0.51,insn per cycle
             1.001323097,2019244,,branches,8013575673,100.00,0.252,M/sec
             1.001323097,106152,,branch-misses,8013585776,100.00,5.26,of all branches
        8013.48,msec,cpu-clock,8013483384,100.00,7.984,CPUs utilized
        270,,context-switches,8013513297,100.00,0.034,K/sec
        13,,cpu-migrations,8013530032,100.00,0.002,K/sec
        184,,page-faults,8013546992,100.00,0.023,K/sec
        20574191,,cycles,8013551506,100.00,0.003,GHz
        10562267,,instructions,8013564958,100.00,0.51,insn per cycle
        2019244,,branches,8013575673,100.00,0.252,M/sec
        106152,,branch-misses,8013585776,100.00,5.26,of all branches
      
      The summary line loses the timestamp column, which breaks the CSV
      output.
      
      We add a column at the original 'timestamp' position and it just says
      'summary' for the summary line.
      
      After:
      
        # perf stat -x, -I1000 --interval-count 1 --summary
             1.001196053,8012.72,msec,cpu-clock,8012722903,100.00,8.013,CPUs utilized
             1.001196053,218,,context-switches,8012753271,100.00,0.027,K/sec
             1.001196053,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
             1.001196053,0,,page-faults,8012786257,100.00,0.000,K/sec
             1.001196053,15004518,,cycles,8012790637,100.00,0.002,GHz
             1.001196053,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
             1.001196053,1590259,,branches,8012814766,100.00,0.198,M/sec
             1.001196053,82601,,branch-misses,8012824365,100.00,5.19,of all branches
                 summary,8012.72,msec,cpu-clock,8012722903,100.00,7.986,CPUs utilized
                 summary,218,,context-switches,8012753271,100.00,0.027,K/sec
                 summary,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
                 summary,0,,page-faults,8012786257,100.00,0.000,K/sec
                 summary,15004518,,cycles,8012790637,100.00,0.002,GHz
                 summary,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
                 summary,1590259,,branches,8012814766,100.00,0.198,M/sec
                 summary,82601,,branch-misses,8012824365,100.00,5.19,of all branches
      
      Now it's easy for script to analyse the summary lines.
      
      Of course, we also take care not to break possible existing scripts,
      which can continue to use the broken CSV format by using the new
      '--no-csv-summary' option.
      
        # perf stat -x, -I1000 --interval-count 1 --summary --no-csv-summary
             1.001213261,8012.67,msec,cpu-clock,8012672327,100.00,8.013,CPUs utilized
             1.001213261,197,,context-switches,8012703742,100.00,24.586,/sec
             1.001213261,9,,cpu-migrations,8012720902,100.00,1.123,/sec
             1.001213261,644,,page-faults,8012738266,100.00,80.373,/sec
             1.001213261,18350698,,cycles,8012744109,100.00,0.002,GHz
             1.001213261,12745021,,instructions,8012759001,100.00,0.69,insn per cycle
             1.001213261,2458033,,branches,8012770864,100.00,306.768,K/sec
             1.001213261,102107,,branch-misses,8012781751,100.00,4.15,of all branches
        8012.67,msec,cpu-clock,8012672327,100.00,7.985,CPUs utilized
        197,,context-switches,8012703742,100.00,24.586,/sec
        9,,cpu-migrations,8012720902,100.00,1.123,/sec
        644,,page-faults,8012738266,100.00,80.373,/sec
        18350698,,cycles,8012744109,100.00,0.002,GHz
        12745021,,instructions,8012759001,100.00,0.69,insn per cycle
        2458033,,branches,8012770864,100.00,306.768,K/sec
        102107,,branch-misses,8012781751,100.00,4.15,of all branches
      
      This option can be enabled in perf config by setting the variable
      'stat.no-csv-summary'.
      
        # perf config stat.no-csv-summary=true
      
        # perf config -l
        stat.no-csv-summary=true
      
        # perf stat -x, -I1000 --interval-count 1 --summary
             1.001330198,8013.28,msec,cpu-clock,8013279201,100.00,8.013,CPUs utilized
             1.001330198,205,,context-switches,8013308394,100.00,25.583,/sec
             1.001330198,10,,cpu-migrations,8013324681,100.00,1.248,/sec
             1.001330198,0,,page-faults,8013340926,100.00,0.000,/sec
             1.001330198,8027742,,cycles,8013344503,100.00,0.001,GHz
             1.001330198,2871717,,instructions,8013356501,100.00,0.36,insn per cycle
             1.001330198,553564,,branches,8013366204,100.00,69.081,K/sec
             1.001330198,54021,,branch-misses,8013375952,100.00,9.76,of all branches
        8013.28,msec,cpu-clock,8013279201,100.00,7.985,CPUs utilized
        205,,context-switches,8013308394,100.00,25.583,/sec
        10,,cpu-migrations,8013324681,100.00,1.248,/sec
        0,,page-faults,8013340926,100.00,0.000,/sec
        8027742,,cycles,8013344503,100.00,0.001,GHz
        2871717,,instructions,8013356501,100.00,0.36,insn per cycle
        553564,,branches,8013366204,100.00,69.081,K/sec
        54021,,branch-misses,8013375952,100.00,9.76,of all branches
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210319070156.20394-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  9. 07 March 2021, 1 commit
  10. 21 January 2021, 1 commit
    • perf stat: Enable counting events for BPF programs · fa853c4b
      Committed by Song Liu
      Introduce 'perf stat -b' option, which counts events for BPF programs, like:
      
        [root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
           1.487903822            115,200      ref-cycles
           1.487903822             86,012      cycles
           2.489147029             80,560      ref-cycles
           2.489147029             73,784      cycles
           3.490341825             60,720      ref-cycles
           3.490341825             37,797      cycles
           4.491540887             37,120      ref-cycles
           4.491540887             31,963      cycles
      
      The example above counts 'cycles' and 'ref-cycles' of the BPF program
      with id 254. This is similar to the bpftool-prog-profile command, but
      more flexible.
      
      'perf stat -b' creates per-cpu perf_event and loads fentry/fexit BPF
      programs (monitor-progs) to the target BPF program (target-prog). The
      monitor-progs read perf_event before and after the target-prog, and
      aggregate the difference in a BPF map. Then the user space reads data
      from these maps.
      
      A new 'struct bpf_counter' is introduced to provide a common interface
      that uses BPF programs/maps to count perf events.
      
      Committer notes:
      
      Removed all but bpf_counter.h includes from evsel.h, not needed at all.
      
      Also, BPF map lookups for PERCPU_ARRAYs need the value receive buffer
      passed to the kernel to have libbpf_num_possible_cpus() entries, not
      evsel__nr_cpus(evsel), as the former uses
      /sys/devices/system/cpu/possible while the latter uses
      /sys/devices/system/cpu/online, which may be less than the 'possible'
      number, making the bpf map lookup overwrite memory and cause hard to
      debug memory corruption.
      
      We need to continue using evsel__nr_cpus(evsel) when accessing the
      perf_counts array though, so as not to overwrite another area of memory :-)
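
      (A sketch of that note in code, assuming libbpf's bpf_map_lookup_elem()
      and libbpf_num_possible_cpus(); 'map_fd' and 'key' are assumed to come
      from an already-loaded BPF object, and error handling is abbreviated.)

        #include <stdlib.h>
        #include <bpf/bpf.h>
        #include <bpf/libbpf.h>

        /* Sum a PERCPU_ARRAY value across CPUs; the receive buffer must have
         * one slot per *possible* CPU or the kernel writes past its end. */
        static int read_percpu_value(int map_fd, __u32 key, __u64 *sum)
        {
            int ncpus = libbpf_num_possible_cpus();
            __u64 *vals;
            int err;

            if (ncpus < 0)
                return ncpus;
            vals = calloc(ncpus, sizeof(*vals));
            if (!vals)
                return -1;

            err = bpf_map_lookup_elem(map_fd, &key, vals);
            if (!err) {
                *sum = 0;
                for (int i = 0; i < ncpus; i++)
                    *sum += vals[i];
            }
            free(vals);
            return err;
        }
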
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: https://lore.kernel.org/lkml/20210120163031.GU12699@kernel.org/
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-team@fb.com
      Link: http://lore.kernel.org/lkml/20201229214214.3413833-4-songliubraving@fb.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  11. 24 December 2020, 7 commits
  12. 01 December 2020, 1 commit
  13. 28 November 2020, 1 commit
    • perf stat: Use proper cpu for shadow stats · c0ee1d5a
      Committed by Namhyung Kim
      Currently perf stat shows some metrics (like IPC) for defined events.
      But when no aggregation mode is used (-A option), it shows incorrect
      values since it used a value from a different cpu.
      
      Before:
      
        $ perf stat -aA -e cycles,instructions sleep 1
      
         Performance counter stats for 'system wide':
      
        CPU0      116,057,380      cycles
        CPU1       86,084,722      cycles
        CPU2       99,423,125      cycles
        CPU3       98,272,994      cycles
        CPU0       53,369,217      instructions      #    0.46  insn per cycle
        CPU1       33,378,058      instructions      #    0.29  insn per cycle
        CPU2       58,150,086      instructions      #    0.50  insn per cycle
        CPU3       40,029,703      instructions      #    0.34  insn per cycle
      
             1.001816971 seconds time elapsed
      
      So the IPC for CPU1 should be 0.38 (= 33,378,058 / 86,084,722)
      but it was 0.29 (= 33,378,058 / 116,057,380) and so on.
      
      After:
      
        $ perf stat -aA -e cycles,instructions sleep 1
      
         Performance counter stats for 'system wide':
      
        CPU0      109,621,384      cycles
        CPU1      159,026,454      cycles
        CPU2       99,460,366      cycles
        CPU3      124,144,142      cycles
        CPU0       44,396,706      instructions      #    0.41  insn per cycle
        CPU1      120,195,425      instructions      #    0.76  insn per cycle
        CPU2       44,763,978      instructions      #    0.45  insn per cycle
        CPU3       69,049,079      instructions      #    0.56  insn per cycle
      
             1.001910444 seconds time elapsed
      
      Fixes: 44d49a60 ("perf stat: Support metrics in --per-core/socket mode")
      Reported-by: Sam Xi <xyzsam@google.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20201127041404.390276-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  14. 10 September 2020, 1 commit
  15. 01 September 2020, 1 commit
    • perf stat: Fix out of bounds array access in the print_counters() evlist method · 313146a8
      Committed by Thomas Richter
      Fix a compile error on F32 and gcc version 10.1 on s390 in file
      utils/stat-display.c.  The error does not show up with make DEBUG=y.  In
      fact the issue shows up when using both compiler options -O6 and
      -D_FORTIFY_SOURCE=2 (which are omitted with DEBUG=Y).
      
      This is the offending call chain:
      
      print_counter_aggr()
        printout(config, -1, 0, ...)  with 2nd parm id set to -1
          aggr_printout(config, x, id --> -1, ...) which leads to this code:
      		case AGGR_NONE:
                      if (evsel->percore && !config->percore_show_thread) {
                              ....
                      } else {
                              fprintf(config->output, "CPU%*d%s",
                                      config->csv_output ? 0 : -7,
                                      evsel__cpus(evsel)->map[id],
      				                        ^^ id is -1 !!!!
                                      config->csv_sep);
                      }
      
      This is a compiler inlining issue which is detected on s390 but not on
      other platforms.
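
      (The call chain above shows that the id reaching this code can be -1, the
      aggregate case; the sketch below only illustrates that class of fix,
      guarding the index before subscripting, and is not the actual patch.)

        #include <stdio.h>

        struct toy_map { int nr; int map[4]; };

        static void print_cpu_column(const struct toy_map *m, int id)
        {
            if (id < 0 || id >= m->nr) {
                printf("%-7s", "");          /* aggregate line: no CPU column */
                return;
            }
            printf("CPU%-4d", m->map[id]);
        }

        int main(void)
        {
            struct toy_map cpus = { .nr = 2, .map = { 0, 18 } };

            print_cpu_column(&cpus, 1);      /* CPU18 */
            print_cpu_column(&cpus, -1);     /* handled, no out-of-bounds read */
            printf("\n");
            return 0;
        }
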
      
      Output before:
      
       # make util/stat-display.o
          .....
      
        util/stat-display.c: In function ‘perf_evlist__print_counters’:
        util/stat-display.c:121:4: error: array subscript -1 is below array
            bounds of ‘int[]’ [-Werror=array-bounds]
        121 |    fprintf(config->output, "CPU%*d%s",
            |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        122 |     config->csv_output ? 0 : -7,
            |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        123 |     evsel__cpus(evsel)->map[id],
            |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        124 |     config->csv_sep);
            |     ~~~~~~~~~~~~~~~~
        In file included from util/evsel.h:13,
                       from util/evlist.h:13,
                       from util/stat-display.c:9:
        /root/linux/tools/lib/perf/include/internal/cpumap.h:10:7:
        note: while referencing ‘map’
         10 |  int  map[];
            |       ^~~
        cc1: all warnings being treated as errors
        mv: cannot stat 'util/.stat-display.o.tmp': No such file or directory
        make[3]: *** [/root/linux/tools/build/Makefile.build:97: util/stat-display.o]
        Error 1
        make[2]: *** [Makefile.perf:716: util/stat-display.o] Error 2
        make[1]: *** [Makefile.perf:231: sub-make] Error 2
        make: *** [Makefile:110: util/stat-display.o] Error 2
        [root@t35lp46 perf]#
      
      Output after:
      
        # make util/stat-display.o
          .....
        CC       util/stat-display.o
        [root@t35lp46 perf]#
      
      Committer notes:
      
      Removed the removal of {} enclosing the multiline else block, as pointed
      out by Jiri Olsa.
      Suggested-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200825063304.77733-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  16. 09 June 2020, 1 commit
  17. 06 May 2020, 3 commits
  18. 30 April 2020, 1 commit
    • perf tools: Enable Hz/hz printing for --metric-only option · 3351c6da
      Committed by Kajol Jain
      Commit 54b50916 ("perf stat: Implement --metric-only mode") added
      function 'valid_only_metric()' which drops "Hz" or "hz", if it is part
      of "ScaleUnit". This patch enables it, since hv_24x7 supports a couple of
      frequency events.
      Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lore.kernel.org/lkml/20200401203340.31402-7-kjain@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  19. 24 March 2020, 1 commit
    • perf stat: Align the output for interval aggregation mode · d13e9e41
      Committed by Jin Yao
      There is a slight misalignment in -A -I output.
      
      For example:
      
       # perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
      
       #           time CPU                    counts unit events
            1.000440863 CPU0               1,068,388      cpu/event=cpu-cycles/
            1.000440863 CPU1                 875,954      cpu/event=cpu-cycles/
            1.000440863 CPU2               3,072,538      cpu/event=cpu-cycles/
            1.000440863 CPU3               4,026,870      cpu/event=cpu-cycles/
            1.000440863 CPU4               5,919,630      cpu/event=cpu-cycles/
            1.000440863 CPU5               2,714,260      cpu/event=cpu-cycles/
            1.000440863 CPU6               2,219,240      cpu/event=cpu-cycles/
            1.000440863 CPU7               1,299,232      cpu/event=cpu-cycles/
      
      The value of counts is not aligned with the column "counts" and
      the event name is not aligned with the column "events".
      
      With this patch, the output is,
      
       # perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
      
       #           time CPU                    counts unit events
            1.000423009 CPU0                  997,421      cpu/event=cpu-cycles/
            1.000423009 CPU1                1,422,042      cpu/event=cpu-cycles/
            1.000423009 CPU2                  484,651      cpu/event=cpu-cycles/
            1.000423009 CPU3                  525,791      cpu/event=cpu-cycles/
            1.000423009 CPU4                1,370,100      cpu/event=cpu-cycles/
            1.000423009 CPU5                  442,072      cpu/event=cpu-cycles/
            1.000423009 CPU6                  205,643      cpu/event=cpu-cycles/
            1.000423009 CPU7                1,302,250      cpu/event=cpu-cycles/
      
      Now output is aligned.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200218071614.25736-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  20. 11 March 2020, 1 commit
  21. 04 March 2020, 1 commit
    • perf stat: Show percore counts in per CPU output · 1af62ce6
      Committed by Jin Yao
      We have supported the event modifier "percore" which sums up the event
      counts for all hardware threads in a core and shows the counts per core.
      
      For example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A -- sleep 1
      
        Performance counter stats for 'system wide':
      
       S0-D0-C0                395,072      cpu/event=cpu-cycles,percore/
       S0-D0-C1                851,248      cpu/event=cpu-cycles,percore/
       S0-D0-C2                954,226      cpu/event=cpu-cycles,percore/
       S0-D0-C3              1,233,659      cpu/event=cpu-cycles,percore/
      
      This patch provides a new option "--percore-show-thread". It is used
      together with the event modifier "percore" to sum up the event counts for
      all hardware threads in a core, but show the counts per hardware thread.
      
      This is essentially a replacement for the any bit (which is gone in
      Icelake). Per core counts are useful for some formulas, e.g. CoreIPC.
      The original percore version was inconvenient to post process. This
      variant matches the output of the any bit.
      
      With this patch, for example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread  -- sleep 1
      
        Performance counter stats for 'system wide':
      
       CPU0               2,453,061      cpu/event=cpu-cycles,percore/
       CPU1               1,823,921      cpu/event=cpu-cycles,percore/
       CPU2               1,383,166      cpu/event=cpu-cycles,percore/
       CPU3               1,102,652      cpu/event=cpu-cycles,percore/
       CPU4               2,453,061      cpu/event=cpu-cycles,percore/
       CPU5               1,823,921      cpu/event=cpu-cycles,percore/
       CPU6               1,383,166      cpu/event=cpu-cycles,percore/
       CPU7               1,102,652      cpu/event=cpu-cycles,percore/
      
      We can see counts are duplicated in CPU pairs (CPU0/CPU4, CPU1/CPU5,
      CPU2/CPU6, CPU3/CPU7).
      
      The interval mode also works. For example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread  -I 1000
       #           time CPU                    counts unit events
            1.000425421 CPU0                 925,032      cpu/event=cpu-cycles,percore/
            1.000425421 CPU1                 430,202      cpu/event=cpu-cycles,percore/
            1.000425421 CPU2                 436,843      cpu/event=cpu-cycles,percore/
            1.000425421 CPU3               1,192,504      cpu/event=cpu-cycles,percore/
            1.000425421 CPU4                 925,032      cpu/event=cpu-cycles,percore/
            1.000425421 CPU5                 430,202      cpu/event=cpu-cycles,percore/
            1.000425421 CPU6                 436,843      cpu/event=cpu-cycles,percore/
            1.000425421 CPU7               1,192,504      cpu/event=cpu-cycles,percore/
      
      If we offline CPU5, the result is:
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1
      
        Performance counter stats for 'system wide':
      
       CPU0               2,752,148      cpu/event=cpu-cycles,percore/
       CPU1               1,009,312      cpu/event=cpu-cycles,percore/
       CPU2               2,784,072      cpu/event=cpu-cycles,percore/
       CPU3               2,427,922      cpu/event=cpu-cycles,percore/
       CPU4               2,752,148      cpu/event=cpu-cycles,percore/
       CPU6               2,784,072      cpu/event=cpu-cycles,percore/
       CPU7               2,427,922      cpu/event=cpu-cycles,percore/
      
              1.001416041 seconds time elapsed
      
       v4:
       ---
       Ravi Bangoria reports an issue in v3. Once we offline a CPU,
       the output is not correct. The issue is we should use the cpu
       idx in print_percore_thread rather than using the cpu value.
      
       v3:
       ---
       1. Fix the interval mode output error
       2. Use cpu value (not cpu index) in config->aggr_get_id().
       3. Refine the code according to Jiri's comments.
      
       v2:
       ---
       Add the explanation in change log. This is essentially a replacement
       for the any bit. No code change.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200214080452.26402-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  22. 07 November 2019, 1 commit
    • perf stat: Add --per-node aggregation support · 86895b48
      Committed by Jiri Olsa
      Add a new --per-node option to aggregate counts per NUMA node for
      system-wide mode measurements.
      
      You can specify --per-node in live mode:
      
        # perf stat  -a -I 1000 -e cycles --per-node
        #           time node   cpus             counts unit events
             1.000542550 N0       20          6,202,097      cycles
             1.000542550 N1       20            639,559      cycles
             2.002040063 N0       20          7,412,495      cycles
             2.002040063 N1       20          2,185,577      cycles
             3.003451699 N0       20          6,508,917      cycles
             3.003451699 N1       20            765,607      cycles
        ...
      
      Or in the record/report stat session:
      
        # perf stat record -a -I 1000 -e cycles
        #           time             counts unit events
             1.000536937         10,008,468      cycles
             2.002090152          9,578,539      cycles
             3.003625233          7,647,869      cycles
             4.005135036          7,032,086      cycles
        ^C     4.340902364          3,923,893      cycles
      
        # perf stat report --per-node
        #           time node   cpus             counts unit events
             1.000536937 N0       20          9,355,086      cycles
             1.000536937 N1       20            653,382      cycles
             2.002090152 N0       20          7,712,838      cycles
             2.002090152 N1       20          1,865,701      cycles
             3.003625233 N0       20          6,604,441      cycles
             3.003625233 N1       20          1,043,428      cycles
             4.005135036 N0       20          6,350,522      cycles
             4.005135036 N1       20            681,564      cycles
             4.340902364 N0       20          3,403,188      cycles
             4.340902364 N1       20            520,705      cycles
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20190904073415.723-4-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  23. 01 September 2019, 1 commit