1. 07 Mar 2021, 1 commit
    • perf stat: Fix wrong skipping for per-die aggregation · 034f7ee1
      Committed by Jin Yao
      Uncore is die-scoped on Xeon Cascade Lake-AP, and perf already supports
      --per-die aggregation.
      
      An issue was found in check_per_pkg() for uncore events running on an
      AP system. On Cascade Lake-AP, we have:
      
      S0-D0
      S0-D1
      S1-D0
      S1-D1
      
      But in check_per_pkg(), S0-D1 and S1-D1 are skipped because the mask
      bits for S0 and S1 were already set by S0-D0 and S1-D0; the function
      doesn't check die_id. So the counts for S0-D1 and S1-D1 are set to
      zero, which is not correct.
      
        root@lkp-csl-2ap4 ~# ./perf stat -a -I 1000 -e llc_misses.mem_read --per-die -- sleep 5
           1.001460963 S0-D0           1            1317376 Bytes llc_misses.mem_read
           1.001460963 S0-D1           1             998016 Bytes llc_misses.mem_read
           1.001460963 S1-D0           1             970496 Bytes llc_misses.mem_read
           1.001460963 S1-D1           1            1291264 Bytes llc_misses.mem_read
           2.003488021 S0-D0           1            1082048 Bytes llc_misses.mem_read
           2.003488021 S0-D1           1            1919040 Bytes llc_misses.mem_read
           2.003488021 S1-D0           1             890752 Bytes llc_misses.mem_read
           2.003488021 S1-D1           1            2380800 Bytes llc_misses.mem_read
           3.005613270 S0-D0           1            1126080 Bytes llc_misses.mem_read
           3.005613270 S0-D1           1            2898176 Bytes llc_misses.mem_read
           3.005613270 S1-D0           1             870912 Bytes llc_misses.mem_read
           3.005613270 S1-D1           1            3388608 Bytes llc_misses.mem_read
           4.007627598 S0-D0           1            1124608 Bytes llc_misses.mem_read
           4.007627598 S0-D1           1            3884416 Bytes llc_misses.mem_read
           4.007627598 S1-D0           1             921088 Bytes llc_misses.mem_read
           4.007627598 S1-D1           1            4451840 Bytes llc_misses.mem_read
           5.001479927 S0-D0           1             963328 Bytes llc_misses.mem_read
           5.001479927 S0-D1           1            4831936 Bytes llc_misses.mem_read
           5.001479927 S1-D0           1             895104 Bytes llc_misses.mem_read
           5.001479927 S1-D1           1            5496640 Bytes llc_misses.mem_read
      
      From the output above, we can see that S0-D1 and S1-D1 don't report
      per-interval values; their counts keep growing. That's because
      check_per_pkg() wrongly decides to use zero counts for S0-D1 and S1-D1.
      
      So check_per_pkg() should use a hashmap keyed by (socket, die) to decide
      whether a CPU's counts need to be skipped. Considering only the socket
      is not enough.
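A simplified sketch of the difference between the two checks, not the actual perf code: the helper names (skip_socket_only, skip_socket_die, counted_cpus) are hypothetical, a linear-scan array stands in for the hashmap, and it assumes the first CPU seen for a package supplies the reading while later ones are skipped.

```c
#include <assert.h>
#include <stdbool.h>

/* Old behaviour (sketch): one mask bit per socket, die_id ignored. */
static bool skip_socket_only(unsigned long *mask, int socket)
{
	assert(socket >= 0 && socket < (int)(8 * sizeof(*mask)));
	if (*mask & (1UL << socket))
		return true;            /* socket already seen: reading dropped */
	*mask |= 1UL << socket;
	return false;
}

/* New behaviour (sketch): remember (socket, die) pairs. A linear scan
 * stands in for the hashmap used by the real check_per_pkg(). */
#define MAX_PKGS 64

struct pkg_set {
	int socket[MAX_PKGS];
	int die[MAX_PKGS];
	int n;
};

static bool skip_socket_die(struct pkg_set *set, int socket, int die)
{
	for (int i = 0; i < set->n; i++)
		if (set->socket[i] == socket && set->die[i] == die)
			return true;    /* this (socket, die) was already counted */
	assert(set->n < MAX_PKGS);
	set->socket[set->n] = socket;
	set->die[set->n] = die;
	set->n++;
	return false;
}

/* Count how many of the given CPUs actually contribute a reading. */
static int counted_cpus(const int *sockets, const int *dies, int ncpus,
			bool use_die)
{
	unsigned long mask = 0;
	struct pkg_set set = { .n = 0 };
	int counted = 0;

	for (int cpu = 0; cpu < ncpus; cpu++) {
		bool skip = use_die ? skip_socket_die(&set, sockets[cpu], dies[cpu])
				    : skip_socket_only(&mask, sockets[cpu]);
		if (!skip)
			counted++;
	}
	return counted;
}
```

With two CPUs on each of S0-D0, S0-D1, S1-D0 and S1-D1, the socket-only check lets only 2 CPUs through (so D1 on each socket gets nothing), while the (socket, die) check lets 4 through. With die_id fixed at 0 both checks agree, which matches the note about systems without dies.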
      
      Now with this patch,
      
        root@lkp-csl-2ap4 ~# ./perf stat -a -I 1000 -e llc_misses.mem_read --per-die -- sleep 5
           1.001586691 S0-D0           1            1229440 Bytes llc_misses.mem_read
           1.001586691 S0-D1           1             976832 Bytes llc_misses.mem_read
           1.001586691 S1-D0           1             938304 Bytes llc_misses.mem_read
           1.001586691 S1-D1           1            1227328 Bytes llc_misses.mem_read
           2.003776312 S0-D0           1            1586752 Bytes llc_misses.mem_read
           2.003776312 S0-D1           1             875392 Bytes llc_misses.mem_read
           2.003776312 S1-D0           1             855616 Bytes llc_misses.mem_read
           2.003776312 S1-D1           1             949376 Bytes llc_misses.mem_read
           3.006512788 S0-D0           1            1338880 Bytes llc_misses.mem_read
           3.006512788 S0-D1           1             920064 Bytes llc_misses.mem_read
           3.006512788 S1-D0           1             877184 Bytes llc_misses.mem_read
           3.006512788 S1-D1           1            1020736 Bytes llc_misses.mem_read
           4.008895291 S0-D0           1             926592 Bytes llc_misses.mem_read
           4.008895291 S0-D1           1             906368 Bytes llc_misses.mem_read
           4.008895291 S1-D0           1             892224 Bytes llc_misses.mem_read
           4.008895291 S1-D1           1             987712 Bytes llc_misses.mem_read
           5.001590993 S0-D0           1             962624 Bytes llc_misses.mem_read
           5.001590993 S0-D1           1             912512 Bytes llc_misses.mem_read
           5.001590993 S1-D0           1             891200 Bytes llc_misses.mem_read
           5.001590993 S1-D1           1             978432 Bytes llc_misses.mem_read
      
      On systems without dies, die_id is 0, so the key is effectively
      hashmap(socket,0) and the original behavior is unchanged.
      Reported-by: Ying Huang <ying.huang@intel.com>
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ying Huang <ying.huang@intel.com>
      Link: http://lore.kernel.org/lkml/20210128013417.25597-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 09 Feb 2021, 1 commit
    • perf stat: Support L2 Topdown events · 63e39aa6
      Committed by Kan Liang
      TMA method level 2 metrics are supported starting with the Intel
      Sapphire Rapids server, which exposes four L2 Topdown metrics events to
      user space. There are eight L2 events in total; the other four L2
      Topdown metrics events are calculated from the corresponding L1 events
      and the exposed L2 events.
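How the calculated L2 metrics relate to their L1 parents and exposed L2 siblings can be sketched as below. This is an illustration, not perf's implementation: the subtraction rule is inferred from the percentages in the example further down (e.g. 12.6% retiring - 2.0% heavy operations = 10.6% light operations), and the struct and function names are hypothetical. Backend bound is not printed in that example, so it is assumed to be 100% minus the other three L1 metrics.

```c
#include <assert.h>

/* Level-1 Topdown fractions, in percent of slots. */
struct td_l1 {
	double retiring, bad_spec, fe_bound, be_bound;
};

/* The four L2 metrics exposed directly to user space. */
struct td_l2_exposed {
	double heavy_ops, br_mispredict, fetch_lat, mem_bound;
};

/* The four L2 metrics derived from an L1 parent and its exposed sibling. */
struct td_l2_derived {
	double light_ops, machine_clears, fetch_bw, core_bound;
};

static struct td_l2_derived td_derive_l2(struct td_l1 l1, struct td_l2_exposed l2)
{
	/* Each exposed L2 metric is assumed to be a part of its L1 parent. */
	assert(l2.heavy_ops <= l1.retiring && l2.br_mispredict <= l1.bad_spec);
	assert(l2.fetch_lat <= l1.fe_bound && l2.mem_bound <= l1.be_bound);

	struct td_l2_derived d = {
		.light_ops      = l1.retiring - l2.heavy_ops,
		.machine_clears = l1.bad_spec - l2.br_mispredict,
		.fetch_bw       = l1.fe_bound - l2.fetch_lat,
		.core_bound     = l1.be_bound - l2.mem_bound,
	};
	return d;
}
```

Plugging in the example's values (backend bound taken as 100 - 12.6 - 12.3 - 24.1 = 51.0) reproduces the printed 10.6%, 0.4%, 6.3% and 15.4%.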
      
      Currently, --topdown prints all the top-down metrics supported by the
      CPU. For the Intel Sapphire Rapids server, that is 4 L1 events and 8 L2
      events displayed on one line.
      
      Add a new option, --td-level, to display only the top-down statistics
      whose level is equal to or lower than the given level.
      
      An L2 event is marked only when both its L1 parent event and the L2
      event itself cross the threshold.
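A minimal sketch of the marking rule, with a hypothetical function name. The thresholds are passed in as parameters because the values perf actually uses are not stated here.

```c
#include <assert.h>
#include <stdbool.h>

/* An L2 metric is marked only if its L1 parent and the L2 metric itself
 * both exceed their (caller-supplied, hypothetical) thresholds. */
static bool td_l2_marked(double parent_pct, double parent_threshold,
			 double child_pct, double child_threshold)
{
	assert(child_pct <= parent_pct);   /* an L2 metric is part of its parent */
	return parent_pct > parent_threshold && child_pct > child_threshold;
}
```

A large L2 value under a parent that stays below its threshold is therefore never marked, so attention stays on the L1 category that dominates.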
      
      Here is an example:
      
        $ perf stat --topdown --td-level=2 --no-metric-only sleep 1
        Topdown accuracy may decrease when measuring long periods.
        Please print the result regularly, e.g. -I1000
      
        Performance counter stats for 'sleep 1':
      
           16,734,390   slots
            2,100,001   topdown-retiring       # 12.6% retiring
            2,034,376   topdown-bad-spec       # 12.3% bad speculation
            4,003,128   topdown-fe-bound       # 24.1% frontend bound
              328,125   topdown-heavy-ops      #  2.0% heavy operations    #  10.6% light operations
            1,968,751   topdown-br-mispredict  # 11.9% branch mispredict   #  0.4% machine clears
            2,953,127   topdown-fetch-lat      # 17.8% fetch latency       #  6.3% fetch bandwidth
            5,906,255   topdown-mem-bound      # 35.6% memory bound        #  15.4% core bound
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/1612296553-21962-9-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 21 Jan 2021, 1 commit
    • perf stat: Enable counting events for BPF programs · fa853c4b
      Committed by Song Liu
      Introduce the 'perf stat -b' option, which counts events for BPF programs, for example:
      
        [root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
           1.487903822            115,200      ref-cycles
           1.487903822             86,012      cycles
           2.489147029             80,560      ref-cycles
           2.489147029             73,784      cycles
           3.490341825             60,720      ref-cycles
           3.490341825             37,797      cycles
           4.491540887             37,120      ref-cycles
           4.491540887             31,963      cycles
      
      The example above counts 'cycles' and 'ref-cycles' of the BPF program
      with id 254. This is similar to the bpftool-prog-profile command, but
      more flexible.
      
      'perf stat -b' creates per-CPU perf_events and loads fentry/fexit BPF
      programs (monitor-progs) that attach to the target BPF program
      (target-prog). The monitor-progs read the perf_event before and after
      the target-prog runs, and aggregate the difference in a BPF map. User
      space then reads the data from these maps.
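The before/after counting scheme described above can be modelled in plain user-space C. This is a simulation only, under stated assumptions: it uses no BPF or perf_event APIs, a fixed CPU count (NR_CPUS_SIM), and hypothetical hooks (on_fentry, on_fexit) standing in for the monitor-progs, with per-CPU arrays standing in for the BPF map.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SIM 4   /* hypothetical fixed CPU count */

struct prog_counter {
	uint64_t before[NR_CPUS_SIM];  /* reading saved at fentry, per CPU */
	uint64_t accum[NR_CPUS_SIM];   /* aggregated deltas, per CPU */
};

/* Called when the target program is entered on 'cpu'. */
static void on_fentry(struct prog_counter *pc, int cpu, uint64_t reading)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SIM);
	pc->before[cpu] = reading;
}

/* Called when the target program returns on 'cpu': add the difference. */
static void on_fexit(struct prog_counter *pc, int cpu, uint64_t reading)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SIM);
	pc->accum[cpu] += reading - pc->before[cpu];
}

/* What user space would do: read every CPU's slot and sum them. */
static uint64_t read_total(const struct prog_counter *pc)
{
	uint64_t total = 0;

	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++)
		total += pc->accum[cpu];
	return total;
}
```

Keeping the accumulators per CPU means concurrent runs of the target program on different CPUs never touch the same slot; the reader pays the cost of summing instead.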
      
      A new 'struct bpf_counter' is introduced to provide a common interface
      that uses BPF programs/maps to count perf events.
      
      Committer notes:
      
      Removed all but bpf_counter.h includes from evsel.h, not needed at all.
      
      Also, BPF map lookups for PERCPU_ARRAYs must pass the kernel a value
      receive buffer with libbpf_num_possible_cpus() entries, not
      evsel__nr_cpus(evsel): the former uses /sys/devices/system/cpu/possible
      while the latter uses /sys/devices/system/cpu/online, which may be less
      than the 'possible' number, making the bpf map lookup overwrite memory
      and cause hard-to-debug memory corruption.
      
      We still need to use evsel__nr_cpus(evsel) when accessing the
      perf_counts array, though, so as not to overwrite another area of memory :-)
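A simplified sketch of why the buffer size matters. It counts CPUs from a cpulist string in the format used by /sys/devices/system/cpu/possible and /sys/devices/system/cpu/online (e.g. "0-3,8-11"). The parser and helper names are hypothetical; this is not libbpf's implementation of libbpf_num_possible_cpus().

```c
#include <assert.h>
#include <stdlib.h>

/* Count the CPUs in a cpulist such as "0-3,8-11\n". Returns -1 on
 * malformed input. Simplified parser for illustration only. */
static int cpulist_count(const char *s)
{
	int count = 0;

	while (*s != '\0' && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s || lo < 0)
			return -1;              /* not a CPU number */
		if (*end == '-') {              /* range "lo-hi" */
			const char *p = end + 1;

			hi = strtol(p, &end, 10);
			if (end == p || hi < lo)
				return -1;
		}
		count += (int)(hi - lo + 1);
		if (*end == ',')
			end++;                  /* next list element */
		else if (*end != '\0' && *end != '\n')
			return -1;
		s = end;
	}
	return count;
}

/* Number of value slots a per-CPU map lookup buffer needs: one per
 * *possible* CPU, so the list passed in must be the 'possible' one. */
static size_t percpu_value_slots(const char *possible_list)
{
	int n = cpulist_count(possible_list);

	assert(n > 0);
	return (size_t)n;
}
```

On a machine whose possible list is "0-7" but whose online list is "0-3", sizing the buffer from the online list would allocate 4 slots for a lookup that writes 8.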
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: https://lore.kernel.org/lkml/20210120163031.GU12699@kernel.org/
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-team@fb.com
      Link: http://lore.kernel.org/lkml/20201229214214.3413833-4-songliubraving@fb.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 24 Dec 2020, 2 commits
  5. 01 Dec 2020, 1 commit
  6. 30 Nov 2020, 2 commits
  7. 18 Sep 2020, 1 commit
  8. 28 May 2020, 5 commits
    • perf stat: Report summary for interval mode · c7e5b328
      Committed by Jin Yao
      Currently 'perf stat' can print counts at a regular interval (-I), but
      it's not easy for users to get the overall statistics.
      
      This patch uses 'evsel->prev_raw_counts' to get the counts for the
      summary: after printing the interval results, it copies the counts to
      'evsel->counts' and then simply follows the non-interval processing.
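The relationship between interval output and the summary can be modelled as below. This is a simplified sketch with hypothetical names: a single counter, with 'prev' playing the role of evsel->prev_raw_counts, which holds the latest cumulative reading.

```c
#include <assert.h>
#include <stdint.h>

struct interval_state {
	uint64_t prev;   /* last cumulative reading (cf. prev_raw_counts) */
};

/* Value printed for one interval: current cumulative minus previous. */
static uint64_t interval_delta(struct interval_state *st, uint64_t cumulative)
{
	assert(cumulative >= st->prev);   /* counters only move forward */
	uint64_t delta = cumulative - st->prev;

	st->prev = cumulative;
	return delta;
}

/* Summary at the end of the run: the saved cumulative reading, which is
 * the same as the sum of all printed interval values. */
static uint64_t interval_summary(const struct interval_state *st)
{
	return st->prev;
}
```

Fed cumulative readings consistent with the first example below (2,281,114 and then 4,828,994 cycles), it yields interval values of 2,281,114 and 2,547,880 and a summary of 4,828,994, matching that output.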
      
      Let's see some examples:
      
       root@kbl-ppc:~# perf stat -e cycles -I1000 --interval-count 2
       #           time             counts unit events
            1.000412064          2,281,114      cycles
            2.001383658          2,547,880      cycles
      
        Performance counter stats for 'system wide':
      
                4,828,994      cycles
      
              2.002860349 seconds time elapsed
      
       root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
       #           time             counts unit events
            1.000389902          1,536,093      cycles
            1.000389902            420,226      instructions              #    0.27  insn per cycle
            2.001433453          2,213,952      cycles
            2.001433453            735,465      instructions              #    0.33  insn per cycle
      
        Performance counter stats for 'system wide':
      
                3,750,045      cycles
                1,155,691      instructions              #    0.31  insn per cycle
      
              2.003023361 seconds time elapsed
      
       root@kbl-ppc:~# perf stat -M CPI,IPC -I1000 --interval-count 2
       #           time             counts unit events
            1.000435121            905,303      inst_retired.any          #      2.9 CPI
            1.000435121          2,663,333      cycles
            1.000435121            914,702      inst_retired.any          #      0.3 IPC
            1.000435121          2,676,559      cpu_clk_unhalted.thread
            2.001615941          1,951,092      inst_retired.any          #      1.8 CPI
            2.001615941          3,551,357      cycles
            2.001615941          1,950,837      inst_retired.any          #      0.5 IPC
            2.001615941          3,551,044      cpu_clk_unhalted.thread
      
        Performance counter stats for 'system wide':
      
                2,856,395      inst_retired.any          #      2.2 CPI
                6,214,690      cycles
                2,865,539      inst_retired.any          #      0.5 IPC
                6,227,603      cpu_clk_unhalted.thread
      
              2.003403078 seconds time elapsed
      
      Committer testing:
      
      Before:
      
        # perf stat -e cycles -I1000 --interval-count 2
        #           time             counts unit events
             1.000618627         26,877,408      cycles
             2.001417968        233,672,829      cycles
        #
      
      After:
      
        # perf stat -e cycles -I1000 --interval-count 2
        #           time             counts unit events
             1.001531815      5,341,388,792      cycles
             2.002936530        100,073,912      cycles
      
         Performance counter stats for 'system wide':
      
             5,441,462,704      cycles
      
               2.004893794 seconds time elapsed
      
        #
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200520042737.24160-6-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Save aggr value to first member of prev_raw_counts · 905365f4
      Committed by Jin Yao
      To collect the overall statistics for interval mode, we copy the counts
      from evsel->prev_raw_counts to evsel->counts.
      
      In AGGR_GLOBAL mode, perf_stat_process_counter creates the aggr values
      from the per-CPU values, but the per-CPU values are 0, so the
      calculated aggr values will always be 0.
      
      This patch uses a trick: it saves the previous aggr value to the first
      member of perf_counts, so that the aggr calculation in
      process_counter_values works correctly for AGGR_GLOBAL.
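A minimal sketch of the trick, with hypothetical names and a fixed CPU count rather than perf's data structures: summing the per-CPU slots, as the AGGR_GLOBAL path does, gives 0 when they are all zero, but reproduces the saved aggregate once it is stored in the first slot.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NCPUS_SIM 4   /* hypothetical fixed CPU count */

/* AGGR_GLOBAL-style aggregation: sum the per-CPU values. */
static uint64_t aggr_from_percpu(const uint64_t *vals, int n)
{
	uint64_t sum = 0;

	for (int i = 0; i < n; i++)
		sum += vals[i];
	return sum;
}

/* Store a previously computed aggregate in slot 0 and zero the rest, so
 * re-running the summation yields exactly that aggregate. */
static void save_aggr_to_first(uint64_t *vals, int n, uint64_t aggr)
{
	assert(n > 0);
	memset(vals, 0, (size_t)n * sizeof(*vals));
	vals[0] = aggr;
}
```

The appeal of this design is that the existing summation code needs no special case for the summary pass.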
      
       v6:
       ---
       Add comments in perf_evlist__save_aggr_prev_raw_counts.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200520042737.24160-5-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Copy counts from prev_raw_counts to evsel->counts · 297767ac
      Committed by Jin Yao
      It would be useful to support the overall statistics for perf-stat
      interval mode. For example, report the summary at the end of "perf-stat
      -I" output.
      
      But since perf-stat supports many aggregation modes, such as
      --per-thread, --per-socket, -M, etc., we need a solution that doesn't
      add much complexity.
      
      The idea is to use 'evsel->prev_raw_counts', which is updated in each
      interval and holds the latest counts. Before reporting the summary, we
      copy the counts from evsel->prev_raw_counts to evsel->counts, and then
      simply follow the non-interval processing.
      
       v5:
       ---
       Don't save the previous aggr value to the member of [cpu0,thread0]
       in perf_counts. Originally that was a trick because the
       perf_stat_process_counter would create aggr values from per cpu
       values. But we don't need to do that all the time. We will
       handle it in the next patch.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200520042737.24160-4-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf counts: Reset prev_raw_counts counts · cf4d9bd6
      Committed by Jin Yao
      When we want to reset evsel->prev_raw_counts, zeroing the aggr is not
      enough; we need to reset the perf_counts too.
      
      perf_counts__reset zeros the perf_counts, and the aggr should be zeroed
      too. This patch makes perf_counts__reset non-static and calls it in
      evsel__reset_prev_raw_counts to reset the prev_raw_counts.
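A sketch of the reset, using a hypothetical simplified structure rather than perf's struct perf_counts: both the per-CPU values and the aggregate are cleared, so no stale per-CPU value survives into the next aggregation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NCPUS_RESET 4   /* hypothetical fixed CPU count */

/* Simplified stand-in: per-CPU values plus a separate aggregate. */
struct counts_sim {
	uint64_t values[NCPUS_RESET];
	uint64_t aggr;
};

/* Zeroing only 'aggr' would leave stale per-CPU values behind, which a
 * later aggregation would sum back in; clear both. */
static void counts_sim_reset(struct counts_sim *c)
{
	assert(c != NULL);
	memset(c->values, 0, sizeof(c->values));
	c->aggr = 0;
}
```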
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200520042737.24160-3-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf counts: Rename perf_evsel__*counts() to evsel__*counts() · 7d1e239e
      Committed by Arnaldo Carvalho de Melo
      These are 'struct evsel' methods, not part of tools/lib/perf/ (aka
      libperf), to which the perf_ prefix belongs.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  9. 06 May 2020, 5 commits
  10. 23 Apr 2020, 1 commit
  11. 29 Nov 2019, 1 commit
  12. 07 Nov 2019, 1 commit
    • perf stat: Add --per-node agregation support · 86895b48
      Committed by Jiri Olsa
      Add a new --per-node option to aggregate counts per NUMA node for
      system-wide measurements.
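A minimal sketch of per-node aggregation, with hypothetical names: per-CPU counts are summed into buckets according to a CPU-to-node map, which on Linux could be built from /sys/devices/system/node/node*/cpulist but is simply passed in here. It also counts the CPUs in each node, like the 'cpus' column in the output below.

```c
#include <assert.h>
#include <stdint.h>

/* Sum per-CPU counts into per-node totals using cpu_node[cpu]. */
static void aggregate_per_node(const uint64_t *cpu_counts, const int *cpu_node,
			       int ncpus, uint64_t *node_counts, int *node_cpus,
			       int nnodes)
{
	for (int n = 0; n < nnodes; n++) {
		node_counts[n] = 0;
		node_cpus[n] = 0;
	}
	for (int cpu = 0; cpu < ncpus; cpu++) {
		int n = cpu_node[cpu];

		assert(n >= 0 && n < nnodes);
		node_counts[n] += cpu_counts[cpu];
		node_cpus[n]++;   /* the "cpus" column */
	}
}
```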
      
      You can specify --per-node in live mode:
      
        # perf stat  -a -I 1000 -e cycles --per-node
        #           time node   cpus             counts unit events
             1.000542550 N0       20          6,202,097      cycles
             1.000542550 N1       20            639,559      cycles
             2.002040063 N0       20          7,412,495      cycles
             2.002040063 N1       20          2,185,577      cycles
             3.003451699 N0       20          6,508,917      cycles
             3.003451699 N1       20            765,607      cycles
        ...
      
      Or in the record/report stat session:
      
        # perf stat record -a -I 1000 -e cycles
        #           time             counts unit events
             1.000536937         10,008,468      cycles
             2.002090152          9,578,539      cycles
             3.003625233          7,647,869      cycles
             4.005135036          7,032,086      cycles
        ^C     4.340902364          3,923,893      cycles
      
        # perf stat report --per-node
        #           time node   cpus             counts unit events
             1.000536937 N0       20          9,355,086      cycles
             1.000536937 N1       20            653,382      cycles
             2.002090152 N0       20          7,712,838      cycles
             2.002090152 N1       20          1,865,701      cycles
             3.003625233 N0       20          6,604,441      cycles
             3.003625233 N1       20          1,043,428      cycles
             4.005135036 N0       20          6,350,522      cycles
             4.005135036 N1       20            681,564      cycles
             4.340902364 N0       20          3,403,188      cycles
             4.340902364 N1       20            520,705      cycles
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20190904073415.723-4-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  13. 15 Oct 2019, 1 commit
  14. 25 Sep 2019, 1 commit
  15. 20 Sep 2019, 5 commits
  16. 01 Sep 2019, 1 commit
  17. 30 Aug 2019, 2 commits
  18. 29 Aug 2019, 3 commits
  19. 26 Aug 2019, 1 commit
  20. 23 Aug 2019, 2 commits
  21. 22 Aug 2019, 1 commit
  22. 30 Jul 2019, 1 commit