1. 04 Nov, 2020 2 commits
  2. 28 Sep, 2020 3 commits
    • perf tools: Allow creation of cgroup without open · 89fb1ca2
      Namhyung Kim committed
      This is preparation for a test case that expands events for multiple
      cgroups.  Instead of using real system cgroups, the test will use fake
      cgroups, so it needs a way to create them without an open file descriptor.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20200924124455.336326-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Copy metric events properly when expand cgroups · b214ba8c
      Namhyung Kim committed
      metricgroup__copy_metric_events() handles metric events when
      expanding events for cgroups.  Because the metric events keep pointers
      to evsels, those pointers must be refreshed when events are cloned
      during the operation.
      
      The perf_stat__collect_metric_expr() is also called in case an event has
      a metric directly.
      
      During the copy, it references evsel by index as the evlist now has
      cloned evsels for the given cgroup.
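
The index-based lookup can be pictured with a small model. This is an illustrative Python sketch, not the perf C code: the `Evsel` class and the `clone_for_cgroup` helper are hypothetical names, and sharing one index between an original and its clone is a simplification made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Evsel:
    """Hypothetical stand-in for perf's struct evsel."""
    name: str
    idx: int
    cgroup: Optional[str] = None
    metric_events: list = field(default_factory=list)  # references to other evsels


def clone_for_cgroup(evlist, cgroup):
    """Clone every evsel for one cgroup, then re-point metric references by index."""
    clones = [Evsel(e.name, e.idx, cgroup) for e in evlist]
    by_idx = {c.idx: c for c in clones}
    for orig, clone in zip(evlist, clones):
        # A naive copy would keep pointing at the originals; look up the
        # clone that carries the same index instead.
        clone.metric_events = [by_idx[m.idx] for m in orig.metric_events]
    return clones


cycles = Evsel("cycles", 0)
insns = Evsel("instructions", 1)
insns.metric_events = [cycles]  # e.g. "insn per cycle" needs cycles

cloned = clone_for_cgroup([cycles, insns], "A")
print(cloned[1].metric_events[0] is cloned[0])  # True
print(cloned[1].metric_events[0] is cycles)     # False
```

The point of the lookup is that a cloned event ends up referencing the clone in its own cgroup rather than the original evsel.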
      
      Also, the kernel test robot found an issue in the python module import,
      so add empty implementations of those two functions to fix it.
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20200924124455.336326-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Add --for-each-cgroup option · d1c5a0e8
      Namhyung Kim committed
      The --for-each-cgroup option is syntactic sugar for monitoring a large
      number of cgroups easily.  The current command line requires listing all
      the events and cgroups even if users want to monitor the same events for
      each cgroup.  This patch addresses that usage by copying the given events
      for each cgroup on the user's behalf.
      
      For instance, to monitor 6 events for each of 200 cgroups, users have to
      write 1200 event names (with -e) AND 1200 cgroup names (with -G) on the
      command line.  With this change, they can just specify 6 events and 200
      cgroups with the new option.
      
      A simpler example follows: it measures 3 events for 2 cgroups ('A' and
      'B'), so a total of 6 events are counted:
      
        $ perf stat -a -e cpu-clock,cycles,instructions --for-each-cgroup A,B sleep 1
      
         Performance counter stats for 'system wide':
      
                    988.18 msec cpu-clock                 A #    0.987 CPUs utilized
             3,153,761,702      cycles                    A #    3.200 GHz                      (100.00%)
             8,067,769,847      instructions              A #    2.57  insn per cycle           (100.00%)
                    982.71 msec cpu-clock                 B #    0.982 CPUs utilized
             3,136,093,298      cycles                    B #    3.182 GHz                      (99.99%)
             8,109,619,327      instructions              B #    2.58  insn per cycle           (99.99%)
      
               1.001228054 seconds time elapsed
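
The expansion amounts to pairing every event with every cgroup. A minimal Python sketch of that pairing follows; the helper name `expand_for_each_cgroup` and the placeholder `ev*`/`cg*` names are made up for the illustration and are not part of perf.

```python
from itertools import product


def expand_for_each_cgroup(events, cgroups):
    """Return (event, cgroup) pairs, grouped by cgroup as in the output above."""
    return [(ev, cg) for cg, ev in product(cgroups, events)]


pairs = expand_for_each_cgroup(["cpu-clock", "cycles", "instructions"], ["A", "B"])
for ev, cg in pairs:
    print(f"{ev:<14}{cg}")

# The 6-events-by-200-cgroups case from the commit message:
big = expand_for_each_cgroup([f"ev{i}" for i in range(6)],
                             [f"cg{i}" for i in range(200)])
print(len(big))  # 1200
```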
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20200924124455.336326-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 23 Sep, 2020 1 commit
    • perf stat: Skip duration_time in setup_system_wide · 002a3d69
      Jin Yao committed
      Some metrics (such as DRAM_BW_Use) consist of uncore events and
      duration_time. For uncore events, counter->core.system_wide is true. But
      for duration_time, counter->core.system_wide is false, so
      target.system_wide is set to false.
      
      Then 'enable_on_exec' is set in the perf_event_attr of the uncore event,
      and the kernel returns an error when trying to open the uncore event.
      
      This patch skips the duration_time in setup_system_wide then
      target.system_wide will be set to true for the evlist of uncore events +
      duration_time.
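
The decision can be modelled in a few lines. The Python sketch below is only an illustration of the rule described in the message; `Counter` and `wants_system_wide` are hypothetical names, and the real logic lives in perf's C function setup_system_wide.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    """Hypothetical stand-in for an evsel; only the fields used here."""
    name: str
    system_wide: bool


def wants_system_wide(counters, skip_duration_time=True):
    """True if every counter (optionally ignoring duration_time) is system wide."""
    if not counters:
        return False
    for c in counters:
        if skip_duration_time and c.name == "duration_time":
            continue
        if not c.system_wide:
            return False
    return True


dram_bw_use = [
    Counter("arb/event=0x84,umask=0x1/", True),
    Counter("arb/event=0x81,umask=0x1/", True),
    Counter("duration_time", False),
]
print(wants_system_wide(dram_bw_use, skip_duration_time=False))  # False (before)
print(wants_system_wide(dram_bw_use))                            # True (after)
```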
      
      Before (tested on skylake desktop):
      
        # perf stat -M DRAM_BW_Use -- sleep 1
        Error:
        The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (arb/event=0x84,umask=0x1/).
        /bin/dmesg | grep -i perf may provide additional information.
      
      After:
      
        # perf stat -M DRAM_BW_Use -- sleep 1
      
         Performance counter stats for 'system wide':
      
                      169      arb/event=0x84,umask=0x1/ #     0.00 DRAM_BW_Use
                   40,427      arb/event=0x81,umask=0x1/
            1,000,902,197 ns   duration_time
      
              1.000902197 seconds time elapsed
      
      Fixes: e3ba76de ("perf tools: Force uncore events to system wide monitoring")
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200922015004.30114-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 18 Sep, 2020 2 commits
  5. 05 Sep, 2020 4 commits
  6. 04 Sep, 2020 1 commit
    • perf stat: Turn off summary for interval mode by default · ee6a9614
      Jin Yao committed
      There's a risk that outputting interval mode summaries by default breaks
      CSV consumers. It already broke pmu-tools/toplev.
      
      So now we turn off the summary by default but we create a new option
      '--summary' to enable the summary. This is active even when not using
      CSV mode.
      
      Before:
      
        root@kbl-ppc:~# perf stat -I1000 --interval-count 2
        #           time             counts unit events
             1.000265904           8,005.73 msec cpu-clock                 #    8.006 CPUs utilized
             1.000265904                601      context-switches          #    0.075 K/sec
             1.000265904                 10      cpu-migrations            #    0.001 K/sec
             1.000265904                  0      page-faults               #    0.000 K/sec
             1.000265904         66,746,521      cycles                    #    0.008 GHz
             1.000265904         71,874,398      instructions              #    1.08  insn per cycle
             1.000265904         13,356,781      branches                  #    1.668 M/sec
             1.000265904            298,756      branch-misses             #    2.24% of all branches
             2.001857667           8,012.52 msec cpu-clock                 #    8.013 CPUs utilized
             2.001857667                164      context-switches          #    0.020 K/sec
             2.001857667                 10      cpu-migrations            #    0.001 K/sec
             2.001857667                  2      page-faults               #    0.000 K/sec
             2.001857667          5,822,188      cycles                    #    0.001 GHz
             2.001857667          2,186,170      instructions              #    0.38  insn per cycle
             2.001857667            442,378      branches                  #    0.055 M/sec
             2.001857667             44,750      branch-misses             #   10.12% of all branches
      
         Performance counter stats for 'system wide':
      
                 16,018.25 msec cpu-clock                 #    7.993 CPUs utilized
                       765      context-switches          #    0.048 K/sec
                        20      cpu-migrations            #    0.001 K/sec
                         2      page-faults               #    0.000 K/sec
                72,568,709      cycles                    #    0.005 GHz
                74,060,568      instructions              #    1.02  insn per cycle
                13,799,159      branches                  #    0.861 M/sec
                   343,506      branch-misses             #    2.49% of all branches
      
               2.004118489 seconds time elapsed
      
      After:
      
        root@kbl-ppc:~# perf stat -I1000 --interval-count 2
        #           time             counts unit events
             1.001336393           8,013.28 msec cpu-clock                 #    8.013 CPUs utilized
             1.001336393                 82      context-switches          #    0.010 K/sec
             1.001336393                  8      cpu-migrations            #    0.001 K/sec
             1.001336393                  0      page-faults               #    0.000 K/sec
             1.001336393          4,199,121      cycles                    #    0.001 GHz
             1.001336393          1,373,991      instructions              #    0.33  insn per cycle
             1.001336393            270,681      branches                  #    0.034 M/sec
             1.001336393             31,659      branch-misses             #   11.70% of all branches
             2.003905006           8,020.52 msec cpu-clock                 #    8.021 CPUs utilized
             2.003905006                184      context-switches          #    0.023 K/sec
             2.003905006                  8      cpu-migrations            #    0.001 K/sec
             2.003905006                  2      page-faults               #    0.000 K/sec
             2.003905006          5,446,190      cycles                    #    0.001 GHz
             2.003905006          2,312,547      instructions              #    0.42  insn per cycle
             2.003905006            451,691      branches                  #    0.056 M/sec
             2.003905006             37,925      branch-misses             #    8.40% of all branches
      
        root@kbl-ppc:~# perf stat -I1000 --interval-count 2 --summary
        #           time             counts unit events
             1.001313128           8,013.20 msec cpu-clock                 #    8.013 CPUs utilized
             1.001313128                 83      context-switches          #    0.010 K/sec
             1.001313128                  8      cpu-migrations            #    0.001 K/sec
             1.001313128                  0      page-faults               #    0.000 K/sec
             1.001313128          4,470,950      cycles                    #    0.001 GHz
             1.001313128          1,440,045      instructions              #    0.32  insn per cycle
             1.001313128            283,222      branches                  #    0.035 M/sec
             1.001313128             33,576      branch-misses             #   11.86% of all branches
             2.003857385           8,020.34 msec cpu-clock                 #    8.020 CPUs utilized
             2.003857385                154      context-switches          #    0.019 K/sec
             2.003857385                  8      cpu-migrations            #    0.001 K/sec
             2.003857385                  2      page-faults               #    0.000 K/sec
             2.003857385          4,515,676      cycles                    #    0.001 GHz
             2.003857385          2,180,449      instructions              #    0.48  insn per cycle
             2.003857385            435,254      branches                  #    0.054 M/sec
             2.003857385             31,179      branch-misses             #    7.16% of all branches
      
         Performance counter stats for 'system wide':
      
                 16,033.53 msec cpu-clock                 #    7.992 CPUs utilized
                       237      context-switches          #    0.015 K/sec
                        16      cpu-migrations            #    0.001 K/sec
                         2      page-faults               #    0.000 K/sec
                 8,986,626      cycles                    #    0.001 GHz
                 3,620,494      instructions              #    0.40  insn per cycle
                   718,476      branches                  #    0.045 M/sec
                    64,755      branch-misses             #    9.01% of all branches
      
               2.006124542 seconds time elapsed
      
      Fixes: c7e5b328 ("perf stat: Report summary for interval mode")
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200903010113.32232-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 04 Aug, 2020 1 commit
  8. 22 Jul, 2020 6 commits
  9. 23 Jun, 2020 2 commits
  10. 02 Jun, 2020 1 commit
    • perf stat: Ensure group is defined on top of the same cpu mask · a9a17902
      Jiri Olsa committed
      Jin Yao reported the issue (and posted the first versions of this change)
      with groups being defined over events with different cpu masks.
      
      This causes assert aborts in get_group_fd, like:
      
        # perf stat -M "C2_Pkg_Residency" -a -- sleep 1
        perf: util/evsel.c:1464: get_group_fd: Assertion `!(fd == -1)' failed.
        Aborted
      
      All the events in the group have to be defined over the same cpus so the
      group_fd can be found for every leader/member pair.
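
As a rough illustration of that condition, the Python sketch below flags groups whose members do not share the leader's cpus. The function name and data layout are invented for the example, and the cpu set used for power/energy-ram/ is an assumption; only the `0` and `0-7` maps for power/energy-cores/, cycles and instructions come from the commit message.

```python
def groups_with_mixed_cpus(groups):
    """Return the groups whose members are not all defined over the same cpus.

    `groups` is a list of groups; each group is a list of (event, cpus) pairs,
    where cpus is a frozenset of cpu numbers.
    """
    mixed = []
    for group in groups:
        leader_cpus = group[0][1]
        if any(cpus != leader_cpus for _, cpus in group[1:]):
            mixed.append([name for name, _ in group])
    return mixed


all_cpus = frozenset(range(8))  # "0-7", as reported for cycles
pkg_cpu = frozenset({0})        # "0", as reported for power/energy-cores/

bad = [
    [("power/energy-cores/", pkg_cpu), ("cycles", all_cpus)],
    [("instructions", all_cpus), ("power/energy-cores/", pkg_cpu)],
]
good = [
    # Assumption: power/energy-ram/ uses the same cpu map as energy-cores.
    [("power/energy-cores/", pkg_cpu), ("power/energy-ram/", pkg_cpu)],
    [("instructions", all_cpus), ("cycles", all_cpus)],
]
for names in groups_with_mixed_cpus(bad):
    print("disabling group: { " + ", ".join(names) + " }")
print(groups_with_mixed_cpus(good))  # []
```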
      
      Adding check to ensure this condition is met and removing the group
      (with warning) if we detect mixed cpus, like:
      
        $ sudo perf stat -e '{power/energy-cores/,cycles},{instructions,power/energy-cores/}'
        WARNING: event cpu maps do not match, disabling group:
          anon group { power/energy-cores/, cycles }
          anon group { instructions, power/energy-cores/ }
      
      Ian also asked for cpu map details, which are displayed in verbose mode:
      
        $ sudo perf stat -e '{cycles,power/energy-cores/}' -v
        WARNING: group events cpu maps do not match, disabling group:
          anon group { power/energy-cores/, cycles }
             power/energy-cores/: 0
             cycles: 0-7
          anon group { instructions, power/energy-cores/ }
             instructions: 0-7
             power/energy-cores/: 0
      
      Committer testing:
      
        [root@seventh ~]# perf stat -e '{power/energy-cores/,cycles},{instructions,power/energy-cores/}'
        WARNING: grouped events cpus do not match, disabling group:
          anon group { power/energy-cores/, cycles }
          anon group { instructions, power/energy-cores/ }
        ^C
         Performance counter stats for 'system wide':
      
                     12.62 Joules power/energy-cores/
               106,920,637        cycles
                80,228,899        instructions              #    0.75  insn per cycle
                     12.62 Joules power/energy-cores/
      
              14.514476987 seconds time elapsed
      
        [root@seventh ~]#
      
      But if we put compatible events in each group it works:
      
        [root@seventh ~]# perf stat -e '{power/energy-cores/,power/energy-ram/},{instructions,cycles}' -a sleep 2
      
         Performance counter stats for 'system wide':
      
                      1.95 Joules power/energy-cores/
                      0.92 Joules power/energy-ram/
                29,305,715        instructions              #    1.03  insn per cycle
                28,423,338        cycles
      
               2.001438142 seconds time elapsed
      
        [root@seventh ~]#
      
      This needs improvement, though:
      
        [root@seventh ~]# perf stat -e '{power/energy-cores/,power/energy-ram/},{instructions,cycles}' sleep 2
        Error:
        The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (power/energy-cores/).
        /bin/dmesg | grep -i perf may provide additional information.
      
        [root@seventh ~]#
      
      We need to emit a better message, one stating that the power/ events
      can't be used for a specific workload, instead it is per-cpu or system
      wide.
      
      Fixes: 6a4bb04c ("perf tools: Enable grouping logic for parsed events")
      Co-developed-by: Jin Yao <yao.jin@linux.intel.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20200602101736.GE1112120@krava
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  11. 30 May, 2020 1 commit
    • perf tools: Add optional support for libpfm4 · 70943490
      Stephane Eranian committed
      This patch links perf with the libpfm4 library if it is available and
      LIBPFM4 is passed to the build. The libpfm4 library contains hardware
      event tables for all processors supported by perf_events. It is a helper
      library that helps convert from a symbolic event name to the event
      encoding required by the underlying kernel interface. This library is
      open-source and available from: http://perfmon2.sf.net.
      
      With this patch, it is possible to specify full hardware events by name.
      Hardware filters are also supported. Events must be specified via the
      --pfm-events and not -e option. Both options are active at the same time
      and it is possible to mix and match:
      
        $ perf stat --pfm-events inst_retired:any_p:c=1:i -e cycles ....
      
      One needs to explicitly ask for its inclusion by using the LIBPFM4 make
      command line option, i.e. it is opt-in rather than opt-out of feature
      detection and build support.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jiwei Sun <jiwei.sun@windriver.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: yuzhoujian <yuzhoujian@didichuxing.com>
      Link: http://lore.kernel.org/lkml/20200505182943.218248-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  12. 28 May, 2020 5 commits
    • perf metricgroup: Add options to not group or merge · 05530a79
      Ian Rogers committed
      Add --metric-no-group that causes all events within metrics to not be
      grouped. This can allow the event to get more time when multiplexed, but
      may also lower accuracy.
      
      Add --metric-no-merge option. By default events in different metrics may
      be shared if the group of events for one metric is the same or larger
      than that of the second. Sharing may increase or lower accuracy and so
      is now configurable.
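
The sharing rule can be read as a subset test: a metric may reuse an existing group when that group already contains all the events it needs. The Python sketch below illustrates that reading only; `can_share`, `plan_groups`, the metric `M2` and the event lists are hypothetical and do not reproduce perf's implementation.

```python
def can_share(existing_group, needed_events):
    """A metric may reuse an existing group if that group already contains
    every event the metric needs (the group is the same or larger)."""
    return set(needed_events) <= set(existing_group)


def plan_groups(metrics, no_merge=False):
    """Return the event groups to open for a dict of metric name -> events."""
    groups = []
    # Consider larger metrics first so smaller ones can reuse their groups.
    for events in sorted(metrics.values(), key=len, reverse=True):
        if not no_merge and any(can_share(g, events) for g in groups):
            continue
        groups.append(list(events))
    return groups


metrics = {
    "IPC": ["instructions", "cycles"],
    "M2": ["instructions", "cycles", "branches"],  # hypothetical metric
}
print(plan_groups(metrics))                 # one shared group
print(plan_groups(metrics, no_merge=True))  # two separate groups
```

With merging, the smaller metric is measured inside the larger group; with the no-merge behaviour, each metric gets its own group, which is the trade-off the option exposes.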
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kim Phillips <kim.phillips@amd.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: bpf@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20200520182011.32236-7-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf config: Add stat.big-num support · d778a778
      Paul A. Clarke committed
      Add support for new "stat.big-num" boolean option.
      
      This allows a user to set a default for "--no-big-num" for "perf stat"
      commands.
      
      --
        $ perf config stat.big-num
        $ perf stat --event cycles /bin/true
      
         Performance counter stats for '/bin/true':
      
                   778,849      cycles
        [...]
        $ perf config stat.big-num=false
        $ perf config stat.big-num
        stat.big-num=false
        $ perf stat --event cycles /bin/true
      
         Performance counter stats for '/bin/true':
      
                    769622      cycles
        [...]
      --
      
      There is an interaction with "--field-separator" that must be
      accommodated, such that specifying "--big-num --field-separator={x}"
      still reports an invalid combination of options.
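
One way to picture how the config default, the command-line option and the field separator interact is the Python sketch below. It is a model under stated assumptions, not perf's code: the function names are hypothetical, the error text is not perf's wording, and turning big-num off whenever a field separator is given is an assumption about the behaviour.

```python
def resolve_big_num(config_big_num=True, cli_big_num=None, field_separator=None):
    """Decide whether counts are printed with thousands separators.

    config_big_num: value of stat.big-num from perf config (default true)
    cli_big_num:    True for --big-num, False for --no-big-num, None if absent
    """
    if field_separator is not None:
        if cli_big_num:
            # Explicit --big-num together with a field separator is rejected.
            raise ValueError("--big-num cannot be combined with --field-separator")
        return False  # assumption: separator output never uses big-num
    if cli_big_num is not None:
        return cli_big_num  # an explicit option overrides the config default
    return config_big_num


def format_count(n, big_num):
    """Format a counter value with or without thousands separators."""
    return f"{n:,}" if big_num else str(n)


print(format_count(778849, resolve_big_num()))                      # 778,849
print(format_count(769622, resolve_big_num(config_big_num=False)))  # 769622
```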
      
      Documentation for perf-config and perf-stat updated.
      Signed-off-by: Paul Clarke <pc@us.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: http://lore.kernel.org/lkml/1589991815-17951-1-git-send-email-pc@us.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Report summary for interval mode · c7e5b328
      Jin Yao committed
      Currently 'perf stat' supports printing counts at a regular interval (-I),
      but it's not very easy for the user to get the overall statistics.
      
      The patch uses 'evsel->prev_raw_counts' to get counts for summary.  Copy
      the counts to 'evsel->counts' after printing the interval results.
      Next, we just follow the non-interval processing.
      
      Let's see some examples,
      
       root@kbl-ppc:~# perf stat -e cycles -I1000 --interval-count 2
       #           time             counts unit events
            1.000412064          2,281,114      cycles
            2.001383658          2,547,880      cycles
      
        Performance counter stats for 'system wide':
      
                4,828,994      cycles
      
              2.002860349 seconds time elapsed
      
       root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
       #           time             counts unit events
            1.000389902          1,536,093      cycles
            1.000389902            420,226      instructions              #    0.27  insn per cycle
            2.001433453          2,213,952      cycles
            2.001433453            735,465      instructions              #    0.33  insn per cycle
      
        Performance counter stats for 'system wide':
      
                3,750,045      cycles
                1,155,691      instructions              #    0.31  insn per cycle
      
              2.003023361 seconds time elapsed
      
       root@kbl-ppc:~# perf stat -M CPI,IPC -I1000 --interval-count 2
       #           time             counts unit events
            1.000435121            905,303      inst_retired.any          #      2.9 CPI
            1.000435121          2,663,333      cycles
            1.000435121            914,702      inst_retired.any          #      0.3 IPC
            1.000435121          2,676,559      cpu_clk_unhalted.thread
            2.001615941          1,951,092      inst_retired.any          #      1.8 CPI
            2.001615941          3,551,357      cycles
            2.001615941          1,950,837      inst_retired.any          #      0.5 IPC
            2.001615941          3,551,044      cpu_clk_unhalted.thread
      
        Performance counter stats for 'system wide':
      
                2,856,395      inst_retired.any          #      2.2 CPI
                6,214,690      cycles
                2,865,539      inst_retired.any          #      0.5 IPC
                6,227,603      cpu_clk_unhalted.thread
      
              2.003403078 seconds time elapsed
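
In the examples above, each summary count equals the sum of the interval counts for that event. The short Python check below reproduces the second example; it only verifies the arithmetic and does not model how the patch reuses 'evsel->prev_raw_counts'. The `summarize` helper is a name made up for the illustration.

```python
def summarize(intervals):
    """Sum per-interval counts into one total per event."""
    totals = {}
    for counts in intervals:
        for event, value in counts.items():
            totals[event] = totals.get(event, 0) + value
    return totals


# Interval counts copied from the second example in the commit message.
intervals = [
    {"cycles": 1_536_093, "instructions": 420_226},
    {"cycles": 2_213_952, "instructions": 735_465},
]
totals = summarize(intervals)
print(totals)  # {'cycles': 3750045, 'instructions': 1155691}
print(f"{totals['instructions'] / totals['cycles']:.2f} insn per cycle")  # 0.31
```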
      
      Committer testing:
      
      Before:
      
        # perf stat -e cycles -I1000 --interval-count 2
        #           time             counts unit events
             1.000618627         26,877,408      cycles
             2.001417968        233,672,829      cycles
        #
      
      After:
      
        # perf stat -e cycles -I1000 --interval-count 2
        #           time             counts unit events
             1.001531815      5,341,388,792      cycles
             2.002936530        100,073,912      cycles
      
         Performance counter stats for 'system wide':
      
             5,441,462,704      cycles
      
               2.004893794 seconds time elapsed
      
        #
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200520042737.24160-6-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Fix wrong per-thread runtime stat for interval mode · 72f02a94
      Jin Yao committed
        root@kbl-ppc:~# perf stat --per-thread -e cycles,instructions -I1000 --interval-count 2
             1.004171683             perf-3696              8,747,311      cycles
                ...
             1.004171683             perf-3696                691,730      instructions              #    0.08  insn per cycle
                ...
             2.006490373             perf-3696              1,749,936      cycles
                ...
             2.006490373             perf-3696              1,484,582      instructions              #    0.28  insn per cycle
                ...
      
      Let's see interval 2.006490373
      
        perf-3696              1,749,936      cycles
        perf-3696              1,484,582      instructions              #    0.28  insn per cycle
      
      insn per cycle = 1,484,582 / 1,749,936 = 0.85.
      
      But now it's 0.28, that's not correct.
      
      stat_config.stats[] records the per-thread runtime stat. But for
      interval mode, it should be reset for each interval.
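
One reading that reproduces the numbers above is that the ratio divides by a running mean of cycles that is carried over from earlier intervals. That mechanism is an assumption made for this sketch, and the `RunningMean` class and `insn_per_cycle` helper are hypothetical stand-ins rather than perf code.

```python
class RunningMean:
    """Minimal running mean, standing in for perf's per-event runtime stats."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, value):
        self.n += 1
        self.mean += (value - self.mean) / self.n

    def reset(self):
        self.n = 0
        self.mean = 0.0


def insn_per_cycle(intervals, reset_each_interval):
    """Return the 'insn per cycle' value shown for each interval."""
    cycles_stat = RunningMean()
    shown = []
    for cycles, instructions in intervals:
        if reset_each_interval:
            cycles_stat.reset()
        cycles_stat.update(cycles)
        shown.append(round(instructions / cycles_stat.mean, 2))
    return shown


# (cycles, instructions) for perf-3696 in the two intervals shown above.
before = [(8_747_311, 691_730), (1_749_936, 1_484_582)]
print(insn_per_cycle(before, reset_each_interval=False))  # [0.08, 0.28]
print(insn_per_cycle(before, reset_each_interval=True))   # [0.08, 0.85]
```

Without the reset, the second interval divides by the mean of both cycle counts and yields 0.28; with the reset it divides by that interval's own cycles and yields 0.85.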
      
      So now, with this patch,
      
        root@kbl-ppc:~# perf stat --per-thread -e cycles,instructions -I1000 --interval-count 2
             1.005818121             perf-8633              9,898,045      cycles
                ...
             1.005818121             perf-8633                693,298      instructions              #    0.07  insn per cycle
                ...
             2.007863743             perf-8633              1,551,619      cycles
                ...
             2.007863743             perf-8633              1,317,514      instructions              #    0.85  insn per cycle
                ...
      
      Let's check interval 2.007863743.
      
      insn per cycle = 1,317,514 / 1,551,619 = 0.85. It's correct.
      
      This patch creates runtime_stat_reset, places it next to
      runtime_stat_new/runtime_stat_delete and moves all runtime_stat
      functions before process_interval.
      
      Committer testing:
      
      After the patch:
      
        # perf stat --per-thread -e cycles,instructions -I1000 --interval-count 2  |& grep sssd_nss-1130
           2.011309774  sssd_nss-1130   56,585  cycles
           2.011309774  sssd_nss-1130   13,121  instructions  # 0.23 insn per cycle
        # python
        >>> 13121.0 / 56585
        0.23188124061146947
        >>>
      
      Fixes: commit 14e72a21 ("perf stat: Update or print per-thread stats")
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200520042737.24160-2-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Fix duration_time value for higher intervals · ea9eb1f4
      Jiri Olsa committed
      Joakim reported wrong duration_time value for interval bigger
      than 4000 [1].
      
      The problem is in the interval value we pass to update_stats
      function, which is typed as 'unsigned int' and overflows when
      we get over 2^32 (happens between intervals 4000 and 5000).
      
      Retyping the passed value to unsigned long long.
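
The wraparound can be reproduced with plain arithmetic. The Python sketch below assumes the interval is given in milliseconds and converted to nanoseconds before being passed on, and that 'unsigned int' is 32 bits wide; the helper name is made up for the illustration.

```python
NSEC_PER_MSEC = 1_000_000
UINT_MAX = 0xFFFFFFFF  # assumes a 32-bit unsigned int


def as_unsigned_int(value):
    """Truncate to 32 bits, as passing the value through 'unsigned int' would."""
    return value & UINT_MAX


for interval_ms in (4000, 5000):
    nsecs = interval_ms * NSEC_PER_MSEC
    print(interval_ms, nsecs, as_unsigned_int(nsecs))
# 4000 4000000000 4000000000
# 5000 5000000000 705032704
```

Since 2^32 ns is about 4295 ms, a 4000 ms interval still fits while a 5000 ms interval wraps, which matches the range given in the report.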
      
      [1] https://www.spinics.net/lists/linux-perf-users/msg11777.html
      
      Fixes: b90f1333 ("perf stat: Update walltime_nsecs_stats in interval mode")
      Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200518131445.3745083-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ea9eb1f4
  13. 06 May, 2020 6 commits
  14. 23 Apr, 2020 1 commit
    • J
      perf stat: Improve runtime stat for interval mode · 197ba86f
      Jin Yao committed
      In interval mode, the metric, if there is one, is printed after the
      '#' character. But it is not calculated from the counts generated in
      that interval.
      
      See the following examples:
      
        root@kbl-ppc:~# perf stat -M CPI -I1000 --interval-count 2
        #           time             counts unit events
             1.000422803            764,809      inst_retired.any          #      2.9 CPI
             1.000422803          2,234,932      cycles
             2.001464585          1,960,061      inst_retired.any          #      1.6 CPI
             2.001464585          4,022,591      cycles
      
      The second CPI should not be 1.6 (4,022,591/1,960,061 is 2.1).
      
        root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
        #           time             counts unit events
             1.000429493          2,869,311      cycles
             1.000429493            816,875      instructions              #    0.28  insn per cycle
             2.001516426          9,260,973      cycles
             2.001516426          5,250,634      instructions              #    0.87  insn per cycle
      
      The second 'insn per cycle' should not be 0.87 (5,250,634/9,260,973 is
      0.57).
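
The expected values quoted above can be rechecked directly from the second-interval counts. This is only the arithmetic from the two examples; `ratio` is a hypothetical helper name.

```python
def ratio(numerator, denominator):
    """Plain per-interval ratio of two counter values."""
    return numerator / denominator


# Counts from the second interval of each example above.
cpi = ratio(4_022_591, 1_960_061)  # cycles / inst_retired.any
ipc = ratio(5_250_634, 9_260_973)  # instructions / cycles

print(f"CPI {cpi:.1f}, insn per cycle {ipc:.2f}")
```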
      
      The current code uses a global variable 'rt_stat' to track and
      update the std dev of the runtime stat. The counts are reset for each
      interval, but 'rt_stat' is not.
      
        perf_stat_process_counter()
        {
                if (config->interval)
                        init_stats(ps->res_stats);
        }
      
      So for interval mode, the 'rt_stat' variable should be reset too.
      
      This patch resets 'rt_stat' before read_counters(), so the runtime stat
      is calculated only from the counts generated in this interval.
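
Why a missing reset skews the metric can be seen in a toy model. The sketch below is not perf's 'rt_stat' implementation: `RunningMean` and `insn_per_cycle` are hypothetical names, and the model assumes the stale state behaves like a plain running average. It does not reproduce the exact 0.87 figure shown before the patch; it only shows that accumulated state makes the ratio drift away from the per-interval value.

```python
class RunningMean:
    """Toy accumulator that averages every value it has been given."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.n = 0
        self.total = 0

    def update(self, value):
        self.n += 1
        self.total += value

    def mean(self):
        return self.total / self.n


def insn_per_cycle(intervals, reset_each_interval):
    """Return the ratio reported after each (cycles, instructions) interval."""
    cycles, insns = RunningMean(), RunningMean()
    reported = []
    for c, i in intervals:
        if reset_each_interval:
            cycles.reset()
            insns.reset()
        cycles.update(c)
        insns.update(i)
        reported.append(insns.mean() / cycles.mean())
    return reported


# (cycles, instructions) per interval, taken from the example before the patch.
intervals = [(2_869_311, 816_875), (9_260_973, 5_250_634)]
stale = insn_per_cycle(intervals, reset_each_interval=False)
fresh = insn_per_cycle(intervals, reset_each_interval=True)
print([f"{r:.2f}" for r in stale], [f"{r:.2f}" for r in fresh])
```

With the reset, the second value depends only on the second interval's counts; without it, the first interval still contributes.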
      
      With this patch:
      
        root@kbl-ppc:~# perf stat -M CPI -I1000 --interval-count 2
        #           time             counts unit events
             1.000420924          2,408,818      inst_retired.any          #      2.1 CPI
             1.000420924          5,010,111      cycles
             2.001448579          2,798,407      inst_retired.any          #      1.6 CPI
             2.001448579          4,599,861      cycles
      
        root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
        #           time             counts unit events
             1.000428555          2,769,714      cycles
             1.000428555            774,462      instructions              #    0.28  insn per cycle
             2.001471562          3,595,904      cycles
             2.001471562          1,243,703      instructions              #    0.35  insn per cycle
      
      Now the second 'insn per cycle' and CPI are calculated from the counts
      generated in this interval.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-By: Kajol Jain <kjain@linux.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200420145417.6864-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      197ba86f
  15. 16 Apr, 2020 1 commit
  16. 04 Mar, 2020 1 commit
    • J
      perf stat: Show percore counts in per CPU output · 1af62ce6
      Jin Yao committed
      We already support the event modifier "percore", which sums up the event
      counts for all hardware threads in a core and shows the counts per core.
      
      For example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A -- sleep 1
      
        Performance counter stats for 'system wide':
      
       S0-D0-C0                395,072      cpu/event=cpu-cycles,percore/
       S0-D0-C1                851,248      cpu/event=cpu-cycles,percore/
       S0-D0-C2                954,226      cpu/event=cpu-cycles,percore/
       S0-D0-C3              1,233,659      cpu/event=cpu-cycles,percore/
      
      This patch provides a new option "--percore-show-thread". It is used
      together with the event modifier "percore" to sum up the event counts
      for all hardware threads in a core but show the counts per hardware
      thread.
      
      This is essentially a replacement for the any bit (which is gone in
      Icelake). Per core counts are useful for some formulas, e.g. CoreIPC.
      The original percore version was inconvenient to post process. This
      variant matches the output of the any bit.
      
      With this patch, for example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread  -- sleep 1
      
        Performance counter stats for 'system wide':
      
       CPU0               2,453,061      cpu/event=cpu-cycles,percore/
       CPU1               1,823,921      cpu/event=cpu-cycles,percore/
       CPU2               1,383,166      cpu/event=cpu-cycles,percore/
       CPU3               1,102,652      cpu/event=cpu-cycles,percore/
       CPU4               2,453,061      cpu/event=cpu-cycles,percore/
       CPU5               1,823,921      cpu/event=cpu-cycles,percore/
       CPU6               1,383,166      cpu/event=cpu-cycles,percore/
       CPU7               1,102,652      cpu/event=cpu-cycles,percore/
      
      We can see counts are duplicated in CPU pairs (CPU0/CPU4, CPU1/CPU5,
      CPU2/CPU6, CPU3/CPU7).
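
The duplication follows from how the values are produced: counts are summed per core and the core total is then reported once for each hardware thread. A minimal sketch of that aggregation, assuming a hypothetical topology in which CPU n and CPU n+4 share a core; the function name and the per-thread counts are made up for illustration and are not perf code or data from the output above.

```python
from collections import defaultdict


def percore_show_thread(thread_counts, cpu_to_core):
    """Sum counts per core, then report that sum for every CPU of the core."""
    core_totals = defaultdict(int)
    for cpu, count in thread_counts.items():
        core_totals[cpu_to_core[cpu]] += count
    return {cpu: core_totals[cpu_to_core[cpu]] for cpu in sorted(thread_counts)}


# Hypothetical topology: 4 cores, CPU n and CPU n+4 are siblings.
cpu_to_core = {cpu: cpu % 4 for cpu in range(8)}
# Hypothetical per-thread counts.
thread_counts = {0: 100, 1: 200, 2: 300, 3: 400, 4: 10, 5: 20, 6: 30, 7: 40}

result = percore_show_thread(thread_counts, cpu_to_core)
for cpu, total in result.items():
    print(f"CPU{cpu} {total:>6,}")
```

Both siblings of a core print the same total, which is the pairing visible in the output above.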
      
      The interval mode also works. For example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread  -I 1000
       #           time CPU                    counts unit events
            1.000425421 CPU0                 925,032      cpu/event=cpu-cycles,percore/
            1.000425421 CPU1                 430,202      cpu/event=cpu-cycles,percore/
            1.000425421 CPU2                 436,843      cpu/event=cpu-cycles,percore/
            1.000425421 CPU3               1,192,504      cpu/event=cpu-cycles,percore/
            1.000425421 CPU4                 925,032      cpu/event=cpu-cycles,percore/
            1.000425421 CPU5                 430,202      cpu/event=cpu-cycles,percore/
            1.000425421 CPU6                 436,843      cpu/event=cpu-cycles,percore/
            1.000425421 CPU7               1,192,504      cpu/event=cpu-cycles,percore/
      
      If we offline CPU5, the result is:
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1
      
        Performance counter stats for 'system wide':
      
       CPU0               2,752,148      cpu/event=cpu-cycles,percore/
       CPU1               1,009,312      cpu/event=cpu-cycles,percore/
       CPU2               2,784,072      cpu/event=cpu-cycles,percore/
       CPU3               2,427,922      cpu/event=cpu-cycles,percore/
       CPU4               2,752,148      cpu/event=cpu-cycles,percore/
       CPU6               2,784,072      cpu/event=cpu-cycles,percore/
       CPU7               2,427,922      cpu/event=cpu-cycles,percore/
      
              1.001416041 seconds time elapsed
      
       v4:
       ---
       Ravi Bangoria reported an issue in v3: once we offline a CPU,
       the output is not correct. The fix is to use the cpu idx in
       print_percore_thread rather than the cpu value.
      
       v3:
       ---
       1. Fix the interval mode output error
       2. Use cpu value (not cpu index) in config->aggr_get_id().
       3. Refine the code according to Jiri's comments.
      
       v2:
       ---
       Add the explanation in change log. This is essentially a replacement
       for the any bit. No code change.
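
The v4 note about cpu idx versus cpu value matters as soon as the online CPU list has a gap. The sketch below only illustrates that general distinction with made-up data; `online_cpus`, `counts_by_idx` and `count_for_cpu` are hypothetical names and do not mirror the code in print_percore_thread.

```python
# Hypothetical CPU map after CPU5 has been taken offline: the position in
# the list is the cpu index, the stored number is the cpu value.
online_cpus = [0, 1, 2, 3, 4, 6, 7]
# One count per map entry, stored by index (made-up values).
counts_by_idx = [10, 20, 30, 40, 50, 60, 70]


def count_for_cpu(cpu_value):
    """Look a count up by translating the cpu value to its index first."""
    idx = online_cpus.index(cpu_value)
    return counts_by_idx[idx]


for idx, cpu in enumerate(online_cpus):
    print(f"idx {idx} -> CPU{cpu}: {counts_by_idx[idx]}")
```

In this model, using the cpu value 6 directly as an index would read the entry that belongs to CPU7, and the cpu value 7 would fall outside the array.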
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200214080452.26402-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      1af62ce6
  17. 29 Nov, 2019 2 commits