1. 23 Mar, 2017 1 commit
    • A
      perf stat: Output JSON MetricExpr metric · 37932c18
      Authored by Andi Kleen
      Add generic infrastructure to perf stat to output ratios for
      "MetricExpr" entries in the event lists. Many events are more useful as
      ratios than in raw form, typically some count in relation to total
      ticks.
      
      Transfer the MetricExpr information from the alias to the evsel.
      
      We mark the events that need to be collected for MetricExpr, and also
      link the events using them with a pointer. The code is careful to always
      prefer the right event in the same group to minimize multiplexing
      errors. At the moment only a single relation is supported.
      
      Then add a rblist to the stat shadow code that remembers stats based on
      the cpu and context.
      
      Then finally update and retrieve and print these values similarly to the
      existing hardcoded perf metrics. We use the simple expression parser
      added earlier to evaluate the expression.
      
      Normally we just output the result without further commentary, but for
      --metric-only this would lead to empty columns. So for this case use the
      original event as description.
      
      There is no attempt to automatically add the MetricExpr event if it is
      missing. Instead we suggest it to the user, because the user tool
      doesn't have enough information to reliably construct a group that is
      guaranteed to schedule. So we leave that to the user.
      
        % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}'
             1.000147889        800,085,181      unc_p_clockticks
             1.000147889         93,126,241      unc_p_freq_max_os_cycles  #     11.6
             2.000448381        800,218,217      unc_p_clockticks
             2.000448381        142,516,095      unc_p_freq_max_os_cycles  #     17.8
             3.000639852        800,243,057      unc_p_clockticks
             3.000639852        162,292,689      unc_p_freq_max_os_cycles  #     20.3
      
        % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
        #    time         freq_max_os_cycles %
             1.000127077      0.9
             2.000301436      0.7
             3.000456379      0.0
      
      v2: Change from DivideBy to MetricExpr
      v3: Use expr__ prefix.  Support more than one other event.
      v4: Update description
      v5: Only print warning message once for multiple PMUs.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170320201711.14142-11-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
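The ratio printed after `#` in the first sample run can be reproduced by hand. The sketch below assumes the metric is the event count as a percentage of `unc_p_clockticks`; that expression is inferred from the sample output, not copied from the event list, and the helper name `metric_percent` is hypothetical.

```python
# Illustrative only: recompute the ratios shown after '#' in the sample output.
# Assumption: the metric is 100 * event / clockticks (inferred from the output).

def metric_percent(event_count: int, clockticks: int) -> float:
    """Return event_count as a percentage of clockticks (0.0 if no ticks)."""
    if clockticks == 0:
        return 0.0
    return 100.0 * event_count / clockticks


# (unc_p_clockticks, unc_p_freq_max_os_cycles) pairs from the interval output.
samples = [
    (800_085_181, 93_126_241),
    (800_218_217, 142_516_095),
    (800_243_057, 162_292_689),
]

formatted = [f"{metric_percent(ev, ticks):.1f}" for ticks, ev in samples]
print(formatted)
```

The three values match the `11.6`, `17.8` and `20.3` columns of the first run.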
  2. 22 Mar, 2017 3 commits
  3. 04 Mar, 2017 2 commits
    • J
      perf tools: Force uncore events to system wide monitoring · e3ba76de
      Authored by Jiri Olsa
      Make system wide (-a) the default option if no target was specified and
      one of the following conditions is met:
      
        - no workload is specified (current behaviour)
        - a workload is specified but all requested
          events are system wide ones
      
      Mixed events core/uncore with workload:
      
        $ perf stat -e 'uncore_cbox_0/clockticks/,cycles' sleep 1
      
         Performance counter stats for 'sleep 1':
      
           <not supported>      uncore_cbox_0/clockticks/
                   980,489      cycles
      
               1.000897406 seconds time elapsed
      
      Uncore event with workload:
      
        $ perf stat -e 'uncore_cbox_0/clockticks/' sleep 1
      
         Performance counter stats for 'system wide':
      
        281,473,897,192,670      uncore_cbox_0/clockticks/
      
               1.000833784 seconds time elapsed
      
      Committer note:
      
      When testing I realized that the default case for !root, i.e. no events
      passed via -e, was broken by v2 of this patch. I reported it, and after a
      patch provided by Jiri it is working again:
      
        [acme@jouet linux]$ perf stat usleep 1
      
         Performance counter stats for 'usleep 1':
      
               0.401335      task-clock:u (msec)     #   0.297 CPUs utilized
                      0      context-switches:u      #   0.000 K/sec
                      0      cpu-migrations:u        #   0.000 K/sec
                     48      page-faults:u           #   0.120 M/sec
                458,146      cycles:u                #   1.142 GHz
                245,113      instructions:u          #   0.54  insn per cycle
                 47,991      branches:u              # 119.578 M/sec
                  4,022      branch-misses:u         #   8.38% of all branches
      
            0.001350029 seconds time elapsed
      
        [acme@jouet linux]$
      Suggested-and-Tested-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170227094818.GA12764@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
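The rule described in this commit can be written as a small decision function. This is a minimal sketch, not the perf implementation: the `Event` type, its `system_wide` flag and the function name are hypothetical, and it assumes that an empty event list with a workload (the default-events case from the committer note) stays per-task.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    name: str
    system_wide: bool  # hypothetical flag: True for uncore-style events


def default_to_system_wide(target_given: bool,
                           workload: List[str],
                           events: List[Event]) -> bool:
    """Decide whether -a should be implied, following the two conditions."""
    if target_given:
        return False  # an explicit target always wins
    if not workload:
        return True   # no workload: existing behaviour
    # Workload given: only go system wide if every requested event needs it.
    return bool(events) and all(e.system_wide for e in events)


uncore = Event("uncore_cbox_0/clockticks/", system_wide=True)
cycles = Event("cycles", system_wide=False)

print(default_to_system_wide(False, ["sleep", "1"], [uncore]))          # uncore only
print(default_to_system_wide(False, ["sleep", "1"], [uncore, cycles]))  # mixed
```

The two calls correspond to the "Uncore event with workload" and "Mixed events core/uncore with workload" cases above.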
    • B
      perf stat: Issue a HW watchdog disable hint · 02d492e5
      Authored by Borislav Petkov
      When using perf stat on an AMD F15h system with the default hw events
      attributes, some of the events don't get counted:
      
       Performance counter stats for 'sleep 1':
      
                0.749208      task-clock (msec)         #    0.001 CPUs utilized
                       1      context-switches          #    0.001 M/sec
                       0      cpu-migrations            #    0.000 K/sec
                      54      page-faults               #    0.072 M/sec
               1,122,815      cycles                    #    1.499 GHz
                 286,740      stalled-cycles-frontend   #   25.54% frontend cycles idle
           <not counted>      stalled-cycles-backend                                        (0.00%)
           ^^^^^^^^^^^^
           <not counted>      instructions                                                  (0.00%)
           ^^^^^^^^^^^^
           <not counted>      branches                                                      (0.00%)
           <not counted>      branch-misses                                                 (0.00%)
      
             1.001550070 seconds time elapsed
      
      The reason is that we have the HW watchdog consuming one PMU counter and
      when perf tries to schedule 6 events on 6 counters and some of those
      counters are constrained to only a specific subset of PMCs by the
      hardware, the event scheduling fails.
      
      So issue a hint to disable the HW watchdog around a perf stat session.
      
      Committer note:
      
      Testing it...
      
        # perf stat -d usleep 1
      
         Performance counter stats for 'usleep 1':
      
                1.180203      task-clock (msec)         #    0.490 CPUs utilized
                       1      context-switches          #    0.847 K/sec
                       0      cpu-migrations            #    0.000 K/sec
                      54      page-faults               #    0.046 M/sec
                 184,754      cycles                    #    0.157 GHz
                 714,553      instructions              #    3.87  insn per cycle
                 154,661      branches                  #  131.046 M/sec
                   7,247      branch-misses             #    4.69% of all branches
                 219,984      L1-dcache-loads           #  186.395 M/sec
                  17,600      L1-dcache-load-misses     #    8.00% of all L1-dcache hits    (90.16%)
           <not counted>      LLC-loads                                                     (0.00%)
           <not counted>      LLC-load-misses                                               (0.00%)
      
             0.002406823 seconds time elapsed
      
        Some events weren't counted. Try disabling the NMI watchdog:
      	echo 0 > /proc/sys/kernel/nmi_watchdog
      	perf stat ...
      	echo 1 > /proc/sys/kernel/nmi_watchdog
        #
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Vince Weaver <vince@deater.net>
      Link: http://lkml.kernel.org/r/20170211183218.ijnvb5f7ciyuunx4@pd.tnic
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 20 Feb, 2017 1 commit
  5. 18 Feb, 2017 1 commit
    • J
      perf stat: Add -a as default target · 0d79f8b9
      Authored by Jiri Olsa
      Boris asked for default -a option in case we monitor only uncore events.
      
      While implementing that I thought it might be actually useful to make it
      overall default.
      
      Running 'perf stat' will now collect system wide data.
      
      Committer note:
      
      Testing it:
      
        # perf stat
        ^C
         Performance counter stats for 'system wide':
      
               3571.559178      cpu-clock (msec)          #    4.000 CPUs utilized
                     3,346      context-switches          #    0.937 K/sec
                       277      cpu-migrations            #    0.078 K/sec
                    57,271      page-faults               #    0.016 M/sec
             4,535,633,835      cycles                    #    1.270 GHz
             6,389,736,516      instructions              #    1.41  insn per cycle
             1,541,293,875      branches                  #  431.547 M/sec
                14,526,396      branch-misses             #    0.94% of all branches
      
               0.892950118 seconds time elapsed
      
        #
      Requested-and-Acked-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170217170034.GB15389@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 17 Feb, 2017 1 commit
    • J
      perf tools: Replace _SC_NPROCESSORS_CONF with max_present_cpu in cpu_topology_map · da8a58b5
      Authored by Jan Stancek
      There are 2 problems wrt. cpu_topology_map on systems with sparse CPUs:
      
      1. offline/absent CPUs will have their socket_id and core_id set to -1
         which triggers:
         "socket_id number is too big.You may need to upgrade the perf tool."
      
      2. size of cpu_topology_map (perf_env.cpu[]) is allocated based on
         _SC_NPROCESSORS_CONF, but can be indexed with CPU ids going above.
         Users of perf_env.cpu[] are using CPU id as index. This can lead
         to read beyond what was allocated:
         ==19991== Invalid read of size 4
         ==19991==    at 0x490CEB: check_cpu_topology (topology.c:69)
         ==19991==    by 0x490CEB: test_session_topology (topology.c:106)
         ...
      
      For example:
        _SC_NPROCESSORS_CONF == 16
        available: 2 nodes (0-1)
        node 0 cpus: 0 6 8 10 16 22 24 26
        node 0 size: 12004 MB
        node 0 free: 9470 MB
        node 1 cpus: 1 7 9 11 23 25 27
        node 1 size: 12093 MB
        node 1 free: 9406 MB
        node distances:
        node   0   1
          0:  10  20
          1:  20  10
      
      This patch changes HEADER_NRCPUS.nr_cpus_available from _SC_NPROCESSORS_CONF
      to max_present_cpu and updates any user of cpu_topology_map to iterate
      with nr_cpus_avail.
      
      As a consequence HEADER_CPU_TOPOLOGY core_id and socket_id lists get longer,
      but maintain compatibility with pre-patch state - index to cpu_topology_map is
      CPU id.
      
        perf test 36 -v
        36: Session topology                           :
        --- start ---
        test child forked, pid 22211
        templ file: /tmp/perf-test-gmdX5i
        CPU 0, core 0, socket 0
        CPU 1, core 0, socket 1
        CPU 6, core 10, socket 0
        CPU 7, core 10, socket 1
        CPU 8, core 1, socket 0
        CPU 9, core 1, socket 1
        CPU 10, core 9, socket 0
        CPU 11, core 9, socket 1
        CPU 16, core 0, socket 0
        CPU 22, core 10, socket 0
        CPU 23, core 10, socket 1
        CPU 24, core 1, socket 0
        CPU 25, core 1, socket 1
        CPU 26, core 9, socket 0
        CPU 27, core 9, socket 1
        test child finished with 0
        ---- end ----
        Session topology: Ok
      Signed-off-by: Jan Stancek <jstancek@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/d7c05c6445fca74a8442c2c73cfffd349c52c44f.1487146877.git.jstancek@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
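The second problem is easy to see with the CPU ids from the example. The sketch below is only an illustration: it assumes the map must have one slot per CPU id up to the highest present id, and the helper names are hypothetical rather than taken from perf.

```python
# CPU ids from the example (node 0 and node 1), with _SC_NPROCESSORS_CONF == 16.
NPROCESSORS_CONF = 16
present_cpus = [0, 6, 8, 10, 16, 22, 24, 26, 1, 7, 9, 11, 23, 25, 27]


def map_size_from_present(cpus):
    """Size the map so that every present CPU id is a valid index."""
    return max(cpus) + 1


def out_of_range(cpus, size):
    """CPU ids that would index past the end of a map with `size` entries."""
    return sorted(c for c in cpus if c >= size)


old_overflow = out_of_range(present_cpus, NPROCESSORS_CONF)
new_overflow = out_of_range(present_cpus, map_size_from_present(present_cpus))

print(old_overflow)  # ids that read beyond a 16-entry map
print(new_overflow)  # nothing is out of range once the map covers the max id
```

Because the map is indexed by CPU id rather than by position, sizing it by the count of configured CPUs is not enough on sparse systems.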
  7. 14 Feb, 2017 1 commit
  8. 16 Dec, 2016 1 commit
  9. 23 Sep, 2016 1 commit
  10. 24 Aug, 2016 2 commits
  11. 09 Aug, 2016 1 commit
    • M
      perf stat: Avoid skew when reading events · 3df33eff
      Authored by Mark Rutland
      When we don't have a tracee (i.e. we're attaching to a task or CPU),
      counters can still be running after our workload finishes, and can still
      be running as we read their values. As we read events one-by-one, there
      can be arbitrary skew between values of events, even within a group.
      This means that ratios within an event group are not reliable.
      
      This skew can be seen if measuring a group of identical events, e.g:
      
        # perf stat -a -C0 -e '{cycles,cycles}' sleep 1
      
      To avoid this, we must stop groups from counting before we read the
      values of any constituent events. This patch adds and makes use of a new
      disable_counters() helper, which disables group leaders (and thus each
      group as a whole). This mirrors the use of enable_counters() for
      starting event groups in the absence of a tracee.
      
      Closing a group leader splits the group, and without a disabled group
      leader the newly split events will begin counting. Thus to ensure counts
      are reliable we must defer closing group leaders until all counts have
      been read. To do so this patch removes the event closing logic from the
      read_counters() helper and explicitly closes the events using
      perf_evlist__close(), which also aids legibility.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1470747869-3567-1-git-send-email-mark.rutland@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
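The skew described above can be reproduced with a toy model. This is a simulation under stated assumptions, not perf code: the `Counter` and `Group` classes are hypothetical, and "time passing" between two reads is modelled as a single tick.

```python
class Counter:
    """Toy counter that advances by one on every tick while enabled."""

    def __init__(self):
        self.value = 0
        self.enabled = True


class Group:
    def __init__(self, n):
        self.counters = [Counter() for _ in range(n)]

    def tick(self):
        for c in self.counters:
            if c.enabled:
                c.value += 1

    def disable(self):
        # Stands in for disabling the group leader, which stops the whole group.
        for c in self.counters:
            c.enabled = False

    def read_one_by_one(self):
        values = []
        for c in self.counters:
            values.append(c.value)
            self.tick()  # time keeps passing between individual reads
        return values


def run(disable_first):
    group = Group(2)  # two identical events, like '{cycles,cycles}'
    for _ in range(100):
        group.tick()
    if disable_first:
        group.disable()
    return group.read_one_by_one()


skewed = run(disable_first=False)
stable = run(disable_first=True)
print(skewed, stable)
```

Reading while the group is still counting gives two different values for identical events; disabling first gives matching values.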
  12. 19 Jul, 2016 1 commit
    • M
      perf stat: Balance opening and reading events · 00e727bb
      Authored by Mark Rutland
      In create_perf_stat_counter, when a target CPU has not been provided, we
      call __perf_evsel__open with empty_cpu_map, and open a single FD per
      thread. However, in read_counter we assume that we opened events for the
      product of threads and CPUs described in the evsel's cpu_map.
      
      Thus, if an evsel has a cpu_map with more than one entry, we will
      attempt to access FDs that we didn't open. This could result in a number
      of problems (e.g. blocking while reading from STDIN if the fd memory
      happened to be initialised to zero).
      
      This is problematic for systems where a logical CPU PMU covers some
      arbitrary subset of CPUs. The cpu_map of any evsel for that PMU will be
      initialised based on the cpumask exposed through sysfs, even if the user
      requests per-thread events.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1468577293-19667-2-git-send-email-mark.rutland@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  13. 13 Jul, 2016 1 commit
    • A
      tools: Introduce str_error_r() · c8b5f2c9
      Authored by Arnaldo Carvalho de Melo
      The tools so far have been using the strerror_r() GNU variant, that
      returns a string, be it the buffer passed or something else.
      
      But that, besides being tricky in cases where we expect that the
      function using strerror_r() returns the error formatted in a provided
      buffer (we have to check if it returned something else and copy that
      instead), breaks the build on systems not using glibc, like Alpine
      Linux, where musl libc is used.
      
      So, introduce yet another wrapper, str_error_r(), that has the GNU
      interface, but uses the portable XSI variant of strerror_r(), so that
      users can rest assured that the provided buffer is used and it is what
      is returned.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-d4t42fnf48ytlk8rjxs822tf@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  14. 23 Jun, 2016 1 commit
  15. 07 Jun, 2016 3 commits
    • A
      perf stat: Add missing aggregation headers for --metric-only CSV · c51fd639
      Authored by Andi Kleen
      When in CSV mode --metric-only outputs a header, unlike the other
      modes. Previously it did not properly print headers for the aggregation
      columns, so the headers were actually shifted against the real values.
      
      Fix this here by outputting the correct headers for CSV.
      
      v2: Indent array.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1464119559-17203-4-git-send-email-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf stat: Print topology/time headers with --metric-only · 41c8ca2a
      Authored by Andi Kleen
      When --metric-only is enabled there were no headers for the topology in
      interval mode.  Also when headers were printed they were on a separate
      line.
      
      Before:
      
        $ perf stat  --metric-only  -A -I 1000 -a
          1.001038376     frontend cycles idle insn per cycle  stalled cycles per insn branch-misses of all branches
          1.001038376 CPU0   123.54%               0.23           5.29                    7.61%
          1.001038376 CPU1   137.78%               0.24           5.13                   10.07%
          1.001038376 CPU2    64.48%               0.22           5.50                    6.84%
      
      After:
      
        $ perf stat  --metric-only  -A -I 1000 -a
          1.001111114 CPU0    82.46%               0.32           2.60                    7.64%
          1.001111114 CPU1   126.63%               0.02          42.83                    0.15%
          1.001111114 CPU2   193.54%               0.32           2.59                    6.92%
      
      v2: Move all headers on a single line
      Reported-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1464119559-17203-3-git-send-email-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf stat: Basic support for TopDown in perf stat · 44b1e60a
      Authored by Andi Kleen
      Add basic plumbing for TopDown in perf stat
      
      TopDown is intended to replace the frontend cycles idle/ backend cycles
      idle metrics in standard perf stat output.  These metrics are not
      reliable in many workloads, due to out of order effects.
      
      This implements a new --topdown mode in perf stat (similar to
      --transaction) that measures the pipeline bottlenecks using
      standardized formulas. The measurement can all be done with 5 counters
      (one fixed counter).
      
      The result is four metrics:
      
      FrontendBound, BackendBound, BadSpeculation, Retiring
      
      that describe the CPU pipeline behavior on a high level.
      
      The full top down methodology has many hierarchical metrics.  This
      implementation only supports level 1 which can be collected without
      multiplexing. A full implementation of top down on top of perf is
      available in pmu-tools toplev.  (http://github.com/andikleen/pmu-tools)
      
      The current version works on Intel Core CPUs starting with Sandy Bridge,
      and Atom CPUs starting with Silvermont.  In principle the generic
      metrics should be also implementable on other out of order CPUs.
      
      TopDown level 1 uses a set of abstracted metrics which are generic to
      out of order CPU cores (although some CPUs may not implement all of
      them):
      
        topdown-total-slots       Available slots in the pipeline
        topdown-slots-issued      Slots issued into the pipeline
        topdown-slots-retired     Slots successfully retired
        topdown-fetch-bubbles     Pipeline gaps in the frontend
        topdown-recovery-bubbles  Pipeline gaps during recovery
                                  from misspeculation
      
      These metrics then make it possible to compute four useful metrics:
      
      FrontendBound, BackendBound, Retiring, BadSpeculation.
      
      Add a new --topdown option to enable events.  When --topdown is
      specified set up events for all topdown events supported by the kernel.
      Add topdown-* as a special case to the event parser, as is needed for
      all events containing -.
      
      The actual code to compute the metrics is in follow-on patches.
      
      v2: Use standard sysctl read function.
      v3: Move x86 specific code to arch/
      v4: Enable --metric-only implicitly for topdown.
      v5: Add --single-thread option to not force per core mode
      v6: Fix output order of topdown metrics
      v7: Allow combining with -d
      v8: Remove --single-thread again
      v9: Rename functions, adding arch_ and topdown_.
      v10: Expand man page and describe TopDown better
      Paste intro into commit description.
      Print error when malloc fails.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1464119559-17203-1-git-send-email-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
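This commit defers the metric computation to follow-on patches, so the following is only a sketch of how the five abstracted events could be combined into the four level 1 metrics. The formulas are the commonly cited TopDown level 1 ones and are an assumption here, as are the function name and the made-up counts.

```python
def topdown_level1(total_slots, slots_issued, slots_retired,
                   fetch_bubbles, recovery_bubbles):
    """Combine the five topdown-* counts into the four level 1 fractions."""
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")
    retiring = slots_retired / total_slots
    frontend_bound = fetch_bubbles / total_slots
    # Slots issued but never retired, plus slots lost while recovering.
    bad_speculation = (slots_issued - slots_retired + recovery_bubbles) / total_slots
    # Whatever is left over is attributed to the backend.
    backend_bound = 1.0 - (retiring + frontend_bound + bad_speculation)
    return {
        "Retiring": retiring,
        "FrontendBound": frontend_bound,
        "BadSpeculation": bad_speculation,
        "BackendBound": backend_bound,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = topdown_level1(total_slots=1000, slots_issued=500, slots_retired=400,
                         fetch_bubbles=200, recovery_bubbles=50)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

By construction the four fractions sum to one, which is why the result reads as a breakdown of where pipeline slots went.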
  16. 17 May, 2016 2 commits
  17. 13 May, 2016 1 commit
    • A
      perf stat: Fallback to user only counters when perf_event_paranoid > 1 · 42ef8a78
      Authored by Arnaldo Carvalho de Melo
      After 0161028b ("perf/core: Change the default paranoia level to 2")
      'perf stat' fails for users without CAP_SYS_ADMIN, so just use
      'perf_evsel__fallback()' to have the same behaviour as 'perf record',
      i.e. set perf_event_attr.exclude_kernel to 1.
      
      Now:
      
        [acme@jouet linux]$ perf stat usleep 1
      
         Performance counter stats for 'usleep 1':
      
                0.352536      task-clock:u (msec)  #   0.423 CPUs utilized
                       0      context-switches:u   #   0.000 K/sec
                       0      cpu-migrations:u     #   0.000 K/sec
                      49      page-faults:u        #   0.139 M/sec
                 309,407      cycles:u             #   0.878 GHz
                 243,791      instructions:u       #   0.79  insn per cycle
                  49,622      branches:u           # 140.757 M/sec
                   3,884      branch-misses:u      #   7.83% of all branches
      
             0.000834174 seconds time elapsed
      
        [acme@jouet linux]$
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-b20jmx4dxt5hpaa9t2rroi0o@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
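The fallback can be pictured as a retry with kernel counting excluded. The sketch below is a rough model, not the `perf_evsel__fallback()` code: the attribute dictionary, the `open_event` callable and `fake_open` are hypothetical stand-ins, and the permission error is assumed to surface as `EACCES`.

```python
import errno


def open_with_fallback(open_event, attr, paranoid_level):
    """Try to open an event; on EACCES retry with exclude_kernel set.

    open_event is a hypothetical callable that returns a file descriptor
    or raises OSError.
    """
    try:
        return open_event(attr), attr
    except OSError as err:
        may_retry = (
            err.errno == errno.EACCES
            and paranoid_level > 1
            and not attr.get("exclude_kernel")
        )
        if not may_retry:
            raise
        retry_attr = dict(attr, exclude_kernel=1)  # count user space only
        return open_event(retry_attr), retry_attr


def fake_open(attr):
    """Stand-in for the kernel: refuse events that include kernel counting."""
    if not attr.get("exclude_kernel"):
        raise OSError(errno.EACCES, "Permission denied")
    return 3  # pretend file descriptor


fd, used_attr = open_with_fallback(fake_open, {"name": "cycles"}, paranoid_level=2)
print(fd, used_attr["exclude_kernel"])
```

The retried event counts user space only, which is what the `:u` suffix in the output above indicates.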
  18. 07 May, 2016 1 commit
  19. 11 Mar, 2016 2 commits
  20. 03 Mar, 2016 4 commits
    • A
      perf stat: Check for frontend stalled for metrics · fb4605ba
      Authored by Andi Kleen
      Add an extra check for frontend stalled in the metrics.  This avoids an
      extra column for the --metric-only case when the CPU does not support
      frontend stalled.
      
      v2: Add separate init function
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1456858672-21594-8-git-send-email-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf stat: Support metrics in --per-core/socket mode · 44d49a60
      Authored by Andi Kleen
      Enable metrics printing in --per-core / --per-socket mode. We need to
      save the shadow metrics in a unique place. Always use the first CPU in
      the aggregation. Then use the same CPU to retrieve the shadow value
      later.
      
      Example output:
      
        % perf stat --per-core -a ./BC1s
      
         Performance counter stats for 'system wide':
      
        S0-C0 2   2966.020381 task-clock (msec) #   2.004 CPUs utilized  (100.00%)
        S0-C0 2            49 context-switches  #   0.017 K/sec          (100.00%)
        S0-C0 2             4 cpu-migrations    #   0.001 K/sec          (100.00%)
        S0-C0 2           467 page-faults       #   0.157 K/sec
        S0-C0 2 4,599,061,773 cycles            #   1.551 GHz            (100.00%)
        S0-C0 2 9,755,886,883 instructions      #   2.12  insn per cycle (100.00%)
        S0-C0 2 1,906,272,125 branches          # 642.704 M/sec          (100.00%)
        S0-C0 2    81,180,867 branch-misses     #   4.26% of all branches
        S0-C1 2   2965.995373 task-clock (msec) #   2.003 CPUs utilized  (100.00%)
        S0-C1 2            62 context-switches  #   0.021 K/sec          (100.00%)
        S0-C1 2             8 cpu-migrations    #   0.003 K/sec          (100.00%)
        S0-C1 2           281 page-faults       #   0.095 K/sec
        S0-C1 2     6,347,290 cycles            #   0.002 GHz            (100.00%)
        S0-C1 2     4,654,156 instructions      #   0.73  insn per cycle (100.00%)
        S0-C1 2       947,121 branches          #   0.319 M/sec          (100.00%)
        S0-C1 2        37,322 branch-misses     #   3.94% of all branches
      
               1.480409747 seconds time elapsed
      
      v2: Rebase to older patches
      v3: Document shadow cpus. Fix aggr_get_id argument. Fix -A shadows (Jiri)
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1456785386-19481-4-git-send-email-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf stat: Implement CSV metrics output · 92a61f64
      Authored by Andi Kleen
      Now support CSV output for metrics. With the new output callbacks this
      is relatively straightforward: we just create new callbacks.
      
      This makes it easy to plot metrics from CSV files.
      
      The new line callback needs to know the number of fields to skip them
      correctly.
      
      Example output before:
      
        % perf stat -x, true
        0.200687,,task-clock,200687,100.00
        0,,context-switches,200687,100.00
        0,,cpu-migrations,200687,100.00
        40,,page-faults,200687,100.00
        730871,,cycles,203601,100.00
        551056,,stalled-cycles-frontend,203601,100.00
        <not supported>,,stalled-cycles-backend,0,100.00
        385523,,instructions,203601,100.00
        78028,,branches,203601,100.00
        3946,,branch-misses,203601,100.00
      
      After:
      
        % perf stat -x, true
        .502457,,task-clock,502457,100.00,0.485,CPUs utilized
        0,,context-switches,502457,100.00,0.000,K/sec
        0,,cpu-migrations,502457,100.00,0.000,K/sec
        45,,page-faults,502457,100.00,0.090,M/sec
        644692,,cycles,509102,100.00,1.283,GHz
        423470,,stalled-cycles-frontend,509102,100.00,65.69,frontend cycles idle
        <not supported>,,stalled-cycles-backend,0,100.00,,,,
        492701,,instructions,509102,100.00,0.76,insn per cycle
        ,,,,,0.86,stalled cycles per insn
        97767,,branches,509102,100.00,194.578,M/sec
        4788,,branch-misses,509102,100.00,4.90,of all branches
      
      or, in a more readable form:
      
        $ perf stat  -x, -o x.csv true
        $ column -s, -t x.csv
        0.490635        task-clock              490635 100.00 0.489   CPUs utilized
        0               context-switches        490635 100.00 0.000   K/sec
        0               cpu-migrations          490635 100.00 0.000   K/sec
        45              page-faults             490635 100.00 0.092   M/sec
        629080          cycles                  497698 100.00 1.282   GHz
        409498          stalled-cycles-frontend 497698 100.00 65.09   frontend cycles idle
        <not supported> stalled-cycles-backend  0      100.00
        491424          instructions            497698 100.00 0.78    insn per cycle
                                                              0.83    stalled cycles per insn
        97278           branches                497698 100.00 198.270 M/sec
        4569            branch-misses           497698 100.00 4.70    of all branches
      
      Two new fields are added: metric value and metric name.
      
      v2: Split out function argument changes
      v3: Reenable metrics for real.
      v4: Fix wrong hunk from refactoring.
      v5: Remove extra "noise" printing (Jiri), but add it to the not counted case.
      Print empty metrics for not counted.
      v6: Avoid outputting metric on empty format.
      v7: Print metric at the end
      v8: Remove extra run, ena fields
      v9: Avoid extra new line for unsupported counters
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: http://lkml.kernel.org/r/1456785386-19481-3-git-send-email-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
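As an illustration of consuming the two new fields, the sketch below parses a few lines of the "After" output with Python's `csv` module. It assumes the layout shown in that sample (metric value in the sixth field, metric name in the seventh) and that a row with an empty event name continues the previous event; the function name is hypothetical.

```python
import csv
import io

# A subset of the "After" sample output from the commit message.
SAMPLE = """\
.502457,,task-clock,502457,100.00,0.485,CPUs utilized
644692,,cycles,509102,100.00,1.283,GHz
<not supported>,,stalled-cycles-backend,0,100.00,,,,
492701,,instructions,509102,100.00,0.76,insn per cycle
,,,,,0.86,stalled cycles per insn
4788,,branch-misses,509102,100.00,4.90,of all branches
"""


def parse_metrics(text):
    """Return (event, metric value, metric name) for rows that carry a metric."""
    metrics = []
    last_event = None
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 7:
            continue
        # An empty event name means a second metric for the previous event.
        event = row[2] or last_event
        last_event = event
        value, name = row[5], row[6]
        if value and name:
            metrics.append((event, float(value), name))
    return metrics


parsed = parse_metrics(SAMPLE)
for event, value, name in parsed:
    print(f"{event}: {value} {name}")
```

Rows without a metric, such as the unsupported event, are skipped, so the result can be handed directly to a plotting tool.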
    • A
      perf stat: Check existence of frontend/backed stalled cycles · 9dec4473
      Authored by Andi Kleen
      Only put the frontend/backend stalled cycles into the default perf stat
      events when the CPU actually supports them.
      
      This avoids empty columns with --metric-only on newer Intel CPUs.
      
      Committer note:
      
      Before:
      
        $ perf stat ls
      
          Performance counter stats for 'ls':
      
                1.080893     task-clock (msec)      #    0.619 CPUs utilized
                       0     context-switches       #    0.000 K/sec
                       0     cpu-migrations         #    0.000 K/sec
                      97     page-faults            #    0.090 M/sec
               3,327,741     cycles                 #    3.079 GHz
         <not supported>     stalled-cycles-frontend
         <not supported>     stalled-cycles-backend
               1,609,544     instructions           #    0.48  insn per cycle
                 319,117     branches               #  295.235 M/sec
                  12,246     branch-misses          #    3.84% of all branches
      
             0.001746508 seconds time elapsed
        $
      
      After:
      
        $ perf stat ls
      
          Performance counter stats for 'ls':
      
                0.693948     task-clock (msec)      #    0.662 CPUs utilized
                       0     context-switches       #    0.000 K/sec
                       0     cpu-migrations         #    0.000 K/sec
                      95     page-faults            #    0.137 M/sec
               1,792,509     cycles                 #    2.583 GHz
               1,599,047     instructions           #    0.89  insn per cycle
                 316,328     branches               #  455.838 M/sec
                  12,453     branch-misses          #    3.94% of all branches
      
             0.001048987 seconds time elapsed
        $
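
      The values after the '#' are derived from the raw counts. The sketch below recomputes a few of them for the "After" run. The formulas are inferred from the printed numbers, not taken from the perf source, and `derived_metrics` is a hypothetical helper.

```python
def derived_metrics(task_clock_msec, cycles, instructions, branches, branch_misses):
    """Recompute a few derived metrics from raw counter values."""
    seconds = task_clock_msec / 1000.0
    return {
        "GHz": cycles / seconds / 1e9,
        "insn per cycle": instructions / cycles,
        "branches M/sec": branches / seconds / 1e6,
        "branch-miss %": 100.0 * branch_misses / branches,
    }


if __name__ == "__main__":
    # Raw counts copied from the "After" listing.
    metrics = derived_metrics(
        task_clock_msec=0.693948,
        cycles=1_792_509,
        instructions=1_599_047,
        branches=316_328,
        branch_misses=12_453,
    )
    for name, value in metrics.items():
        print(f"{name:>16}: {value:.3f}")
```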
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1456532881-26621-2-git-send-email-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  21. 20 Feb, 2016 2 commits
    • W
      perf stat: Bail out on unsupported event config modifiers · 1669e509
      Committed by Wang Nan
      'perf stat' accepts some config terms but doesn't apply them. For
      example:
      
        # perf stat -e 'instructions/no-inherit/' -e 'instructions/inherit/' bash
        # ls
        # exit
      
        Performance counter stats for 'bash':
      
               266258061      instructions/no-inherit/
               266258061      instructions/inherit/
      
             1.402183915 seconds time elapsed
      
      The result is confusing, because the user may expect the first
      'instructions' event to exclude the 'ls' command.
      
      This patch forbids most of these config terms for 'perf stat'.
      
      Result:
      
        # ./perf stat -e 'instructions/no-inherit/' -e 'instructions/inherit/' bash
        event syntax error: 'instructions/no-inherit/'
                             \___ 'no-inherit' is not usable in 'perf stat'
        ...
      
      We can add blocked config terms back when 'perf stat' really supports them.
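
      As an illustration of the idea only (the patch's actual mechanism is not shown here), the sketch below checks the terms of an event spec against an allow-list. The default set is taken from the "valid terms" line printed for a plain 'instructions' event. `STAT_ALLOWED_TERMS` and `rejected_terms` are hypothetical names.

```python
STAT_ALLOWED_TERMS = frozenset({"config", "config1", "config2", "name"})


def rejected_terms(spec: str, allowed=STAT_ALLOWED_TERMS) -> list:
    """Return the config terms in 'event/term[=value],.../' that are not allowed."""
    _event, slash, rest = spec.partition("/")
    if not slash:
        return []  # no term list at all, e.g. plain 'cycles'
    rejected = []
    for term in rest.rstrip("/").split(","):
        key = term.split("=", 1)[0].strip()
        if key and key not in allowed:
            rejected.append(key)
    return rejected


if __name__ == "__main__":
    for spec in ("instructions/no-inherit/", "instructions/name=insn/"):
        bad = rejected_terms(spec)
        if bad:
            print(f"{spec}: not usable in 'perf stat': {', '.join(bad)}")
        else:
            print(f"{spec}: ok")
```

      PMU-specific events would need a larger allow-list, since the valid terms also depend on the formats the PMU exposes.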
      
      This patch also removes unavailable config terms from the error message:
      
        # ./perf stat -e 'instructions/badterm/' ls
        event syntax error: 'instructions/badterm/'
                                          \___ unknown term
      
        valid terms: config,config1,config2,name
      
        # ./perf stat -e 'cpu/badterm/' ls
        event syntax error: 'cpu/badterm/'
                                 \___ unknown term
      
        valid terms: pc,any,inv,edge,cmask,event,in_tx,ldlat,umask,in_tx_cp,offcore_rsp,config,config1,config2,name
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1455882283-79592-11-git-send-email-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf stat: Handled scaled == -1 case for counters · b002f3bb
      Committed by Andi Kleen
      Arnaldo pointed out that the earlier commit cb110f47 ("perf stat: Move
      noise/running printing into printout") changed the behavior for
      counters that were not counted. This patch restores it.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Fixes: cb110f47 ("perf stat: Move noise/running printing into printout")
      Link: http://lkml.kernel.org/r/1455749045-18098-2-git-send-email-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  22. 17 Feb, 2016 3 commits
  23. 12 Jan, 2016 1 commit
  24. 07 Jan, 2016 1 commit
  25. 18 Dec, 2015 2 commits