1. 08 Mar, 2018 1 commit
  2. 22 Feb, 2018 1 commit
  3. 16 Feb, 2018 2 commits
    • perf stat: Add support to print counts after a period of time · f1f8ad52
      Committed by yuzhoujian
      Introduce a new option to print counts after N milliseconds and update
      'perf stat' documentation accordingly.
      
      Shown below is the output of the new option for perf stat:
      
        $ perf stat --time 2000 -e cycles -a
        Performance counter stats for 'system wide':
      
              157,260,423      cycles
      
              2.003060766 seconds time elapsed
      
      This newly introduced option prints the count deltas after N
      milliseconds. It cannot be combined with the "-I" option.
      
      In addition, according to Kan Liang's patch (19afd104), the
      monitoring overhead for system-wide core events could be very high if
      the interval-print parameter was below 100ms, and the lower limit is
      10ms.
      
      So the same warning is displayed when the time is set between 10ms
      and 100ms, and the minimum time is limited to 10ms. Users can make a
      decision according to their specific cases.
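
The threshold rule described above can be sketched as follows. This is only an illustration of the stated limits (reject below 10ms, warn below 100ms), not the code from the patch; the function name, the constants, and the exact boundary handling at 10 and 100 are assumptions.

```python
# Thresholds taken from the commit message; all names are hypothetical.
MIN_TIMEOUT_MS = 10     # values below this are rejected
WARN_TIMEOUT_MS = 100   # values below this trigger an overhead warning


def check_timeout(timeout_ms: int) -> str:
    """Classify a timeout in milliseconds as 'error', 'warn' or 'ok'."""
    if timeout_ms < MIN_TIMEOUT_MS:
        return "error"
    if timeout_ms < WARN_TIMEOUT_MS:
        return "warn"
    return "ok"


if __name__ == "__main__":
    for ms in (5, 10, 50, 100, 2000):
        print(ms, check_timeout(ms))
```

A three-way result keeps the policy (reject, warn, accept) separate from how the caller reports it.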
      
      Committer notes:
      
      This actually stops the workload after the specified time, then prints
      the counts.
      
      So I renamed the option to --timeout and updated the documentation to
      state that it will not just print the counts after the specified time,
      but will really stop the 'perf stat' session and print the counts.
      
      The rename from 'time' to 'timeout' also fixes the build in systems
      where 'time' is used by glibc and can't be used as a name of a variable,
      such as centos:5 and centos:6.
      
      Changes since v3:
      - none.
      
      Changes since v2:
      - modify the time check in __run_perf_stat func to keep some consistency
        with the workload case.
      - add the warning when the time is set between 10ms to 100ms.
      - add the pr_err when the time is set below 10ms.
      
      Changes since v1:
      - none.
      Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1517217923-8302-3-git-send-email-ufo19890607@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Add support to print counts for fixed times · db06a269
      Committed by yuzhoujian
      Introduce a new option to print counts for fixed number of times and
      update 'perf stat' documentation accordingly.
      
      Shown below is the output of the new option for perf stat:
      
        $ perf stat -I 1000 --interval-count 2 -e cycles -a
        #           time             counts unit events
                 1.002827089         93,884,870      cycles
                 2.004231506         56,573,446      cycles
      
      With this newly introduced option, the counts are printed only a fixed
      number of times. Its usage is a little like 'vmstat', and it must be
      used together with the "-I" option.
      
        $ vmstat -n 1 2
        procs ---------memory-------------- --swap- ----io-- -system-- ------cpu---
         r  b swpd   free   buff   cache    si   so  bi   bo  in   cs us sy id wa st
         0  0    0 78270544 547484 51732076  0   0   0   20    1    1  1  0 99  0 0
         0  0    0 78270512 547484 51732080  0   0   0   16  477 1555  0  0 100 0 0
      
      Changes since v3:
      - merge interval_count check and times check to one line.
      - fix the wrong indent in stat.h
      - use stat_config.times instead of 'times' in cmd_stat function.
      
      Changes since v2:
      - none.
      
      Changes since v1:
      - change the name of the new option "times-print" to "interval-count".
      - keep the new option interval specifically.
      Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1517217923-8302-2-git-send-email-ufo19890607@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 13 Sep, 2017 1 commit
    • perf stat: Support JSON metrics in perf stat · b18f3e36
      Committed by Andi Kleen
      Add generic support for standalone metrics specified in JSON files to
      perf stat. A metric is a formula that uses multiple events to compute a
      higher level result (e.g. IPC).
      
      Previously metrics were always tied to an event and automatically
      enabled with that event. This change allows standalone metrics. They
      are in the same JSON data structure as events, but don't have an event
      name.
      
      Metrics can also be organized in metric groups, which provide a
      shortcut to select several related metrics at once.
      
      Add a new -M / --metrics option to perf stat that adds the metrics or
      metric groups specified.
      
      Add the core code to manage and parse the metric groups. They are
      collected from the JSON data structures into a separate rblist.  When
      computing shadow values look for metrics in that list.  Then they are
      computed using the existing saved values infrastructure in stat-shadow.c
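
The idea of a metric as a formula over several events, selectable by name or by group, can be sketched like this. It is a simplified model, not perf's rblist or JSON handling; the table layout, the `BranchMissRate` metric, and the contents of the `Summary` group are hypothetical.

```python
from typing import Callable, Dict, List

Counts = Dict[str, float]

# Hypothetical metric table: metric name -> formula over raw event counts.
METRICS: Dict[str, Callable[[Counts], float]] = {
    "IPC": lambda c: c["instructions"] / c["cycles"],
    "BranchMissRate": lambda c: c["branch-misses"] / c["branches"],
}

# Hypothetical metric groups: group name -> member metrics.
METRIC_GROUPS: Dict[str, List[str]] = {"Summary": ["IPC", "BranchMissRate"]}


def resolve(names: List[str]) -> List[str]:
    """Expand group names into metric names, keeping first-seen order."""
    selected: List[str] = []
    for name in names:
        for metric in METRIC_GROUPS.get(name, [name]):
            if metric not in METRICS:
                raise KeyError(f"unknown metric or group: {metric}")
            if metric not in selected:
                selected.append(metric)
    return selected


def compute(names: List[str], counts: Counts) -> Dict[str, float]:
    """Evaluate every selected metric against the collected counts."""
    return {metric: METRICS[metric](counts) for metric in resolve(names)}


if __name__ == "__main__":
    counts = {
        "cycles": 4_535_633_835,
        "instructions": 6_389_736_516,
        "branches": 1_541_293_875,
        "branch-misses": 14_526_396,
    }
    for metric, value in compute(["Summary"], counts).items():
        print(f"{metric}: {value:.4f}")
```

Selecting a group and selecting its members one by one go through the same `resolve` step, which is what makes a group a pure shortcut.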
      
      The actual JSON metrics are in a separate pull request.
      
        % perf stat -M Summary --metric-only -a sleep 1
      
         Performance counter stats for 'system wide':
      
        Instructions   CLKS          CPU_Utilization  GFLOPs   SMT_2T_Utilization   Kernel_Utilization
        317614222.0    1392930775.0  0.0              0.0      0.2                  0.1
      
             1.001497549 seconds time elapsed
      
        % perf stat -M GFLOPs flops
      
         Performance counter stats for 'flops':
      
           3,999,541,471  fp_comp_ops_exe.sse_scalar_single #  1.2 GFLOPs   (66.65%)
                      14  fp_comp_ops_exe.sse_scalar_double                 (66.65%)
                       0  fp_comp_ops_exe.sse_packed_double                 (66.67%)
                       0  fp_comp_ops_exe.sse_packed_single                 (66.70%)
                       0  simd_fp_256.packed_double                         (66.70%)
                       0  simd_fp_256.packed_single                         (66.67%)
                       0  duration_time
      
             3.238372845 seconds time elapsed
      
      v2: Add missing header file
      v3: Move find_map to pmu.c
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170831194036.30146-7-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 28 Aug, 2017 1 commit
  6. 21 Jun, 2017 1 commit
    • perf stat: Add support to measure SMI cost · daefd0bc
      Committed by Kan Liang
      Implement a new --smi-cost mode in perf stat to measure SMI cost.
      
      During the measurement, /sys/devices/cpu/freeze_on_smi will be set.
      
      The measurement can be done with one counter (unhalted core cycles), and
      two free running MSR counters (IA32_APERF and SMI_COUNT).
      
      In practice, the percentage of SMI core cycles should be more useful
      than the absolute value. So the output will be the percentage of SMI
      core cycles and the SMI count (SMI#). metric_only will be set by default.
      
      SMI cycles% = (aperf - unhalted core cycles) / aperf
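
The formula above translates directly into a small helper. The function and argument names are illustrative, and the counter values in the usage lines are made up to reproduce a 0.1% result; they are not measurements.

```python
def smi_cycles_percent(aperf: int, unhalted_core_cycles: int) -> float:
    """SMI cycles% = (aperf - unhalted core cycles) / aperf, as a percentage."""
    if aperf <= 0:
        raise ValueError("aperf must be positive")
    return 100.0 * (aperf - unhalted_core_cycles) / aperf


if __name__ == "__main__":
    # Made-up values: 1,000 of 1,000,000 APERF cycles were not seen by the
    # core cycle counter, which is frozen while in SMM.
    print(f"{smi_cycles_percent(1_000_000, 999_000):.1f}%")
```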
      
      Here is an example output.
      
       Performance counter stats for 'sudo echo ':
      
      SMI cycles%          SMI#
          0.1%              1
      
             0.010858678 seconds time elapsed
      
      Users who want the actual values can additionally pass
      --no-metric-only.
      Signed-off-by: Kan Liang <Kan.liang@intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Elliott <elliott@hpe.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1495825538-5230-3-git-send-email-kan.liang@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 22 Mar, 2017 1 commit
    • perf stat: Collapse identically named events · 430daf2d
      Committed by Andi Kleen
      The uncore PMU has a lot of duplicated PMUs for different subsystems.
      When expanding an uncore alias we usually end up with a large
      number of identically named aliases, which makes perf stat
      output difficult to read.
      
      Automatically sum them up in perf stat, unless --no-merge is specified.
      
      This can be the default because generally only the uncore PMUs have
      duplicated aliases. Other PMUs have unique names.
      
      Before:
      
        % perf stat --no-merge -a -e unc_c_llc_lookup.any sleep 1
      
        Performance counter stats for 'system wide':
      
                 694,976 Bytes unc_c_llc_lookup.any
                 706,304 Bytes unc_c_llc_lookup.any
                 956,608 Bytes unc_c_llc_lookup.any
                 782,720 Bytes unc_c_llc_lookup.any
                 605,696 Bytes unc_c_llc_lookup.any
                 442,816 Bytes unc_c_llc_lookup.any
                 659,328 Bytes unc_c_llc_lookup.any
                 509,312 Bytes unc_c_llc_lookup.any
                 263,936 Bytes unc_c_llc_lookup.any
                 592,448 Bytes unc_c_llc_lookup.any
                 672,448 Bytes unc_c_llc_lookup.any
                 608,640 Bytes unc_c_llc_lookup.any
                 641,024 Bytes unc_c_llc_lookup.any
                 856,896 Bytes unc_c_llc_lookup.any
                 808,832 Bytes unc_c_llc_lookup.any
                 684,864 Bytes unc_c_llc_lookup.any
                 710,464 Bytes unc_c_llc_lookup.any
                 538,304 Bytes unc_c_llc_lookup.any
      
             1.002577660 seconds time elapsed
      
      After:
      
        % perf stat -a -e unc_c_llc_lookup.any sleep 1
      
        Performance counter stats for 'system wide':
      
               2,685,120 Bytes unc_c_llc_lookup.any
      
             1.002648032 seconds time elapsed
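
The collapsing step amounts to summing counts that share an event name. A minimal sketch, assuming the counts arrive as (name, value) pairs; the function name and the `merge` flag (standing in for the absence of --no-merge) are hypothetical. The usage lines take the first three values from the "Before" listing only as sample input.

```python
from typing import Iterable, List, Tuple


def merge_counts(rows: Iterable[Tuple[str, int]], merge: bool = True) -> List[Tuple[str, int]]:
    """Sum counts of identically named events; return rows unchanged if merge is False."""
    rows = list(rows)
    if not merge:
        return rows
    totals = {}
    for name, value in rows:
        totals[name] = totals.get(name, 0) + value
    # dicts preserve insertion order, so events keep their first-seen order
    return list(totals.items())


if __name__ == "__main__":
    rows = [
        ("unc_c_llc_lookup.any", 694_976),
        ("unc_c_llc_lookup.any", 706_304),
        ("unc_c_llc_lookup.any", 956_608),
    ]
    for name, value in merge_counts(rows):
        print(f"{value:>12,} Bytes {name}")
```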
      
      v2: Split collect_aliases. Rename alias flag.
      v3: Make sure unsupported/not counted is always printed.
      v4: Factor out callback change into separate patch.
      v5: Move check for bad results here
          Move merged check into collect_data
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170320201711.14142-3-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  8. 21 Mar, 2017 1 commit
  9. 18 Feb, 2017 1 commit
    • perf stat: Add -a as default target · 0d79f8b9
      Committed by Jiri Olsa
      Boris asked for default -a option in case we monitor only uncore events.
      
      While implementing that I thought it might be actually useful to make it
      overall default.
      
      Running 'perf stat' will now collect system wide data.
      
      Committer note:
      
      Testing it:
      
        # perf stat
        ^C
         Performance counter stats for 'system wide':
      
               3571.559178      cpu-clock (msec)          #    4.000 CPUs utilized
                     3,346      context-switches          #    0.937 K/sec
                       277      cpu-migrations            #    0.078 K/sec
                    57,271      page-faults               #    0.016 M/sec
             4,535,633,835      cycles                    #    1.270 GHz
             6,389,736,516      instructions              #    1.41  insn per cycle
             1,541,293,875      branches                  #  431.547 M/sec
                14,526,396      branch-misses             #    0.94% of all branches
      
               0.892950118 seconds time elapsed
      
        #
      Requested-and-Acked-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170217170034.GB15389@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  10. 07 Jun, 2016 1 commit
    • perf stat: Basic support for TopDown in perf stat · 44b1e60a
      Committed by Andi Kleen
      Add basic plumbing for TopDown in perf stat
      
      TopDown is intended to replace the frontend cycles idle/ backend cycles
      idle metrics in standard perf stat output.  These metrics are not
      reliable in many workloads, due to out of order effects.
      
      This implements a new --topdown mode in perf stat (similar to
      --transaction) that measures the pipeline bottlenecks using
      standardized formulas. The measurement can all be done with 5 counters
      (one of them a fixed counter).
      
      The result is four metrics:
      
      FrontendBound, BackendBound, BadSpeculation, Retiring
      
      that describe the CPU pipeline behavior on a high level.
      
      The full top down methodology has many hierarchical metrics.  This
      implementation only supports level 1 which can be collected without
      multiplexing. A full implementation of top down on top of perf is
      available in pmu-tools toplev.  (http://github.com/andikleen/pmu-tools)
      
      The current version works on Intel Core CPUs starting with Sandy Bridge,
      and Atom CPUs starting with Silvermont.  In principle the generic
      metrics should be also implementable on other out of order CPUs.
      
      TopDown level 1 uses a set of abstracted metrics which are generic to
      out of order CPU cores (although some CPUs may not implement all of
      them):
      
        topdown-total-slots       Available slots in the pipeline
        topdown-slots-issued      Slots issued into the pipeline
        topdown-slots-retired     Slots successfully retired
        topdown-fetch-bubbles     Pipeline gaps in the frontend
        topdown-recovery-bubbles  Pipeline gaps during recovery
                                  from misspeculation
      
      These abstracted metrics then allow four useful metrics to be computed:
      
      FrontendBound, BackendBound, Retiring, BadSpeculation.
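
This commit only adds the plumbing; the formulas themselves are in follow-on patches and are not shown here. The sketch below therefore uses the commonly cited TopDown level 1 formulation as an assumption, with hypothetical function and argument names that mirror the five abstracted events listed above.

```python
def topdown_level1(total_slots, slots_issued, slots_retired,
                   fetch_bubbles, recovery_bubbles):
    """Return the four level 1 fractions (assumed formulas, see lead-in)."""
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")
    frontend_bound = fetch_bubbles / total_slots
    retiring = slots_retired / total_slots
    bad_speculation = (slots_issued - slots_retired + recovery_bubbles) / total_slots
    # Whatever is not explained by the other three is attributed to the backend.
    backend_bound = 1.0 - (frontend_bound + retiring + bad_speculation)
    return {
        "FrontendBound": frontend_bound,
        "BackendBound": backend_bound,
        "BadSpeculation": bad_speculation,
        "Retiring": retiring,
    }


if __name__ == "__main__":
    # Made-up slot counts, chosen only to give round fractions.
    result = topdown_level1(1000, 600, 500, 200, 50)
    for name, fraction in result.items():
        print(f"{name}: {fraction:.1%}")
```

Because BackendBound is derived as the remainder, the four fractions always sum to one in this model.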
      
      Add a new --topdown option to enable the events.  When --topdown is
      specified, set up events for all topdown events supported by the kernel.
      Add topdown-* as a special case to the event parser, as is needed for
      all events containing -.
      
      The actual code to compute the metrics is in follow-on patches.
      
      v2: Use standard sysctl read function.
      v3: Move x86 specific code to arch/
      v4: Enable --metric-only implicitly for topdown.
      v5: Add --single-thread option to not force per core mode
      v6: Fix output order of topdown metrics
      v7: Allow combining with -d
      v8: Remove --single-thread again
      v9: Rename functions, adding arch_ and topdown_.
      v10: Expand man page and describe TopDown better
      Paste intro into commit description.
      Print error when malloc fails.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1464119559-17203-1-git-send-email-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  11. 11 Mar, 2016 3 commits
  12. 08 Mar, 2016 1 commit
  13. 18 Dec, 2015 3 commits
    • perf stat report: Allow to override aggr_mode · 89af4e05
      Committed by Jiri Olsa
      Allow overriding the recorded aggr_mode. It's possible to use perf stat
      like:
      
         $ perf stat report -A
         $ perf stat report --per-core
         $ perf stat report --per-socket
      
      This customizes the aggregation mode regardless of what was used during
      the stat record command.
      Reported-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1446734469-11352-19-git-send-email-jolsa@kernel.org
      [ Renamed 'stat' parameter to 'st' to fix 'already defined' build error with older distros (e.g. RHEL6.7) ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat report: Add report command · ba6039b6
      Committed by Jiri Olsa
      Add 'perf stat report' command support. At the moment it only processes
      attr events and displays nothing.
      Reported-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1446734469-11352-12-git-send-email-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat record: Add record command · 4979d0c7
      Committed by Jiri Olsa
      Add 'perf stat record' command support. For now it creates a simple
      (header only) perf.data file.
      
      The record command can be specified anywhere among the stat options.
      All stat command options are valid for the stat record command, with
      the exception of '-o': when given to the record command, it denotes the
      perf data file name.
      
      Committer note:
      
      Set sample_type to PERF_SAMPLE_IDENTIFIER, which should be harmless
      while avoiding that older tools show confusing messages, for instance,
      with sample_type = 0, we get:
      
        $ perf stat record usleep 1
      
         Performance counter stats for 'usleep 1':
      
                0.630237      task-clock (msec)         #    0.528 CPUs utilized
                       1      context-switches          #    0.002 M/sec
                       0      cpu-migrations            #    0.000 K/sec
                      52      page-faults               #    0.083 M/sec
                 978,312      cycles                    #    1.552 GHz
                 671,931      stalled-cycles-frontend   #   68.68% frontend cycles idle
         <not supported>      stalled-cycles-backend
                 646,379      instructions              #    0.66  insns per cycle
                                                        #    1.04  stalled cycles per insn
                 131,046      branches                  #  207.931 M/sec
                   7,073      branch-misses             #    5.40% of all branches
      
             0.001193240 seconds time elapsed
      
        $ oldperf evlist
        WARNING: The perf.data file's data size field is 0 which is unexpected.
        Was the 'perf record' command properly terminated?
        non matching sample_type
        $
      
      While with sample_type set to PERF_SAMPLE_IDENTIFIER, after we re-run 'perf
      stat record usleep' we get:
      
        $ oldperf evlist
        WARNING: The perf.data file's data size field is 0 which is unexpected.
        Was the 'perf record' command properly terminated?
        task-clock
        context-switches
        cpu-migrations
        page-faults
        cycles
        stalled-cycles-frontend
        stalled-cycles-backend
        instructions
        branches
        branch-misses
        $
      
      Which at least shows the names of the events in the perf.data file.
      
      Additionally, such files, when passed to 'perf report' will produce:
      
        $ oldperf report --stdio
        WARNING: The perf.data file's data size field is 0 which is unexpected.
        Was the 'perf record' command properly terminated?
        Warning:
        Kernel address maps (/proc/{kallsyms,modules}) were restricted.
      
        Check /proc/sys/kernel/kptr_restrict before running 'perf record'.
      
        As no suitable kallsyms nor vmlinux was found, kernel samples
        can't be resolved.
      
        Samples in kernel modules can't be resolved as well.
      
        Error:
        The perf.data file has no samples!
        # To display the perf.data header info, please use --header/--header-only options.
        #
        $
      
      Which is confusing and can be solved by just adding the kernel mmap record,
      which will also remove that warning about the data size field being equal to
      zero, after generating the mmap record:
      
        $ perf stat record usleep 1
      
         Performance counter stats for 'usleep 1':
      
                0.600796      task-clock (msec)         #    0.478 CPUs utilized
                       1      context-switches          #    0.002 M/sec
                       0      cpu-migrations            #    0.000 K/sec
                      54      page-faults               #    0.090 M/sec
                 886,844      cycles                    #    1.476 GHz
                 582,169      stalled-cycles-frontend   #   65.65% frontend cycles idle
         <not supported>      stalled-cycles-backend
                 638,344      instructions              #    0.72  insns per cycle
                                                        #    0.91  stalled cycles per insn
                 130,204      branches                  #  216.719 M/sec
                   7,500      branch-misses             #    5.76% of all branches
      
             0.001255897 seconds time elapsed
      
        $ oldperf evlist
        task-clock
        context-switches
        cpu-migrations
        page-faults
        cycles
        stalled-cycles-frontend
        stalled-cycles-backend
        instructions
        branches
        branch-misses
        $ oldperf report --stdio
        Error:
        The perf.data file has no samples!
        # To display the perf.data header info, please use --header/--header-only options.
        #
        [acme@zoo linux]$
      
      No warnings, sensible output about what are the events in the perf.data file and also
      a "file has no samples" message, which indeed it doesn't.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Kan Liang <kan.liang@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1446734469-11352-3-git-send-email-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  14. 03 Oct, 2015 1 commit
    • perf stat: Reduce min --interval-print to 10ms · 19afd104
      Committed by Kan Liang
      The --interval-print parameter was limited to 100ms. However, for
      example, 10ms is required to do sophisticated bandwidth analysis using
      uncore events.
      
      The test shows that the overhead of system-wide uncore monitoring
      with a 10ms interval is only ~2%. So this patch reduces the minimum
      interval-print allowed to 10ms.
      
      But 10ms may not work well for all cases. For example, when the
      cpus/threads number is very large, for system-wide core event monitoring
      the overhead could be high.
      
      To handle this issue, a warning is displayed when the
      interval-print is set between 10ms and 100ms, so users can make a
      decision according to their specific cases.
      
       # perf stat -e uncore_imc_1/cas_count_read/ -a --interval-print 10 -- sleep 1
      
       print interval < 100ms. The overhead percentage could be high in some
       cases. Please proceed with caution.
       #           time             counts unit events
            0.010200451               0.10 MiB  uncore_imc_1/cas_count_read/
            0.020475117               0.02 MiB  uncore_imc_1/cas_count_read/
            0.030692800               0.01 MiB  uncore_imc_1/cas_count_read/
            0.040948161               0.02 MiB  uncore_imc_1/cas_count_read/
            0.051159564               0.00 MiB  uncore_imc_1/cas_count_read/
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/1443776674-42511-1-git-send-email-kan.liang@intel.com
      [ Added warning about overhead when using sub 100ms intervals to the man page ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  15. 26 Jun, 2015 1 commit
    • perf stat: Introduce --per-thread option · 32b8af82
      Committed by Jiri Olsa
      Currently the values of all tasks given as PID arguments to the -p
      option are aggregated and printed as single values.
      
      Add the --per-thread option to print values per task.
      
        $ perf stat  -e cycles,instructions --per-thread -p 30190,30242
        ^C
         Performance counter stats for process id '30190,30242':
      
                     cat-30190                     0      cycles
                     yes-30242         3,842,525,421      cycles
                     cat-30190                     0      instructions
                     yes-30242        10,370,817,010      instructions
      
               1.143155657 seconds time elapsed
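
The difference between the aggregated default and the per-thread view can be modelled as below. This is a sketch of the output shape only, not of how perf collects the counts; the data layout and function names are hypothetical, and the numbers are copied from the example above.

```python
def aggregate(per_thread):
    """Default view: sum each event over all monitored threads."""
    totals = {}
    for counts in per_thread.values():
        for event, value in counts.items():
            totals[event] = totals.get(event, 0) + value
    return totals


def per_thread_rows(per_thread):
    """Per-thread view: one ('comm-pid', event, count) row per thread and event."""
    events = []
    for counts in per_thread.values():
        for event in counts:
            if event not in events:
                events.append(event)
    return [
        (f"{comm}-{pid}", event, counts[event])
        for event in events
        for (comm, pid), counts in per_thread.items()
        if event in counts
    ]


if __name__ == "__main__":
    data = {
        ("cat", 30190): {"cycles": 0, "instructions": 0},
        ("yes", 30242): {"cycles": 3_842_525_421, "instructions": 10_370_817_010},
    }
    for label, event, value in per_thread_rows(data):
        print(f"{label:>15} {value:>18,}      {event}")
    print(aggregate(data))
```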
      
      Also works under interval mode:
      
        $ perf stat  -e cycles,instructions --per-thread -p 30190,30242 -I 1000
        #           time             comm-pid                  counts unit events
             1.000073435              cat-30190                89,058      cycles
             1.000073435              yes-30242         3,360,786,902      cycles                     (100.00%)
             1.000073435              cat-30190                14,066      instructions
             1.000073435              yes-30242         9,069,937,462      instructions
             2.000204830              cat-30190                     0      cycles
             2.000204830              yes-30242         3,351,667,626      cycles
             2.000204830              cat-30190                     0      instructions
             2.000204830              yes-30242         9,045,796,885      instructions
        ^C     2.771286639              cat-30190                     0      cycles
             2.771286639              yes-30242         2,593,884,166      cycles
             2.771286639              cat-30190                     0      instructions
             2.771286639              yes-30242         7,001,171,191      instructions
      
      It works only with the -t and -p options; otherwise the following
      error is printed:
      
        $ perf stat  -e cycles --per-thread  -I 1000 ls
        The --per-thread option is only available when monitoring via -p -t options.
            -p, --pid <pid>       stat events on existing process id
            -t, --tid <tid>       stat events on existing thread id
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  16. 22 Jan, 2015 1 commit
  17. 13 Jan, 2014 1 commit
  18. 04 Oct, 2013 1 commit
  19. 08 Aug, 2013 1 commit
    • perf stat: Add support for --initial-delay option · 41191688
      Committed by Andi Kleen
      When measuring workloads the startup phase -- doing page faults, dynamic
      linking, opening files -- is often very different from the rest of the
      workload.  Especially with smaller kernels and using counter
      multiplexing this can give significant measurement errors.
      
      Multiplexing assumes that the workload is mostly the same over longer
      periods. But at startup there is typically some spike of activity which
      is relatively short.  If many groups are being multiplexed, the one
      group that sees the spike, which is then scaled up over the time needed
      to run all groups, may show a significant error.
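
The scaling effect can be illustrated with the usual extrapolation for multiplexed counters, where the raw count is multiplied by time enabled over time running. The helper name is hypothetical, returning 0.0 for a counter that never ran is a simplification, and the numbers are invented to show how a short spike is amplified.

```python
def scaled_count(raw: int, time_enabled: int, time_running: int) -> float:
    """Extrapolate a multiplexed count to the full enabled time."""
    if time_running == 0:
        return 0.0  # simplification: the counter never got scheduled
    return raw * time_enabled / time_running


if __name__ == "__main__":
    # Invented example: the group was scheduled for 100 of 400 time units and
    # happened to run during a startup spike that produced 1000 events.
    # The extrapolation assumes the spike rate held for all 400 units.
    print(scaled_count(1000, 400, 100))
```

If the spike was confined to the window in which this group ran, the true count is close to 1000, so the extrapolated value overstates it by the enabled-to-running ratio.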
      
      Also in general it's often not useful to measure the startup, because it
      is so different from the rest.
      
      One way around this is to use interval mode and discard the first
      sample, but this can be awkward because interval mode doesn't support
      intervals of less than 100ms, and also a useful interval is not
      necessarily the same as a useful startup delay.
      
      This patch adds a new --initial-delay / -D option to skip measuring
      during the startup phase. The delay is specified in ms.
      
      Here's a simple example:
      
      perf stat -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done'
      ...
                   3,721 page-faults
      ...
      
      If we just wait 20 ms, the number of page faults drops by about a quarter:
      
      perf stat -D 20 -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done'
      ...
                   2,823 page-faults
      ...
      
      So we filtered out most of the startup noise from bash.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1375490473-1503-4-git-send-email-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  20. 26 Mar, 2013 2 commits
  21. 16 Mar, 2013 1 commit
  22. 07 Feb, 2013 1 commit
    • perf stat: Add per processor socket count aggregation · d7e7a451
      Committed by Stephane Eranian
      This patch adds per-processor socket count aggregation for system-wide
      mode measurements. This is a useful mode to detect imbalance between
      sockets.
      
      To enable this mode, use --aggr-socket in addition to -a
      (system-wide).
      
      The output includes the socket number and the number of online
      processors on that socket. This is useful to gauge the amount of
      aggregation.
      
       # ./perf stat -I 1000 -a --aggr-socket -e cycles sleep 2
       #           time socket cpus             counts events
            1.000097680 S0        4          5,788,785 cycles
            2.000379943 S0        4         27,361,546 cycles
            2.001167808 S0        4            818,275 cycles
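
Per-socket aggregation can be pictured as grouping per-CPU counts by socket while also tracking how many CPUs contributed. A minimal sketch under assumed inputs: the CPU-to-socket map, the per-CPU counts, and the function name are all hypothetical.

```python
def aggregate_by_socket(cpu_counts, cpu_to_socket):
    """Return {socket: (number_of_cpus, summed_count)} from per-CPU counts."""
    result = {}
    for cpu, count in cpu_counts.items():
        socket = cpu_to_socket[cpu]
        cpus, total = result.get(socket, (0, 0))
        result[socket] = (cpus + 1, total + count)
    return result


if __name__ == "__main__":
    # Invented topology: four CPUs spread over two sockets.
    cpu_to_socket = {0: 0, 1: 0, 2: 1, 3: 1}
    cpu_counts = {0: 10, 1: 20, 2: 5, 3: 7}
    for socket, (cpus, total) in sorted(aggregate_by_socket(cpu_counts, cpu_to_socket).items()):
        print(f"S{socket} {cpus:>8} {total:>18,} cycles")
```

Reporting the CPU count next to the sum is what lets a reader judge how much aggregation went into each line.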
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1360161962-9675-3-git-send-email-eranian@google.com
      [ committer note: Added missing man page entry based on above comments ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
  23. 30 Jan, 2013 1 commit
    • S
      perf stat: Add interval printing · 13370a9b
      Stephane Eranian committed
      This patch adds a new printing mode for perf stat: interval printing.
      perf stat can now print event deltas at regular time intervals, which
      is useful for detecting phases in programs.
      
      The -I option enables interval printing. It expects an interval duration
      in milliseconds; the minimum is 100ms. Once activated, perf stat prints
      event deltas since the last printout. All modes are supported.
      
      $ perf stat -I 1000 -e cycles noploop 10
      noploop for 10 seconds
       #           time             counts events
            1.000109853      2,388,560,546 cycles
            2.000262846      2,393,332,358 cycles
            3.000354131      2,393,176,537 cycles
            4.000439503      2,393,203,790 cycles
            5.000527075      2,393,167,675 cycles
            6.000609052      2,393,203,670 cycles
            7.000691082      2,393,175,678 cycles
      
      The output format makes it easy to feed into a plotting program such as
      gnuplot when the -I option is used in combination with the -x option:
      
      $ perf stat -x, -I 1000 -e cycles noploop 10
      noploop for 10 seconds
      1.000084113,2378775498,cycles
      2.000245798,2391056897,cycles
      3.000354445,2392089414,cycles
      4.000459115,2390936603,cycles
      5.000565341,2392108173,cycles
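
As a sketch of the kind of post-processing this enables, the following parses the -x, -I output above and converts each delta into a per-second rate. It assumes the three-field layout shown here (timestamp, count, event); other perf versions may emit additional fields. The helper names are hypothetical.

```python
import csv
import io

# Sample copied from the CSV interval output above.
sample = """\
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
"""


def parse_interval_csv(text, sep=","):
    """Return (timestamp, count, event) tuples from 'perf stat -x<sep> -I' output."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    return [(float(row[0]), int(row[1]), row[2]) for row in reader if len(row) == 3]


def per_second_rates(rows):
    """Divide each delta by the time elapsed since the previous printout."""
    rates = []
    previous = 0.0
    for timestamp, count, event in rows:
        rates.append((event, count / (timestamp - previous)))
        previous = timestamp
    return rates


rows = parse_interval_csv(sample)
for event, rate in per_second_rates(rows):
    print(f"{event}: {rate:.3e} per second")
```
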
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  24. 26 Oct, 2012 1 commit
  25. 14 Feb, 2012 1 commit
  26. 30 Sep, 2011 1 commit
  27. 18 Aug, 2011 1 commit
    • S
      perf stat: Add -o and --append options · 4aa9015f
      Stephane Eranian committed
      This patch adds an option (-o) to save the output of perf stat into a
      file. You could do this with perf record but not with perf stat.
      Instead, you had to fiddle with stderr to save the counts into a
      separate file.
      
      The patch also adds the --append option so that results can be
      concatenated into a single file across runs. Each run of the tool is
      clearly separated by a comment line starting with a hash mark. The -A
      option of perf record is already used by perf stat, so we only add a
      long option.
      
      $ perf stat -o res.txt date
      $ cat res.txt
      
       Performance counter stats for 'date':
      
                0.791306 task-clock                #    0.668 CPUs utilized
                       2 context-switches          #    0.003 M/sec
                       0 CPU-migrations            #    0.000 M/sec
                     197 page-faults               #    0.249 M/sec
                 1878143 cycles                    #    2.373 GHz
         <not supported> stalled-cycles-frontend
         <not supported> stalled-cycles-backend
                 1083367 instructions              #    0.58  insns per cycle
                  193027 branches                  #  243.935 M/sec
                    9014 branch-misses             #    4.67% of all branches
      
             0.001184746 seconds time elapsed
      
      The option can be combined with -x to make the output file much easier
      to parse.
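
A rough sketch of reading back a file written with --append and -x, splitting it into runs at each comment line that starts with a hash mark. The exact text of the separator line is not shown in this commit message, so the separator lines in the sample are placeholders, and the counts simply repeat two values from the output above rather than introducing new numbers. The function name `split_runs` is hypothetical.

```python
# Hypothetical file contents: two appended runs in "value,event" form.
# The "# ..." lines are placeholders for perf's real separator comment.
sample = """\
# placeholder separator for run 1
197,page-faults
1878143,cycles
# placeholder separator for run 2
197,page-faults
1878143,cycles
"""


def split_runs(text):
    """Group non-empty lines into runs, starting a new run at each '#' line."""
    runs = []
    current = None
    for line in text.splitlines():
        if line.startswith("#"):
            current = []
            runs.append(current)
        elif line.strip() and current is not None:
            current.append(line.strip())
    return runs


runs = split_runs(sample)
for number, lines in enumerate(runs, start=1):
    counts = {event: int(value) for value, event in (l.split(",") for l in lines)}
    print(number, counts)
```
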
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20110815202233.GA18535@quad
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  28. 16 Feb, 2011 1 commit
    • S
      perf tool: Add cgroup support · 023695d9
      Stephane Eranian committed
      This patch adds the ability to filter monitoring based on container groups
      (cgroups) for both perf stat and perf record. It is possible to monitor
      multiple cgroups in parallel. There is one cgroup per event. The cgroups to
      monitor are passed via a new -G option followed by a comma separated list of
      cgroup names.
      
      The cgroup filesystem has to be mounted. Given a cgroup name, the perf tool
      finds the corresponding directory in the cgroup filesystem and opens it. It
      then passes that file descriptor to the kernel.
      
      Example:
      
      $ perf stat -B -a -e cycles:u,cycles:u,cycles:u -G test1,,test2 -- sleep 1
       Performance counter stats for 'sleep 1':
      
            2,368,667,414  cycles                   test1
            2,369,661,459  cycles
            <not counted>  cycles                   test2
      
              1.001856890  seconds time elapsed
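
The one-cgroup-per-event pairing in the example above can be sketched as follows. This only illustrates the positional matching between the -e and -G lists and is not perf's actual parser; treating missing trailing entries as "no cgroup" and rejecting extra cgroups are assumptions of this sketch, and the function name is hypothetical.

```python
def pair_events_with_cgroups(events: str, cgroups: str):
    """Pair each event with the cgroup at the same position in the -G list.

    An empty entry (as in "test1,,test2") means the event is not
    restricted to a cgroup and is represented here as None.
    """
    event_list = events.split(",")
    cgroup_list = cgroups.split(",")
    if len(cgroup_list) > len(event_list):
        # Assumption of this sketch: more cgroups than events is an error.
        raise ValueError("more cgroups than events")
    # Assumption of this sketch: missing trailing entries mean "no cgroup".
    cgroup_list += [""] * (len(event_list) - len(cgroup_list))
    return [(event, cgroup or None) for event, cgroup in zip(event_list, cgroup_list)]


pairs = pair_events_with_cgroups("cycles:u,cycles:u,cycles:u", "test1,,test2")
for event, cgroup in pairs:
    print(event, cgroup)
```
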
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4d590290.825bdf0a.7d0a.4890@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  29. 02 Dec, 2010 2 commits
    • S
      perf stat: Add csv-style output · d7470b6a
      Stephane Eranian committed
      This patch adds an option (-x/--field-separator) to print counts using a
      CSV-style output. The user can pass a custom separator. This makes it very easy
      to import counts directly into your favorite spreadsheet without having to
      write scripts.
      
      Example:
      $ perf stat --field-separator=,  -a -- sleep 1
      4009.961740,task-clock-msecs
      13,context-switches
      2,CPU-migrations
      189,page-faults
      9596385684,cycles
      3493659441,instructions
      872897069,branches
      41562,branch-misses
      22424,cache-references
      1289,cache-misses
      
      Works also in non-aggregated mode:
      
      $ perf stat -x ,  -a -A -- sleep 1
      CPU0,1002.526168,task-clock-msecs
      CPU1,1002.528365,task-clock-msecs
      CPU2,1002.523360,task-clock-msecs
      CPU3,1002.519878,task-clock-msecs
      CPU0,1,context-switches
      CPU1,5,context-switches
      CPU2,5,context-switches
      CPU3,6,context-switches
      CPU0,0,CPU-migrations
      CPU1,1,CPU-migrations
      CPU2,0,CPU-migrations
      CPU3,1,CPU-migrations
      CPU0,2,page-faults
      CPU1,6,page-faults
      CPU2,9,page-faults
      CPU3,174,page-faults
      CPU0,2399439771,cycles
      CPU1,2380369063,cycles
      CPU2,2399142710,cycles
      CPU3,2373161192,cycles
      CPU0,872900618,instructions
      CPU1,873030960,instructions
      CPU2,872714525,instructions
      CPU3,874460580,instructions
      CPU0,221556839,branches
      CPU1,218134342,branches
      CPU2,218161730,branches
      CPU3,218284093,branches
      CPU0,18556,branch-misses
      CPU1,1449,branch-misses
      CPU2,3447,branch-misses
      CPU3,12714,branch-misses
      CPU0,8330,cache-references
      CPU1,313844,cache-references
      CPU2,47993728,cache-references
      CPU3,826481,cache-references
      CPU0,272,cache-misses
      CPU1,5360,cache-misses
      CPU2,1342193,cache-misses
      CPU3,13992,cache-misses
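
As an illustration of why this format is convenient, here is a small sketch that sums the per-CPU rows back into one total per event. It assumes the three-field "CPU,value,event" layout shown above and uses only the integer-valued page-faults and cycles rows from that output; the function name is hypothetical.

```python
from collections import defaultdict

# Rows copied from the non-aggregated output above.
sample = """\
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
"""


def sum_per_event(text, sep=","):
    """Add up the per-CPU counts of each event."""
    totals = defaultdict(int)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        _cpu, value, event = line.split(sep)
        totals[event] += int(value)
    return dict(totals)


totals = sum_per_event(sample)
print(totals)
```
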
      
      This second version adds the ability to name a separator and uses
      field-separator as the long option to be consistent with perf report.
      
      Committer note: Since we enabled --big-num by default in 201e0b06 and -x can't be
      used with it, we need to notice whether the user explicitly enabled or disabled -B,
      and add code to disable big_num if the user didn't explicitly set --big-num when
      -x is used.
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: paulus@samba.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@amd.com>
      LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • S
      perf stat: Document missing options · 8c207692
      Shawn Bohrer committed
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1291168642-11402-12-git-send-email-shawn.bohrer@gmail.com>
      Signed-off-by: Shawn Bohrer <shawn.bohrer@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  30. 20 Nov, 2010 1 commit
    • S
      perf stat: Add no-aggregation mode to -a · f5b4a9c3
      Stephane Eranian committed
      This patch adds a new -A option to perf stat. If specified then perf stat does
      not aggregate counts across all monitored CPUs in system-wide mode, i.e., when
      using -a. This option is not supported in per-thread mode.
      
      Being able to get a per-cpu breakdown is useful to detect imbalances between
      CPUs when running a uniform workload that spans all monitored CPUs.
      
      The second version corrects the missing cpumap[] support, so that it works when
      the -C option is used.
      
      The third version fixes a missing cpumap[] in print_counter() and removes a
      stray patch in builtin-trace.c.
      
      Examples on a 4-way system:
      
      # perf stat -a   -e cycles,instructions -- sleep 1
       Performance counter stats for 'sleep 1':
               9592808135  cycles
               3490380006  instructions             #      0.364 IPC
              1.001584632  seconds time elapsed
      
      # perf stat -a -A -e cycles,instructions -- sleep 1
       Performance counter stats for 'sleep 1':
      CPU0            2398163767  cycles
      CPU1            2398180817  cycles
      CPU2            2398217115  cycles
      CPU3            2398247483  cycles
      CPU0             872282046  instructions             #      0.364 IPC
      CPU1             873481776  instructions             #      0.364 IPC
      CPU2             872638127  instructions             #      0.364 IPC
      CPU3             872437789  instructions             #      0.364 IPC
              1.001556052  seconds time elapsed
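
The IPC column in both examples is instructions divided by cycles. A short sketch that recomputes it from the counts shown above (the dictionary layout and function name are illustrative only):

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle."""
    return instructions / cycles


# Aggregated counts from the 'perf stat -a' run above.
aggregated = ipc(instructions=3490380006, cycles=9592808135)

# Per-CPU (cycles, instructions) pairs from the 'perf stat -a -A' run above.
per_cpu = {
    "CPU0": (2398163767, 872282046),
    "CPU1": (2398180817, 873481776),
    "CPU2": (2398217115, 872638127),
    "CPU3": (2398247483, 872437789),
}

print(f"all CPUs: {aggregated:.3f} IPC")
for cpu, (cycles, instructions) in per_cpu.items():
    print(f"{cpu}: {ipc(instructions, cycles):.3f} IPC")
```
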
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@amd.com>
      LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  31. 05 Jun, 2010 1 commit
    • S
      perf tools: Add the ability to specify list of cpus to monitor · c45c6ea2
      Stephane Eranian committed
      This patch adds a -C option to stat, record, top to designate a list of CPUs to
      monitor. CPUs can be specified as a comma-separated list or ranges, no space
      allowed.
      
      Examples:
      $ perf record -a -C0-1,4-7 sleep 1
      $ perf top -C0-4
      $ perf stat -a -C1,2,3,4 sleep 1
      
      With perf record in per-thread mode with inherit mode on, samples are collected
      only when the thread runs on the designated CPUs.
      
      The -C option does not turn on system-wide mode automatically.
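
The CPU list syntax described above (comma-separated entries, each a single CPU or a range, no spaces) can be sketched like this. It illustrates the syntax only and is not perf's own parser; `parse_cpu_list` is a hypothetical name.

```python
def parse_cpu_list(spec: str) -> list[int]:
    """Expand a CPU list such as "0-1,4-7" into individual CPU numbers."""
    if any(ch.isspace() for ch in spec):
        raise ValueError("no space allowed in CPU list")
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))  # ranges are inclusive
        else:
            cpus.append(int(part))
    return cpus


# The three lists used in the examples above.
for spec in ("0-1,4-7", "0-4", "1,2,3,4"):
    print(spec, "->", parse_cpu_list(spec))
```
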
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <4bff9496.d345d80a.41fe.7b00@mx.google.com>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  32. 19 May, 2010 1 commit
    • S
      perf stat: add perf stat -B to pretty print large numbers · 5af52b51
      Stephane Eranian committed
      It is hard to read very large numbers so provide an option to perf stat
      to separate thousands using a separator. The patch leverages the locale
      support of stdio. You need to set your LC_NUMERIC appropriately, for
      instance LC_NUMERIC=en_US.UTF8. You need to pass -B to activate this
      feature. This way existing scripts parsing the output do not need to be
      changed. Here is an example.
      
      $ perf stat noploop 2
      noploop for 2 seconds
      
       Performance counter stats for 'noploop 2':
      
              1998.347031  task-clock-msecs         #      0.998 CPUs
                       61  context-switches         #      0.000 M/sec
                        0  CPU-migrations           #      0.000 M/sec
                      118  page-faults              #      0.000 M/sec
            4,138,410,900  cycles                   #   2070.917 M/sec  (scaled from 70.01%)
            2,062,650,268  instructions             #      0.498 IPC    (scaled from 70.01%)
            2,057,653,466  branches                 #   1029.678 M/sec  (scaled from 70.01%)
                   40,267  branch-misses            #      0.002 %      (scaled from 30.04%)
            2,055,961,348  cache-references         #   1028.831 M/sec  (scaled from 30.03%)
                   53,725  cache-misses             #      0.027 M/sec  (scaled from 30.02%)
      
              2.001393933  seconds time elapsed
      
      $ perf stat -B  noploop 2
      noploop for 2 seconds
      
       Performance counter stats for 'noploop 2':
      
              1998.297883  task-clock-msecs         #      0.998 CPUs
                       59  context-switches         #      0.000 M/sec
                        0  CPU-migrations           #      0.000 M/sec
                      119  page-faults              #      0.000 M/sec
            4,131,380,160  cycles                   #   2067.450 M/sec  (scaled from 70.01%)
            2,059,096,507  instructions             #      0.498 IPC    (scaled from 70.01%)
            2,054,681,303  branches                 #   1028.216 M/sec  (scaled from 70.01%)
                   25,650  branch-misses            #      0.001 %      (scaled from 30.05%)
            2,056,283,014  cache-references         #   1029.017 M/sec  (scaled from 30.03%)
                   47,097  cache-misses             #      0.024 M/sec  (scaled from 30.02%)
      
              2.001391016  seconds time elapsed
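
For readers who want to reproduce the grouping outside perf, here is a small sketch of thousands separation. Note the swap: perf relies on the stdio locale support (LC_NUMERIC), whereas this sketch uses Python's "," format specifier, which always groups with commas regardless of locale. The function name is hypothetical.

```python
def group_thousands(value: int) -> str:
    """Format an integer with comma thousands separators (locale-independent)."""
    return f"{value:,}"


# Counts taken from the 'perf stat -B' output above.
for count in (4131380160, 25650, 119):
    print(f"{group_thousands(count):>18}")
```
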
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <4bf28fe8.914ed80a.01ca.fffff5f5@mx.google.com>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  33. 14 May, 2010 1 commit
    • S
      perf tools: change event inheritance logic in stat and record · 2e6cdf99
      Stephane Eranian committed
      By default, event inheritance across fork and pthread_create was on, but the -i
      option of stat and record, which enabled inheritance, led users to believe it
      was off by default.
      
      This patch fixes this logic by inverting the meaning of the -i option.  By
      default inheritance is on whether you attach to a process (-p), a thread (-t)
      or start a process. If you pass -i, then you turn off inheritance. Turning off
      inheritance when you don't need it also helps limit perf resource usage.
      
      The patch also fixes perf stat -t xxxx and perf record -t xxxx which did not
      start the counters.
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <4bea9d2f.d60ce30a.0b5b.08e1@mx.google.com>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>