1. 22 Mar, 2017 1 commit
    • perf stat: Collapse identically named events · 430daf2d
      Committed by Andi Kleen
      The uncore PMU has a lot of duplicated PMUs for different subsystems.
      When expanding an uncore alias we usually end up with a large
      number of identically named aliases, which makes perf stat
      output difficult to read.
      
      Automatically sum them up in perf stat, unless --no-merge is specified.
      
      This can be the default because generally only the uncore PMUs have
      duplicated aliases. Other PMUs have unique names.
      
      Before:
      
        % perf stat --no-merge -a -e unc_c_llc_lookup.any sleep 1
      
        Performance counter stats for 'system wide':
      
                 694,976 Bytes unc_c_llc_lookup.any
                 706,304 Bytes unc_c_llc_lookup.any
                 956,608 Bytes unc_c_llc_lookup.any
                 782,720 Bytes unc_c_llc_lookup.any
                 605,696 Bytes unc_c_llc_lookup.any
                 442,816 Bytes unc_c_llc_lookup.any
                 659,328 Bytes unc_c_llc_lookup.any
                 509,312 Bytes unc_c_llc_lookup.any
                 263,936 Bytes unc_c_llc_lookup.any
                 592,448 Bytes unc_c_llc_lookup.any
                 672,448 Bytes unc_c_llc_lookup.any
                 608,640 Bytes unc_c_llc_lookup.any
                 641,024 Bytes unc_c_llc_lookup.any
                 856,896 Bytes unc_c_llc_lookup.any
                 808,832 Bytes unc_c_llc_lookup.any
                 684,864 Bytes unc_c_llc_lookup.any
                 710,464 Bytes unc_c_llc_lookup.any
                 538,304 Bytes unc_c_llc_lookup.any
      
             1.002577660 seconds time elapsed
      
      After:
      
        % perf stat -a -e unc_c_llc_lookup.any sleep 1
      
        Performance counter stats for 'system wide':
      
               2,685,120 Bytes unc_c_llc_lookup.any
      
             1.002648032 seconds time elapsed
      
      v2: Split collect_aliases. Rename alias flag.
      v3: Make sure unsupported/not counted is always printed.
      v4: Factor out callback change into separate patch.
      v5: Move check for bad results here
          Move merged check into collect_data
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170320201711.14142-3-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      430daf2d
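The merging behaviour described in this commit can be pictured with a small sketch. This is not the perf implementation (which is C code inside perf stat); it is a minimal Python illustration, with a hypothetical helper name, that sums rows sharing the same event name and unit unless a `no_merge` flag is set, mirroring the --no-merge option.

```python
def merge_identical_events(rows, no_merge=False):
    """Sum counts of rows that share the same event name and unit.

    rows: list of (count, unit, event_name) tuples in output order.
    no_merge: mirrors the idea of --no-merge, returning rows untouched.
    """
    if no_merge:
        return list(rows)
    merged = {}  # dicts keep insertion order, so output order is stable
    for count, unit, name in rows:
        key = (name, unit)
        merged[key] = merged.get(key, 0) + count
    return [(count, unit, name) for (name, unit), count in merged.items()]


# First three rows of the "Before" output above.
rows = [
    (694976, "Bytes", "unc_c_llc_lookup.any"),
    (706304, "Bytes", "unc_c_llc_lookup.any"),
    (956608, "Bytes", "unc_c_llc_lookup.any"),
]
for count, unit, name in merge_identical_events(rows):
    print(f"{count:>18,} {unit} {name}")
```

Note that the "Before" and "After" transcripts above come from separate runs, so their totals are not expected to match.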
  2. 21 Mar, 2017 1 commit
  3. 18 Feb, 2017 1 commit
    • perf stat: Add -a as default target · 0d79f8b9
      Committed by Jiri Olsa
      Boris asked for default -a option in case we monitor only uncore events.
      
      While implementing that I thought it might be actually useful to make it
      overall default.
      
      Running 'perf stat' will now collect system wide data.
      
      Committer note:
      
      Testing it:
      
        # perf stat
        ^C
         Performance counter stats for 'system wide':
      
               3571.559178      cpu-clock (msec)          #    4.000 CPUs utilized
                     3,346      context-switches          #    0.937 K/sec
                       277      cpu-migrations            #    0.078 K/sec
                    57,271      page-faults               #    0.016 M/sec
             4,535,633,835      cycles                    #    1.270 GHz
             6,389,736,516      instructions              #    1.41  insn per cycle
             1,541,293,875      branches                  #  431.547 M/sec
                14,526,396      branch-misses             #    0.94% of all branches
      
               0.892950118 seconds time elapsed
      
        #
      Requested-and-Acked-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170217170034.GB15389@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      0d79f8b9
  4. 07 Jun, 2016 1 commit
    • perf stat: Basic support for TopDown in perf stat · 44b1e60a
      Committed by Andi Kleen
      Add basic plumbing for TopDown in perf stat
      
      TopDown is intended to replace the frontend cycles idle / backend cycles
      idle metrics in standard perf stat output.  These metrics are not
      reliable in many workloads, due to out of order effects.
      
      This implements a new --topdown mode in perf stat (similar to
      --transaction) that measures the pipeline bottlenecks using
      standardized formulas. The measurement can all be done with 5 counters
      (one of them a fixed counter).
      
      The result is four metrics:
      
      FrontendBound, BackendBound, BadSpeculation, Retiring
      
      that describe the CPU pipeline behavior on a high level.
      
      The full top down methodology has many hierarchical metrics.  This
      implementation only supports level 1 which can be collected without
      multiplexing. A full implementation of top down on top of perf is
      available in pmu-tools toplev.  (http://github.com/andikleen/pmu-tools)
      
      The current version works on Intel Core CPUs starting with Sandy Bridge,
      and Atom CPUs starting with Silvermont.  In principle the generic
      metrics should be also implementable on other out of order CPUs.
      
      TopDown level 1 uses a set of abstracted metrics which are generic to
      out of order CPU cores (although some CPUs may not implement all of
      them):
      
        topdown-total-slots       Available slots in the pipeline
        topdown-slots-issued      Slots issued into the pipeline
        topdown-slots-retired     Slots successfully retired
        topdown-fetch-bubbles     Pipeline gaps in the frontend
        topdown-recovery-bubbles  Pipeline gaps during recovery
                                  from misspeculation
      
      These abstracted metrics then allow four useful metrics to be computed:
      
      FrontendBound, BackendBound, Retiring, BadSpeculation.
      
      Add a new --topdown option to enable the events.  When --topdown is
      specified, set up events for all topdown events supported by the kernel.
      Add topdown-* as a special case to the event parser, as is needed for
      all events containing -.
      
      The actual code to compute the metrics is in follow-on patches.
      
      v2: Use standard sysctl read function.
      v3: Move x86 specific code to arch/
      v4: Enable --metric-only implicitly for topdown.
      v5: Add --single-thread option to not force per core mode
      v6: Fix output order of topdown metrics
      v7: Allow combining with -d
      v8: Remove --single-thread again
      v9: Rename functions, adding arch_ and topdown_.
      v10: Expand man page and describe TopDown better
      Paste intro into commit description.
      Print error when malloc fails.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1464119559-17203-1-git-send-email-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      44b1e60a
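The commit above defers the metric computation to follow-on patches, so the formulas themselves are not given here. As a rough sketch only, assuming the commonly cited TopDown level 1 formulas (an assumption, not taken from this commit message), the four metrics could be derived from the five events as follows. The function and argument names are illustrative.

```python
def topdown_level1(total_slots, slots_issued, slots_retired,
                   fetch_bubbles, recovery_bubbles):
    """Return the four level 1 metrics as fractions of total slots.

    Assumed formulas (not quoted from the commit):
      FrontendBound  = fetch_bubbles / total_slots
      BadSpeculation = (slots_issued - slots_retired + recovery_bubbles)
                       / total_slots
      Retiring       = slots_retired / total_slots
      BackendBound   = 1 - (FrontendBound + BadSpeculation + Retiring)
    """
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")
    frontend_bound = fetch_bubbles / total_slots
    bad_speculation = (slots_issued - slots_retired + recovery_bubbles) / total_slots
    retiring = slots_retired / total_slots
    backend_bound = 1.0 - (frontend_bound + bad_speculation + retiring)
    return {
        "FrontendBound": frontend_bound,
        "BadSpeculation": bad_speculation,
        "Retiring": retiring,
        "BackendBound": backend_bound,
    }


# Made-up counter values, purely for illustration.
metrics = topdown_level1(total_slots=1000, slots_issued=500,
                         slots_retired=400, fetch_bubbles=200,
                         recovery_bubbles=50)
for name, value in metrics.items():
    print(f"{name:>15}: {value:6.1%}")
```

By construction the four fractions sum to 1, which is why only four of them need to be reported at level 1.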
  5. 11 Mar, 2016 3 commits
  6. 08 Mar, 2016 1 commit
  7. 18 Dec, 2015 3 commits
    • perf stat report: Allow to override aggr_mode · 89af4e05
      Committed by Jiri Olsa
      Allow overriding the recorded aggr_mode. It's possible to use perf stat
      report like:
      
         $ perf stat report -A
         $ perf stat report --per-core
         $ perf stat report --per-socket
      
      This customizes the aggregation mode regardless of what was used during
      the stat record command.
      Reported-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1446734469-11352-19-git-send-email-jolsa@kernel.org
      [ Renamed 'stat' parameter to 'st' to fix 'already defined' build error with older distros (e.g. RHEL6.7) ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      89af4e05
    • perf stat report: Add report command · ba6039b6
      Committed by Jiri Olsa
      Add 'perf stat report' command support. For now it only processes attr
      events and displays nothing.
      Reported-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1446734469-11352-12-git-send-email-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ba6039b6
    • perf stat record: Add record command · 4979d0c7
      Committed by Jiri Olsa
      Add 'perf stat record' command support. For now it creates a simple
      (header only) perf.data file.
      
      The record command can be specified anywhere among the stat options. All
      stat command options are valid for the stat record command, with the
      exception of '-o': when specified for the record command, it denotes the
      perf.data file name.
      
      Committer note:
      
      Set sample_type to PERF_SAMPLE_IDENTIFIER, which should be harmless
      while avoiding that older tools show confusing messages, for instance,
      with sample_type = 0, we get:
      
        $ perf stat record usleep 1
      
         Performance counter stats for 'usleep 1':
      
                0.630237      task-clock (msec)         #    0.528 CPUs utilized
                       1      context-switches          #    0.002 M/sec
                       0      cpu-migrations            #    0.000 K/sec
                      52      page-faults               #    0.083 M/sec
                 978,312      cycles                    #    1.552 GHz
                 671,931      stalled-cycles-frontend   #   68.68% frontend cycles idle
         <not supported>      stalled-cycles-backend
                 646,379      instructions              #    0.66  insns per cycle
                                                        #    1.04  stalled cycles per insn
                 131,046      branches                  #  207.931 M/sec
                   7,073      branch-misses             #    5.40% of all branches
      
             0.001193240 seconds time elapsed
      
        $ oldperf evlist
        WARNING: The perf.data file's data size field is 0 which is unexpected.
        Was the 'perf record' command properly terminated?
        non matching sample_type
        $
      
      While with sample_type set to PERF_SAMPLE_IDENTIFIER, after we re-run 'perf
      stat record usleep' we get:
      
        $ oldperf evlist
        WARNING: The perf.data file's data size field is 0 which is unexpected.
        Was the 'perf record' command properly terminated?
        task-clock
        context-switches
        cpu-migrations
        page-faults
        cycles
        stalled-cycles-frontend
        stalled-cycles-backend
        instructions
        branches
        branch-misses
        $
      
      Which at least shows the names of the events in the perf.data file.
      
      Additionally, such files, when passed to 'perf report' will produce:
      
        $ oldperf report --stdio
        WARNING: The perf.data file's data size field is 0 which is unexpected.
        Was the 'perf record' command properly terminated?
        Warning:
        Kernel address maps (/proc/{kallsyms,modules}) were restricted.
      
        Check /proc/sys/kernel/kptr_restrict before running 'perf record'.
      
        As no suitable kallsyms nor vmlinux was found, kernel samples
        can't be resolved.
      
        Samples in kernel modules can't be resolved as well.
      
        Error:
        The perf.data file has no samples!
        # To display the perf.data header info, please use --header/--header-only options.
        #
        $
      
      Which is confusing and can be solved by just adding the kernel mmap record,
      which will also remove that warning about the data size field being equal to
      zero, after generating the mmap record:
      
        $ perf stat record usleep 1
      
         Performance counter stats for 'usleep 1':
      
                0.600796      task-clock (msec)         #    0.478 CPUs utilized
                       1      context-switches          #    0.002 M/sec
                       0      cpu-migrations            #    0.000 K/sec
                      54      page-faults               #    0.090 M/sec
                 886,844      cycles                    #    1.476 GHz
                 582,169      stalled-cycles-frontend   #   65.65% frontend cycles idle
         <not supported>      stalled-cycles-backend
                 638,344      instructions              #    0.72  insns per cycle
                                                        #    0.91  stalled cycles per insn
                 130,204      branches                  #  216.719 M/sec
                   7,500      branch-misses             #    5.76% of all branches
      
             0.001255897 seconds time elapsed
      
        $ oldperf evlist
        task-clock
        context-switches
        cpu-migrations
        page-faults
        cycles
        stalled-cycles-frontend
        stalled-cycles-backend
        instructions
        branches
        branch-misses
        $ oldperf report --stdio
        Error:
        The perf.data file has no samples!
        # To display the perf.data header info, please use --header/--header-only options.
        #
        [acme@zoo linux]$
      
      No warnings, sensible output about what are the events in the perf.data file and also
      a "file has no samples" message, which indeed it doesn't.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Kan Liang <kan.liang@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1446734469-11352-3-git-send-email-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      4979d0c7
  8. 03 Oct, 2015 1 commit
    • perf stat: Reduce min --interval-print to 10ms · 19afd104
      Committed by Kan Liang
      The --interval-print parameter was limited to 100ms. However, for
      example, 10ms is required to do sophisticated bandwidth analysis using
      uncore events.
      
      The test shows that the overhead of the system-wide uncore monitoring
      with 10ms interval is only ~2%. So this patch reduces the minimal
      interval-print allowed to 10ms.
      
      But 10ms may not work well for all cases. For example, when the
      cpus/threads number is very large, for system-wide core event monitoring
      the overhead could be high.
      
      To handle this issue, a warning will be displayed when the
      interval-print is set between 10ms and 100ms. So users can make a
      decision according to their specific cases.
      
       # perf stat -e uncore_imc_1/cas_count_read/ -a --interval-print 10 -- sleep 1
      
       print interval < 100ms. The overhead percentage could be high in some
       cases. Please proceed with caution.
       #           time             counts unit events
            0.010200451               0.10 MiB  uncore_imc_1/cas_count_read/
            0.020475117               0.02 MiB  uncore_imc_1/cas_count_read/
            0.030692800               0.01 MiB  uncore_imc_1/cas_count_read/
            0.040948161               0.02 MiB  uncore_imc_1/cas_count_read/
            0.051159564               0.00 MiB  uncore_imc_1/cas_count_read/
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/1443776674-42511-1-git-send-email-kan.liang@intel.com
      [ Added warning about overhead when using sub 100ms intervals to the man page ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      19afd104
  9. 26 Jun, 2015 1 commit
    • perf stat: Introduce --per-thread option · 32b8af82
      Committed by Jiri Olsa
      Currently the values of all tasks given via the -p option are aggregated
      and printed as single values.
      
      Add the --per-thread option to print values per task.
      
        $ perf stat  -e cycles,instructions --per-thread -p 30190,30242
        ^C
         Performance counter stats for process id '30190,30242':
      
                     cat-30190                     0      cycles
                     yes-30242         3,842,525,421      cycles
                     cat-30190                     0      instructions
                     yes-30242        10,370,817,010      instructions
      
               1.143155657 seconds time elapsed
      
      Also works under interval mode:
      
        $ perf stat  -e cycles,instructions --per-thread -p 30190,30242 -I 1000
        #           time             comm-pid                  counts unit events
             1.000073435              cat-30190                89,058      cycles
             1.000073435              yes-30242         3,360,786,902      cycles                     (100.00%)
             1.000073435              cat-30190                14,066      instructions
             1.000073435              yes-30242         9,069,937,462      instructions
             2.000204830              cat-30190                     0      cycles
             2.000204830              yes-30242         3,351,667,626      cycles
             2.000204830              cat-30190                     0      instructions
             2.000204830              yes-30242         9,045,796,885      instructions
        ^C     2.771286639              cat-30190                     0      cycles
             2.771286639              yes-30242         2,593,884,166      cycles
             2.771286639              cat-30190                     0      instructions
             2.771286639              yes-30242         7,001,171,191      instructions
      
      It works only with the -t and -p options, otherwise the following error
      is printed:
      
        $ perf stat  -e cycles --per-thread  -I 1000 ls
        The --per-thread option is only available when monitoring via -p -t options.
            -p, --pid <pid>       stat events on existing process id
            -t, --tid <tid>       stat events on existing thread id
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      32b8af82
  10. 22 Jan, 2015 1 commit
  11. 13 Jan, 2014 1 commit
  12. 04 Oct, 2013 1 commit
  13. 08 Aug, 2013 1 commit
    • perf stat: Add support for --initial-delay option · 41191688
      Committed by Andi Kleen
      When measuring workloads the startup phase -- doing page faults, dynamic
      linking, opening files -- is often very different from the rest of the
      workload.  Especially with smaller kernels and using counter
      multiplexing this can give significant measurement errors.
      
      Multiplexing assumes that the workload is mostly the same over longer
      periods. But at startup there is typically some spike of activity which
      is relatively short.  If many groups are multiplexing, the one group
      that sees the spike, and whose count is then scaled up over the time
      needed to run all groups, may show a significant error.
      
      Also in general it's often not useful to measure the startup, because it
      is so different from the rest.
      
      One way around this is to use interval mode and discard the first
      sample, but this can be awkward because interval mode doesn't support
      intervals of less than 100ms, and also a useful interval is not
      necessarily the same as a useful startup delay.
      
      This patch adds a new --initial-delay / -D option to skip measuring for
      the startup phase. The time is specified in ms.
      
      Here's a simple example:
      
      perf stat -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done'
      ...
                   3,721 page-faults
      ...
      
      If we just wait 20 ms the number of page faults is about a quarter lower:
      
      perf stat -D 20 -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done'
      ...
                   2,823 page-faults
      ...
      
      So we filtered out most of the startup noise from bash.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1375490473-1503-4-git-send-email-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      41191688
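The multiplexing argument in this commit can be made concrete with a little arithmetic. The sketch below assumes the usual scaling of a multiplexed counter, where the raw count is multiplied by enabled time over running time; the workload numbers are invented for illustration and the helper name is hypothetical.

```python
def scale_count(raw_count, time_enabled, time_running):
    """Estimate the full-period count of a multiplexed counter.

    Assumes the usual extrapolation: raw * enabled / running.
    """
    if time_running == 0:
        raise ValueError("counter never ran, nothing to scale")
    return raw_count * time_enabled / time_running


# Invented workload: 100 ms total, a 10 ms startup spike at 500 events/ms,
# followed by 90 ms of steady state at 10 events/ms.
true_total = 10 * 500 + 90 * 10

# A group that happened to be scheduled only during the spike.
spike_estimate = scale_count(raw_count=10 * 500, time_enabled=100, time_running=10)

# A group that ran only during 10 ms of steady state.
steady_estimate = scale_count(raw_count=10 * 10, time_enabled=100, time_running=10)

print(f"true total:      {true_total}")
print(f"spike group:     {spike_estimate:.0f}")
print(f"steady group:    {steady_estimate:.0f}")
```

Skipping the startup phase with an initial delay removes the spike from every group's window, so the extrapolation rests on the steadier part of the run.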
  14. 26 Mar, 2013 2 commits
  15. 16 Mar, 2013 1 commit
  16. 07 Feb, 2013 1 commit
    • perf stat: Add per processor socket count aggregation · d7e7a451
      Committed by Stephane Eranian
      This patch adds per-processor socket count aggregation for system-wide
      mode measurements. This is a useful mode to detect imbalance between
      sockets.
      
      To enable this mode, use --aggr-socket in addition
      to -a (system-wide).
      
      The output includes the socket number and the number of online
      processors on that socket. This is useful to gauge the amount of
      aggregation.
      
       # ./perf stat -I 1000 -a --aggr-socket -e cycles sleep 2
       #           time socket cpus             counts events
            1.000097680 S0        4          5,788,785 cycles
            2.000379943 S0        4         27,361,546 cycles
            2.001167808 S0        4            818,275 cycles
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1360161962-9675-3-git-send-email-eranian@google.com
      [ committer note: Added missing man page entry based on above comments ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d7e7a451
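As a rough illustration of what per-socket aggregation does with per-CPU counts, the sketch below groups counts by socket and reports the number of CPUs that contributed, similar in spirit to the socket and cpus columns above. The CPU-to-socket map and the counts are invented, and the function name is hypothetical; perf itself derives the topology from the system.

```python
def aggregate_per_socket(per_cpu_counts, cpu_to_socket):
    """Return {socket: (number_of_cpus, summed_count)}.

    per_cpu_counts: {cpu_id: count}
    cpu_to_socket:  {cpu_id: socket_id} (hypothetical topology map)
    """
    result = {}
    for cpu, count in per_cpu_counts.items():
        socket = cpu_to_socket[cpu]
        cpus, total = result.get(socket, (0, 0))
        result[socket] = (cpus + 1, total + count)
    return result


# Invented example: four CPUs spread over two sockets.
counts = {0: 10, 1: 20, 2: 30, 3: 40}
topology = {0: 0, 1: 0, 2: 1, 3: 1}
for socket, (cpus, total) in sorted(aggregate_per_socket(counts, topology).items()):
    print(f"S{socket} {cpus:>8} {total:>18,} cycles")
```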
  17. 30 Jan, 2013 1 commit
    • perf stat: Add interval printing · 13370a9b
      Committed by Stephane Eranian
      This patch adds a new printing mode for perf stat.  It allows interval
      printing. That means perf stat can now print event deltas at regular
      time intervals.  This is useful to detect phases in programs.
      
      The -I option enables interval printing. It expects an interval duration
      in milliseconds. Minimum is 100ms. Once activated, perf stat prints
      event deltas since the last printout. All modes are supported.
      
      $ perf stat -I 1000 -e cycles noploop 10
      noploop for 10 seconds
       #           time             counts events
            1.000109853      2,388,560,546 cycles
            2.000262846      2,393,332,358 cycles
            3.000354131      2,393,176,537 cycles
            4.000439503      2,393,203,790 cycles
            5.000527075      2,393,167,675 cycles
            6.000609052      2,393,203,670 cycles
            7.000691082      2,393,175,678 cycles
      
      The output format makes it easy to feed into a plotting program such as
      gnuplot when the -I option is used in combination with the -x option:
      
      $ perf stat -x, -I 1000 -e cycles noploop 10
      noploop for 10 seconds
      1.000084113,2378775498,cycles
      2.000245798,2391056897,cycles
      3.000354445,2392089414,cycles
      4.000459115,2390936603,cycles
      5.000565341,2392108173,cycles
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1359460064-3060-3-git-send-email-eranian@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      13370a9b
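To show how the -x, -I output can be post-processed before plotting, here is a minimal sketch that parses the three-field lines from the example above (timestamp, count, event) and converts each delta into an events-per-second rate. The three-field layout is assumed from this commit's example only; other perf versions may emit additional fields. Function names are illustrative.

```python
def parse_interval_csv(lines, sep=","):
    """Parse 'time,count,event' lines, skipping anything else."""
    samples = []
    for line in lines:
        fields = line.strip().split(sep)
        if len(fields) < 3:
            continue  # e.g. output of the measured program itself
        try:
            timestamp, count = float(fields[0]), int(fields[1])
        except ValueError:
            continue  # not a counter line
        samples.append((timestamp, count, fields[2]))
    return samples


def interval_rates(samples):
    """Return (event, events_per_second) for each sample.

    Each count is a delta since the previous printout of the same event,
    so it is divided by the time elapsed since that printout.
    """
    previous = {}
    rates = []
    for timestamp, count, event in samples:
        elapsed = timestamp - previous.get(event, 0.0)
        rates.append((event, count / elapsed))
        previous[event] = timestamp
    return rates


output = [
    "noploop for 10 seconds",
    "1.000084113,2378775498,cycles",
    "2.000245798,2391056897,cycles",
]
for event, rate in interval_rates(parse_interval_csv(output)):
    print(f"{event}: {rate / 1e9:.3f} G/sec")
```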
  18. 26 Oct, 2012 1 commit
  19. 14 Feb, 2012 1 commit
  20. 30 Sep, 2011 1 commit
  21. 18 Aug, 2011 1 commit
    • perf stat: Add -o and --append options · 4aa9015f
      Committed by Stephane Eranian
      This patch adds an option (-o) to save the output of perf stat into a
      file. You could do this with perf record but not with perf stat.
      Instead, you had to fiddle with stderr to save the counts into a
      separate file.
      
      The patch also adds the --append option so that results can be
      concatenated into a single file across runs. Each run of the tool is
      clearly separated by a comment line starting with a hash mark. The -A
      option of perf record is already used by perf stat, so we only add a
      long option.
      
      $ perf stat -o res.txt date
      $ cat res.txt
      
       Performance counter stats for 'date':
      
                0.791306 task-clock                #    0.668 CPUs utilized
                       2 context-switches          #    0.003 M/sec
                       0 CPU-migrations            #    0.000 M/sec
                     197 page-faults               #    0.249 M/sec
                 1878143 cycles                    #    2.373 GHz
         <not supported> stalled-cycles-frontend
         <not supported> stalled-cycles-backend
                 1083367 instructions              #    0.58  insns per cycle
                  193027 branches                  #  243.935 M/sec
                    9014 branch-misses             #    4.67% of all branches
      
             0.001184746 seconds time elapsed
      
      The option can be combined with -x to make the output file much easier
      to parse.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20110815202233.GA18535@quad
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      4aa9015f
  22. 16 Feb, 2011 1 commit
    • perf tool: Add cgroup support · 023695d9
      Committed by Stephane Eranian
      This patch adds the ability to filter monitoring based on container groups
      (cgroups) for both perf stat and perf record. It is possible to monitor
      multiple cgroups in parallel. There is one cgroup per event. The cgroups to
      monitor are passed via a new -G option followed by a comma separated list of
      cgroup names.
      
      The cgroup filesystem has to be mounted. Given a cgroup name, the perf tool
      finds the corresponding directory in the cgroup filesystem and opens it. It
      then passes that file descriptor to the kernel.
      
      Example:
      
      $ perf stat -B -a -e cycles:u,cycles:u,cycles:u -G test1,,test2 -- sleep 1
       Performance counter stats for 'sleep 1':
      
            2,368,667,414  cycles                   test1
            2,369,661,459  cycles
            <not counted>  cycles                   test2
      
              1.001856890  seconds time elapsed
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4d590290.825bdf0a.7d0a.4890@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      023695d9
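The "one cgroup per event" pairing in the example above (-e cycles:u,cycles:u,cycles:u with -G test1,,test2) can be sketched as a positional match between the two comma separated lists, where an empty entry means the event is not restricted to a cgroup. This is only an illustration with a hypothetical function name; how it treats a short or overlong cgroup list is an assumption, not something stated in the commit.

```python
def pair_events_with_cgroups(events, cgroups):
    """Pair each event with the cgroup at the same list position.

    events:  comma separated event list, as passed to -e
    cgroups: comma separated cgroup list, as passed to -G
    An empty cgroup entry maps to None (no cgroup filter).
    Assumption: a shorter cgroup list leaves trailing events unfiltered,
    and a longer one is rejected.
    """
    event_names = events.split(",")
    cgroup_names = cgroups.split(",")
    if len(cgroup_names) > len(event_names):
        raise ValueError("more cgroups than events")
    cgroup_names += [""] * (len(event_names) - len(cgroup_names))
    return [(event, cgroup or None)
            for event, cgroup in zip(event_names, cgroup_names)]


for event, cgroup in pair_events_with_cgroups("cycles:u,cycles:u,cycles:u",
                                              "test1,,test2"):
    print(f"{event:<10} {cgroup if cgroup else '(no cgroup)'}")
```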
  23. 02 Dec, 2010 2 commits
    • perf stat: Add csv-style output · d7470b6a
      Committed by Stephane Eranian
      This patch adds an option (-x/--field-separator) to print counts using a
      CSV-style output. The user can pass a custom separator. This makes it very easy
      to import counts directly into your favorite spreadsheet without having to
      write scripts.
      
      Example:
      $ perf stat --field-separator=,  -a -- sleep 1
      4009.961740,task-clock-msecs
      13,context-switches
      2,CPU-migrations
      189,page-faults
      9596385684,cycles
      3493659441,instructions
      872897069,branches
      41562,branch-misses
      22424,cache-references
      1289,cache-misses
      
      Works also in non-aggregated mode:
      
      $ perf stat -x ,  -a -A -- sleep 1
      CPU0,1002.526168,task-clock-msecs
      CPU1,1002.528365,task-clock-msecs
      CPU2,1002.523360,task-clock-msecs
      CPU3,1002.519878,task-clock-msecs
      CPU0,1,context-switches
      CPU1,5,context-switches
      CPU2,5,context-switches
      CPU3,6,context-switches
      CPU0,0,CPU-migrations
      CPU1,1,CPU-migrations
      CPU2,0,CPU-migrations
      CPU3,1,CPU-migrations
      CPU0,2,page-faults
      CPU1,6,page-faults
      CPU2,9,page-faults
      CPU3,174,page-faults
      CPU0,2399439771,cycles
      CPU1,2380369063,cycles
      CPU2,2399142710,cycles
      CPU3,2373161192,cycles
      CPU0,872900618,instructions
      CPU1,873030960,instructions
      CPU2,872714525,instructions
      CPU3,874460580,instructions
      CPU0,221556839,branches
      CPU1,218134342,branches
      CPU2,218161730,branches
      CPU3,218284093,branches
      CPU0,18556,branch-misses
      CPU1,1449,branch-misses
      CPU2,3447,branch-misses
      CPU3,12714,branch-misses
      CPU0,8330,cache-references
      CPU1,313844,cache-references
      CPU2,47993728,cache-references
      CPU3,826481,cache-references
      CPU0,272,cache-misses
      CPU1,5360,cache-misses
      CPU2,1342193,cache-misses
      CPU3,13992,cache-misses
      
      This second version adds the ability to name a separator and uses
      field-separator as the long option to be consistent with perf report.
      
      Committer note: Since we enabled --big-num by default in 201e0b06 and -x can't be
      used with it, we need to notice if the user explicitly enabled or disabled -B,
      so add code to disable big_num if the user didn't explicitly set --big-num when
      -x is used.
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frederik Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: paulus@samba.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@amd.com>
      LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d7470b6a
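Besides spreadsheets, the non-aggregated CSV output shown above is easy to load from a script. The following is a minimal sketch using Python's standard csv module, assuming the three-field layout from this commit's -A example (CPU, count, event); the helper name is illustrative. It then derives a per-CPU instructions-per-cycle figure from two of the sample rows.

```python
import csv
import io


def per_cpu_counts(text, sep=","):
    """Return {event: {cpu: count}} from 'CPU,count,event' lines."""
    table = {}
    for row in csv.reader(io.StringIO(text), delimiter=sep):
        if len(row) != 3:
            continue  # not a per-CPU counter line
        cpu, value, event = row
        try:
            count = float(value)
        except ValueError:
            continue  # skip non-numeric counts
        table.setdefault(event, {})[cpu] = count
    return table


# Rows copied from the -A example above.
sample = """CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
"""
table = per_cpu_counts(sample)
for cpu in sorted(table["cycles"]):
    ipc = table["instructions"][cpu] / table["cycles"][cpu]
    print(f"{cpu}: {ipc:.3f} insns per cycle")
```

Passing a different `sep` covers custom separators given to -x.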
    • perf stat: Document missing options · 8c207692
      Committed by Shawn Bohrer
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1291168642-11402-12-git-send-email-shawn.bohrer@gmail.com>
      Signed-off-by: Shawn Bohrer <shawn.bohrer@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8c207692
  24. 20 Nov, 2010 1 commit
    • perf stat: Add no-aggregation mode to -a · f5b4a9c3
      Committed by Stephane Eranian
      This patch adds a new -A option to perf stat. If specified then perf stat does
      not aggregate counts across all monitored CPUs in system-wide mode, i.e., when
      using -a. This option is not supported in per-thread mode.
      
      Being able to get a per-cpu breakdown is useful to detect imbalances between
      CPUs when running a uniform workload that spans all monitored CPUs.
      
      The second version corrects the missing cpumap[] support, so that it works when
      the -C option is used.
      
      The third version fixes a missing cpumap[] in print_counter() and removes a
      stray patch in builtin-trace.c.
      
      Examples on a 4-way system:
      
      # perf stat -a   -e cycles,instructions -- sleep 1
       Performance counter stats for 'sleep 1':
               9592808135  cycles
               3490380006  instructions             #      0.364 IPC
              1.001584632  seconds time elapsed
      
      # perf stat -a -A -e cycles,instructions -- sleep 1
       Performance counter stats for 'sleep 1':
      CPU0            2398163767  cycles
      CPU1            2398180817  cycles
      CPU2            2398217115  cycles
      CPU3            2398247483  cycles
      CPU0             872282046  instructions             #      0.364 IPC
      CPU1             873481776  instructions             #      0.364 IPC
      CPU2             872638127  instructions             #      0.364 IPC
      CPU3             872437789  instructions             #      0.364 IPC
              1.001556052  seconds time elapsed
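
To check for imbalance, the per-CPU lines from the -A example can be reduced to one IPC value per CPU. This is a small sketch under the assumption that each counter line starts with a `CPUn` label followed by the count and the event name, as in the output above; `per_cpu_ipc` is a hypothetical helper.

```python
def per_cpu_ipc(text):
    """Return {cpu: instructions / cycles} from 'perf stat -a -A' style lines."""
    cycles, instructions = {}, {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].startswith("CPU"):
            continue  # headers, blank lines, elapsed-time line
        cpu, value, event = fields[0], int(fields[1]), fields[2]
        if event == "cycles":
            cycles[cpu] = value
        elif event == "instructions":
            instructions[cpu] = value
    return {cpu: instructions[cpu] / cycles[cpu]
            for cpu in cycles if cpu in instructions}


# Lines copied from the example output above (CPU0 and CPU1 only).
sample = """\
CPU0            2398163767  cycles
CPU1            2398180817  cycles
CPU0             872282046  instructions             #      0.364 IPC
CPU1             873481776  instructions             #      0.364 IPC
"""

ipc = per_cpu_ipc(sample)
for cpu in sorted(ipc):
    print(cpu, round(ipc[cpu], 3))
```

A large gap between the smallest and largest value would point at the kind of imbalance the commit message mentions.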
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@amd.com>
      LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  25. 05 Jun, 2010 1 commit
    • S
      perf tools: Add the ability to specify list of cpus to monitor · c45c6ea2
      Stephane Eranian committed
      This patch adds a -C option to stat, record, and top to designate a list of CPUs to
      monitor. CPUs can be specified as a comma-separated list of numbers or ranges, with
      no spaces allowed.
      
      Examples:
      $ perf record -a -C0-1,4-7 sleep 1
      $ perf top -C0-4
      $ perf stat -a -C1,2,3,4 sleep 1
      
      With perf record in per-thread mode with inherit mode on, samples are collected
      only when the thread runs on the designated CPUs.
      
      The -C option does not turn on system-wide mode automatically.
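
The list syntax accepted by -C (comma-separated numbers or ranges, no spaces) can be illustrated with a short parser. This is only a sketch of the syntax described above, not the code perf uses; `parse_cpu_list` and its error handling are assumptions.

```python
def parse_cpu_list(spec):
    """Expand a string such as '0-1,4-7' into a sorted list of CPU numbers."""
    if not spec or any(ch.isspace() for ch in spec):
        raise ValueError("CPU list must be non-empty and contain no spaces")
    cpus = set()
    for part in spec.split(","):
        first, dash, last = part.partition("-")
        low = int(first)                    # int() rejects empty or non-numeric parts
        high = int(last) if dash else low   # a single number is a one-element range
        if high < low:
            raise ValueError(f"descending range: {part!r}")
        cpus.update(range(low, high + 1))
    return sorted(cpus)


print(parse_cpu_list("0-1,4-7"))
print(parse_cpu_list("1,2,3,4"))
```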
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <4bff9496.d345d80a.41fe.7b00@mx.google.com>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  26. 19 May, 2010 1 commit
    • S
      perf stat: add perf stat -B to pretty print large numbers · 5af52b51
      Stephane Eranian committed
      It is hard to read very large numbers so provide an option to perf stat
      to separate thousands using a separator. The patch leverages the locale
      support of stdio. You need to set your LC_NUMERIC appropriately, for
      instance LC_NUMERIC=en_US.UTF8. You need to pass -B to activate this
      feature. This way existing scripts parsing the output do not need to be
      changed. Here is an example.
      
      $ perf stat noploop 2
      noploop for 2 seconds
      
       Performance counter stats for 'noploop 2':
      
              1998.347031  task-clock-msecs         #      0.998 CPUs
                       61  context-switches         #      0.000 M/sec
                        0  CPU-migrations           #      0.000 M/sec
                      118  page-faults              #      0.000 M/sec
            4,138,410,900  cycles                   #   2070.917 M/sec  (scaled from 70.01%)
            2,062,650,268  instructions             #      0.498 IPC    (scaled from 70.01%)
            2,057,653,466  branches                 #   1029.678 M/sec  (scaled from 70.01%)
                   40,267  branch-misses            #      0.002 %      (scaled from 30.04%)
            2,055,961,348  cache-references         #   1028.831 M/sec  (scaled from 30.03%)
                   53,725  cache-misses             #      0.027 M/sec  (scaled from 30.02%)
      
              2.001393933  seconds time elapsed
      
      $ perf stat -B  noploop 2
      noploop for 2 seconds
      
       Performance counter stats for 'noploop 2':
      
              1998.297883  task-clock-msecs         #      0.998 CPUs
                       59  context-switches         #      0.000 M/sec
                        0  CPU-migrations           #      0.000 M/sec
                      119  page-faults              #      0.000 M/sec
            4,131,380,160  cycles                   #   2067.450 M/sec  (scaled from 70.01%)
            2,059,096,507  instructions             #      0.498 IPC    (scaled from 70.01%)
            2,054,681,303  branches                 #   1028.216 M/sec  (scaled from 70.01%)
                   25,650  branch-misses            #      0.001 %      (scaled from 30.05%)
            2,056,283,014  cache-references         #   1029.017 M/sec  (scaled from 30.03%)
                   47,097  cache-misses             #      0.024 M/sec  (scaled from 30.02%)
      
              2.001391016  seconds time elapsed
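
The effect of grouping digits can be reproduced in a few lines. Note the swap: the patch relies on the locale support of stdio and on LC_NUMERIC, whereas this sketch hardcodes the separator through Python's format mini-language so that it does not depend on which locales are installed. `group_thousands` is a hypothetical name.

```python
def group_thousands(value, sep=","):
    """Format an integer with a separator between groups of three digits."""
    # Python's ',' format option always emits commas; swap in the requested separator.
    return f"{value:,}".replace(",", sep)


print(group_thousands(4138410900))
print(group_thousands(61))
```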
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <4bf28fe8.914ed80a.01ca.fffff5f5@mx.google.com>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  27. 14 May, 2010 1 commit
    • S
      perf tools: change event inheritance logic in stat and record · 2e6cdf99
      Stephane Eranian committed
      By default, event inheritance across fork and pthread_create was on, but the -i
      option of stat and record, which enabled inheritance, led users to believe it was
      off by default.
      
      This patch fixes this logic by inverting the meaning of the -i option.  By
      default inheritance is on whether you attach to a process (-p), a thread (-t)
      or start a process. If you pass -i, then you turn off inheritance. Turning off
      inheritance when you don't need it also helps limit perf resource usage.
      
      The patch also fixes perf stat -t xxxx and perf record -t xxxx which did not
      start the counters.
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <4bea9d2f.d60ce30a.0b5b.08e1@mx.google.com>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  28. 09 Aug, 2009 1 commit
  29. 23 Jun, 2009 1 commit
  30. 07 Jun, 2009 1 commit
  31. 06 Jun, 2009 2 commits
  32. 04 Jun, 2009 1 commit
    • I
      perf stat: Update help text · 20c84e95
      Ingo Molnar committed
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  33. 30 May, 2009 1 commit
    • I
      perf_counter tools: Generate per command manpages (and pdf/html, etc.) · c1c2365a
      Ingo Molnar committed
      Import Git's nice .txt => {man/html/pdf} generation machinery.
      
      Fix various errors in the Documentation/perf*.txt description as well.
      
      Also fix a bug in builtin-help: we'd map 'perf help top' to 'perftop'
      if only the 'perf' binary is in the default PATH - confusing the manpage
      logic. I don't fully understand why Git did it this way - but I suppose
      it's an artifact of their migration from standalone git-xyz
      commands to 'git xyz' commands. The perf tools were always using the
      modern form so it's not an issue there.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>