1. 09 5月, 2012 1 次提交
    • D
      perf stat: handle ENXIO error for perf_event_open · 979987a5
      David Ahern 提交于
      perf stat on PPC currently fails to run:
      
      $ perf stat -- sleep 1
        Error: open_counter returned with 6 (No such device or address). /bin/dmesg may provide additional information.
      
        Fatal: Not all events could be opened.
      
      The problem is that until 2.6.37 (behavior changed with commit b0a873eb)
      perf on PPC returns ENXIO when hw_perf_event_init() fails. With this
      patch we get the expected behavior:
      
      $ perf stat -v -- sleep 1
      cycles event is not supported by the kernel.
      stalled-cycles-frontend event is not supported by the kernel.
      stalled-cycles-backend event is not supported by the kernel.
      instructions event is not supported by the kernel.
      branches event is not supported by the kernel.
      branch-misses event is not supported by the kernel.
      
      ...
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1336490956-57145-1-git-send-email-dsahern@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      979987a5
  2. 08 5月, 2012 2 次提交
  3. 03 5月, 2012 2 次提交
  4. 12 4月, 2012 1 次提交
  5. 17 3月, 2012 1 次提交
    • N
      perf stat: Fix event grouping on forked task · 4c19ea45
      Namhyung Kim 提交于
      When event group is enabled for forked task (i.e. no target task was
      specified) all events were disabled and marked ->enable_on_exec.
      However they are not counted at all since only group leader will be
      enabled on exec actually. So the result looked like below:
      
       $ ./perf stat --group -- sleep 1
      
       Performance counter stats for 'sleep 1':
      
                0.554926 task-clock                #    0.001 CPUs utilized
           <not counted> context-switches
           <not counted> CPU-migrations
           <not counted> page-faults
           <not counted> cycles
         <not supported> stalled-cycles-frontend
         <not supported> stalled-cycles-backend
           <not counted> instructions
           <not counted> branches
           <not counted> branch-misses
      
             1.001228093 seconds time elapsed
      
      Fix it by disabling group leader only.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1331887340-32448-1-git-send-email-namhyung.kim@lge.comSigned-off-by: NNamhyung Kim <namhyung.kim@lge.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4c19ea45
  6. 14 2月, 2012 1 次提交
  7. 07 2月, 2012 2 次提交
  8. 25 1月, 2012 1 次提交
  9. 04 1月, 2012 1 次提交
  10. 06 12月, 2011 1 次提交
    • A
      perf stat: Failure with "Operation not supported" · 38f6ae1e
      Anton Blanchard 提交于
      perf stat is failing on PowerPC:
      
        Error: open_counter returned with 95 (Operation not supported). /bin/dmesg may provide additional information.
      
        Fatal: Not all events could be opened.
      
      commit 370faf1d (perf stat: Fail softly on unsupported events)
      added a check for failure returning ENOENT, but the POWER backend
      returns EOPNOTSUPP. It looks like alpha, blackfin and mips do the
      same.
      
      With the patch applied, things work as expected:
      
       Performance counter stats for '/bin/true':
      
                0.362176 task-clock                #    0.623 CPUs utilized
                       0 context-switches          #    0.000 M/sec
                       0 CPU-migrations            #    0.000 M/sec
                      28 page-faults               #    0.077 M/sec
               1,677,020 cycles                    #    4.630 GHz
         <not supported> stalled-cycles-frontend
         <not supported> stalled-cycles-backend
                 431,220 instructions              #    0.26  insns per cycle
                 101,889 branches                  #  281.325 M/sec
                   4,145 branch-misses             #    4.07% of all branches
      
             0.000581361 seconds time elapsed
      
      Cc: <stable@kernel.org> # 3.0+
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20111202093833.5fef7226@krytenSigned-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      38f6ae1e
  11. 29 11月, 2011 1 次提交
  12. 28 11月, 2011 1 次提交
  13. 26 10月, 2011 1 次提交
  14. 30 9月, 2011 6 次提交
  15. 18 8月, 2011 2 次提交
  16. 21 7月, 2011 1 次提交
  17. 01 7月, 2011 1 次提交
    • Z
      perf stat: Add noise output for csv mode · 3ae9a34d
      Zhengyu He 提交于
      Previously, when you want perf-stat to output the statistics in
      csv mode, no information of the noise will be printed out.
      
      For example right now we output this --repeat information:
      
       ./perf stat -r3 -x, sleep 1
       1.164789,task-clock
       8,context-switches
       0,CPU-migrations
       219,page-faults
       3337800,cycles
      
      With this patch, the output will be appended with an additional
      entry for the noise value:
      
       ./perf stat -r3 -x, sleep 1
       1.164789,task-clock,3.75%
       8,context-switches,75.00%
       0,CPU-migrations,100.00%
       219,page-faults,0.00%
       3337800,cycles,3.36%
      Signed-off-by: NZhengyu He <zhengyuh@google.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/1308861942-4945-1-git-send-email-zhengyuh@google.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      3ae9a34d
  18. 03 6月, 2011 1 次提交
    • D
      perf stat: clarify unsupported events from uncounted events · 2cee77c4
      David Ahern 提交于
      perf stat continues running even if the event list contains counters
      that are not supported. The resulting output then contains <not counted>
      for those events which gets confusing as to which events are supported,
      but not counted and which are not supported.
      
      Before:
      
      perf stat -ddd -- sleep 1
      
            Performance counter stats for 'sleep 1':
      
                0.571283 task-clock                #    0.001 CPUs utilized
                       1 context-switches          #    0.002 M/sec
                       0 CPU-migrations            #    0.000 M/sec
                     157 page-faults               #    0.275 M/sec
               1,037,707 cycles                    #    1.816 GHz
           <not counted> stalled-cycles-frontend
           <not counted> stalled-cycles-backend
                 654,499 instructions              #    0.63  insns per cycle
                 136,129 branches                  #  238.286 M/sec
           <not counted> branch-misses
           <not counted> L1-dcache-loads
           <not counted> L1-dcache-load-misses
           <not counted> LLC-loads
           <not counted> LLC-load-misses
           <not counted> L1-icache-loads
           <not counted> L1-icache-load-misses
           <not counted> dTLB-loads
           <not counted> dTLB-load-misses
           <not counted> iTLB-loads
           <not counted> iTLB-load-misses
           <not counted> L1-dcache-prefetches
           <not counted> L1-dcache-prefetch-misses
      
             1.001004836 seconds time elapsed
      
      After:
      
      perf stat -ddd -- sleep 1
      
       Performance counter stats for 'sleep 1':
      
                1.350326 task-clock                #    0.001 CPUs utilized
                       2 context-switches          #    0.001 M/sec
                       0 CPU-migrations            #    0.000 M/sec
                     157 page-faults               #    0.116 M/sec
                  11,986 cycles                    #    0.009 GHz
         <not supported> stalled-cycles-frontend
         <not supported> stalled-cycles-backend
                 496,986 instructions              #   41.46  insns per cycle
                 138,065 branches                  #  102.246 M/sec
                   7,245 branch-misses             #    5.25% of all branches
           <not counted> L1-dcache-loads
           <not counted> L1-dcache-load-misses
           <not counted> LLC-loads
           <not counted> LLC-load-misses
           <not counted> L1-icache-loads
           <not counted> L1-icache-load-misses
           <not counted> dTLB-loads
           <not counted> dTLB-load-misses
           <not counted> iTLB-loads
           <not counted> iTLB-load-misses
           <not counted> L1-dcache-prefetches
         <not supported> L1-dcache-prefetch-misses
      
             1.002397333 seconds time elapsed
      
      v1->v2:
      changed supported type from int to bool
      
      v2->v3
      fixed vertical alignment of new struct element
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1306767359-13221-1-git-send-email-dsahern@gmail.comSigned-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2cee77c4
  19. 19 5月, 2011 2 次提交
    • I
      perf stat: Add more cache-miss percentage printouts · c3305257
      Ingo Molnar 提交于
      Print out the cache-miss percentage as well if the cache refs were
      collected, for all the generic cache event types.
      
      Before:
      
         11,103,723,230 dTLB-loads                #  622.471 M/sec                    ( +-  0.30% )
             87,065,337 dTLB-load-misses          #    4.881 M/sec                    ( +-  0.90% )
      
      After:
      
         11,353,713,242 dTLB-loads                #  626.020 M/sec                    ( +-  0.35% )
            113,393,472 dTLB-load-misses          #    1.00% of all dTLB cache hits   ( +-  0.49% )
      
      Also ASCII color highlight too high percentages, them when it's executed on the console.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/n/tip-lkhwxsevdbd9a8nymx0vxc3y@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      c3305257
    • I
      perf stat: Add -d -d and -d -d -d options to show more CPU events · 2cba3ffb
      Ingo Molnar 提交于
      Print even more detailed statistics if requested via perf stat -d:
      
             -d:          detailed events, L1 and LLC data cache
          -d -d:     more detailed events, dTLB and iTLB events
       -d -d -d:     very detailed events, adding prefetch events
      
      Full output looks like this now:
      
       Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
      
             1703.674707 task-clock                #    8.709 CPUs utilized            ( +-  4.19% )
                  49,068 context-switches          #    0.029 M/sec                    ( +- 16.66% )
                   8,303 CPU-migrations            #    0.005 M/sec                    ( +- 24.90% )
                  17,397 page-faults               #    0.010 M/sec                    ( +-  0.46% )
           2,345,389,239 cycles                    #    1.377 GHz                      ( +-  4.61% ) [55.90%]
           1,884,503,527 stalled-cycles-frontend   #   80.35% frontend cycles idle     ( +-  5.67% ) [50.39%]
             743,919,737 stalled-cycles-backend    #   31.72% backend  cycles idle     ( +-  8.75% ) [49.91%]
           1,314,416,379 instructions              #    0.56  insns per cycle
                                                   #    1.43  stalled cycles per insn  ( +-  2.53% ) [60.87%]
             272,592,567 branches                  #  160.003 M/sec                    ( +-  1.74% ) [56.56%]
               3,794,846 branch-misses             #    1.39% of all branches          ( +-  6.59% ) [58.50%]
             449,982,778 L1-dcache-loads           #  264.125 M/sec                    ( +-  2.47% ) [49.88%]
              22,404,961 L1-dcache-load-misses     #    4.98% of all L1-dcache hits    ( +-  6.08% ) [55.05%]
               6,204,750 LLC-loads                 #    3.642 M/sec                    ( +-  8.91% ) [43.75%]
               1,837,411 LLC-load-misses           #    1.078 M/sec                    ( +-  7.27% ) [12.07%]
             411,440,421 L1-icache-loads           #  241.502 M/sec                    ( +-  5.60% ) [36.52%]
              27,556,832 L1-icache-load-misses     #   16.175 M/sec                    ( +-  7.46% ) [46.72%]
             464,067,627 dTLB-loads                #  272.392 M/sec                    ( +-  4.46% ) [54.17%]
              10,765,648 dTLB-load-misses          #    6.319 M/sec                    ( +-  3.18% ) [48.68%]
           1,273,080,386 iTLB-loads                #  747.256 M/sec                    ( +-  3.38% ) [47.53%]
                 117,481 iTLB-load-misses          #    0.069 M/sec                    ( +- 14.99% ) [47.01%]
               4,590,653 L1-dcache-prefetches      #    2.695 M/sec                    ( +-  4.49% ) [46.19%]
               1,712,660 L1-dcache-prefetch-misses #    1.005 M/sec                    ( +-  3.75% ) [44.82%]
      
              0.195622057  seconds time elapsed  ( +-  6.84% )
      
      Also clean up the attribute construction code to be appending, and factor
      it out into add_default_attributes().
      
      Tweak the coverage percentage printout a bit, so that it's easier to view it
      alongside the +- sttddev colum.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      2cba3ffb
  20. 30 4月, 2011 1 次提交
  21. 29 4月, 2011 5 次提交
  22. 28 4月, 2011 2 次提交
    • I
      perf stat: Fix compatibility behavior · ede70290
      Ingo Molnar 提交于
      Instead of failing on an unknown event, when new perf stat is run on
      older kernels:
      
        $ ./perf stat true
        Error: open_counter returned with 22 (Invalid argument). /bin/dmesg
        may provide additional information.
      
        Fatal: Not all events could be opened.
      
      Just ignore EINVAL and ENOSYS, we'll print the results as not counted:
      
       Performance counter stats for 'true':
      
                0.239483 task-clock               #    0.493 CPUs utilized
                       0 context-switches         #    0.000 M/sec
                       0 CPU-migrations           #    0.000 M/sec
                      86 page-faults              #    0.359 M/sec
                 704,766 cycles                   #    2.943 GHz
           <not counted> stalled-cycles
                 381,961 instructions             #    0.54  insns per cycle
                  69,626 branches                 #  290.735 M/sec
                   4,594 branch-misses            #    6.60% of all branches
      
              0.000485883  seconds time elapsed
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio5hjpn3dsrm@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      ede70290
    • I
      perf stat: Add --sync/-S option · f9cef0a9
      Ingo Molnar 提交于
      --sync will tell perf stat to run sync() before starting a command.
      
      This allows IO-heavy tests to be used with --repeat, without one
      iteration impacting the other.
      
      Elapsed time will stabilize for example:
      
        before:        3.971525714  seconds time elapsed  ( +-  8.56% )
        after:         3.211098537  seconds time elapsed  ( +-  1.52% )
      
      So measurements will be more accurate.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn1dsrm@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      f9cef0a9
  23. 27 4月, 2011 3 次提交
    • I
      perf stat: Fix printout vertical alignment · 9ceb1c3d
      Ingo Molnar 提交于
      Before:
      
       |
       | Performance counter stats for '/home/mingo/hackbench 20' (5 runs):
       |
       |        71,321,607 instructions:u           #    0.42  insns per cycle  ( +-  0.00% )
       |       168,040,009 cycles:u                 #    0.000 GHz                      ( +-  0.81% )
       |
       |        1.468002368  seconds time elapsed  ( +-  1.33% )
       |
      
      After:
      
       |
       | Performance counter stats for '/home/mingo/hackbench 20' (5 runs):
       |
       |        71,321,607 instructions:u           #    0.42  insns per cycle          ( +-  0.00% )
       |       168,040,009 cycles:u                 #    0.000 GHz                      ( +-  0.81% )
       |
       |        1.468002368  seconds time elapsed  ( +-  1.33% )
       |
      
      The last column (stddev noise) is properly aligned, vertically.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn0dsrm@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      9ceb1c3d
    • I
      perf stat: Add -d/--detailed flag to run with a lot of events · c6264def
      Ingo Molnar 提交于
      Add the new -d/--detailed flag, which generates a pretty detailed event list:
      
       Performance counter stats for './hackbench 10' (10 runs):
      
             1514.287888 task-clock               #   10.897 CPUs utilized            ( +-  3.05% )
                  39,698 context-switches         #    0.026 M/sec                    ( +- 12.19% )
                   8,147 CPU-migrations           #    0.005 M/sec                    ( +- 16.55% )
                  17,918 page-faults              #    0.012 M/sec                    ( +-  0.37% )
           2,944,504,050 cycles                   #    1.944 GHz                      ( +-  3.89% )  (32.60%)
           1,043,971,283 stalled-cycles           #   35.45% of all cycles are idle   ( +-  5.22% )  (44.48%)
           1,655,906,768 instructions             #    0.56  insns per cycle
                                                  #    0.63  stalled cycles per insn  ( +-  1.95% )  (55.09%)
             338,832,373 branches                 #  223.757 M/sec                    ( +-  1.96% )  (64.47%)
               3,892,416 branch-misses            #    1.15% of all branches          ( +-  5.49% )  (73.12%)
             606,410,482 L1-dcache-loads          #  400.459 M/sec                    ( +-  1.29% )  (71.21%)
              31,204,395 L1-dcache-load-misses    #    5.15% of all L1-dcache hits    ( +-  3.04% )  (60.43%)
               3,922,751 LLC-loads                #    2.590 M/sec                    ( +-  6.80% )  (46.87%)
               5,037,288 LLC-load-misses          #    3.327 M/sec                    ( +-  3.56% )  (13.00%)
      
              0.138966828  seconds time elapsed  ( +-  4.11% )
      
      This can be used "at a glance" for narrower analysis.
      
      -d can also be used in addition to other -e events, to further expand an event list.
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-cxs98quixs3qyvdqx3goojc4@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      c6264def
    • I
      perf stat: Print out miss/hit ratio for L1 data-cache events · 8bb6c79f
      Ingo Molnar 提交于
      Print out this kind of l1-dcache-misses percentage:
      
       Performance counter stats for './bw_tcp localhost':
      
          29,956,262,201 cycles                   #    3.002 GHz                      (scaled from 85.14%)
           8,255,209,558 stalled-cycles           #   27.56% of all cycles are idle   (scaled from 86.56%)
           1,206,130,308 l1-dcache-misses         #   40.49% of all L1-dcache hits    (scaled from 86.30%)
           2,978,756,779 l1-dcache-refs           #  298.512 M/sec                    (scaled from 70.02%)
           8,861,956,159 instructions             #    0.30  insns per cycle
                                                  #    0.93  stalled cycles per insn  (scaled from 84.27%)
           1,644,306,068 branches                 #  164.782 M/sec                    (scaled from 86.43%)
              74,778,443 branch-misses            #    4.55% of all branches          (scaled from 70.69%)
             9978.695711 task-clock               #    0.693 CPUs utilized
      
             14.404347983  seconds time elapsed
      
      And color the result depending on the severity of cache-trashing.
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-54gmz0zymaid84zcs7joq02p@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      8bb6c79f