1. 17 3月, 2012 1 次提交
    • N
      perf stat: Fix event grouping on forked task · 4c19ea45
      Namhyung Kim 提交于
      When event group is enabled for forked task (i.e. no target task was
      specified) all events were disabled and marked ->enable_on_exec.
      However they are not counted at all since only group leader will be
      enabled on exec actually. So the result looked like below:
      
       $ ./perf stat --group -- sleep 1
      
       Performance counter stats for 'sleep 1':
      
                0.554926 task-clock                #    0.001 CPUs utilized
           <not counted> context-switches
           <not counted> CPU-migrations
           <not counted> page-faults
           <not counted> cycles
         <not supported> stalled-cycles-frontend
         <not supported> stalled-cycles-backend
           <not counted> instructions
           <not counted> branches
           <not counted> branch-misses
      
             1.001228093 seconds time elapsed
      
      Fix it by disabling group leader only.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1331887340-32448-1-git-send-email-namhyung.kim@lge.comSigned-off-by: NNamhyung Kim <namhyung.kim@lge.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4c19ea45
  2. 14 2月, 2012 1 次提交
  3. 07 2月, 2012 2 次提交
  4. 25 1月, 2012 1 次提交
  5. 04 1月, 2012 1 次提交
  6. 06 12月, 2011 1 次提交
    • A
      perf stat: Failure with "Operation not supported" · 38f6ae1e
      Anton Blanchard 提交于
      perf stat is failing on PowerPC:
      
        Error: open_counter returned with 95 (Operation not supported). /bin/dmesg may provide additional information.
      
        Fatal: Not all events could be opened.
      
      commit 370faf1d (perf stat: Fail softly on unsupported events)
      added a check for failure returning ENOENT, but the POWER backend
      returns EOPNOTSUPP. It looks like alpha, blackfin and mips do the
      same.
      
      With the patch applied, things work as expected:
      
       Performance counter stats for '/bin/true':
      
                0.362176 task-clock                #    0.623 CPUs utilized
                       0 context-switches          #    0.000 M/sec
                       0 CPU-migrations            #    0.000 M/sec
                      28 page-faults               #    0.077 M/sec
               1,677,020 cycles                    #    4.630 GHz
         <not supported> stalled-cycles-frontend
         <not supported> stalled-cycles-backend
                 431,220 instructions              #    0.26  insns per cycle
                 101,889 branches                  #  281.325 M/sec
                   4,145 branch-misses             #    4.07% of all branches
      
             0.000581361 seconds time elapsed
      
      Cc: <stable@kernel.org> # 3.0+
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20111202093833.5fef7226@krytenSigned-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      38f6ae1e
  7. 29 11月, 2011 1 次提交
  8. 28 11月, 2011 1 次提交
  9. 26 10月, 2011 1 次提交
  10. 30 9月, 2011 6 次提交
  11. 18 8月, 2011 2 次提交
  12. 21 7月, 2011 1 次提交
  13. 01 7月, 2011 1 次提交
    • Z
      perf stat: Add noise output for csv mode · 3ae9a34d
      Zhengyu He 提交于
      Previously, when you want perf-stat to output the statistics in
      csv mode, no information of the noise will be printed out.
      
      For example right now we output this --repeat information:
      
       ./perf stat -r3 -x, sleep 1
       1.164789,task-clock
       8,context-switches
       0,CPU-migrations
       219,page-faults
       3337800,cycles
      
      With this patch, the output will be appended with an additional
      entry for the noise value:
      
       ./perf stat -r3 -x, sleep 1
       1.164789,task-clock,3.75%
       8,context-switches,75.00%
       0,CPU-migrations,100.00%
       219,page-faults,0.00%
       3337800,cycles,3.36%
      Signed-off-by: NZhengyu He <zhengyuh@google.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/1308861942-4945-1-git-send-email-zhengyuh@google.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      3ae9a34d
  14. 03 6月, 2011 1 次提交
    • D
      perf stat: clarify unsupported events from uncounted events · 2cee77c4
      David Ahern 提交于
      perf stat continues running even if the event list contains counters
      that are not supported. The resulting output then contains <not counted>
      for those events which gets confusing as to which events are supported,
      but not counted and which are not supported.
      
      Before:
      
      perf stat -ddd -- sleep 1
      
            Performance counter stats for 'sleep 1':
      
                0.571283 task-clock                #    0.001 CPUs utilized
                       1 context-switches          #    0.002 M/sec
                       0 CPU-migrations            #    0.000 M/sec
                     157 page-faults               #    0.275 M/sec
               1,037,707 cycles                    #    1.816 GHz
           <not counted> stalled-cycles-frontend
           <not counted> stalled-cycles-backend
                 654,499 instructions              #    0.63  insns per cycle
                 136,129 branches                  #  238.286 M/sec
           <not counted> branch-misses
           <not counted> L1-dcache-loads
           <not counted> L1-dcache-load-misses
           <not counted> LLC-loads
           <not counted> LLC-load-misses
           <not counted> L1-icache-loads
           <not counted> L1-icache-load-misses
           <not counted> dTLB-loads
           <not counted> dTLB-load-misses
           <not counted> iTLB-loads
           <not counted> iTLB-load-misses
           <not counted> L1-dcache-prefetches
           <not counted> L1-dcache-prefetch-misses
      
             1.001004836 seconds time elapsed
      
      After:
      
      perf stat -ddd -- sleep 1
      
       Performance counter stats for 'sleep 1':
      
                1.350326 task-clock                #    0.001 CPUs utilized
                       2 context-switches          #    0.001 M/sec
                       0 CPU-migrations            #    0.000 M/sec
                     157 page-faults               #    0.116 M/sec
                  11,986 cycles                    #    0.009 GHz
         <not supported> stalled-cycles-frontend
         <not supported> stalled-cycles-backend
                 496,986 instructions              #   41.46  insns per cycle
                 138,065 branches                  #  102.246 M/sec
                   7,245 branch-misses             #    5.25% of all branches
           <not counted> L1-dcache-loads
           <not counted> L1-dcache-load-misses
           <not counted> LLC-loads
           <not counted> LLC-load-misses
           <not counted> L1-icache-loads
           <not counted> L1-icache-load-misses
           <not counted> dTLB-loads
           <not counted> dTLB-load-misses
           <not counted> iTLB-loads
           <not counted> iTLB-load-misses
           <not counted> L1-dcache-prefetches
         <not supported> L1-dcache-prefetch-misses
      
             1.002397333 seconds time elapsed
      
      v1->v2:
      changed supported type from int to bool
      
      v2->v3
      fixed vertical alignment of new struct element
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1306767359-13221-1-git-send-email-dsahern@gmail.comSigned-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2cee77c4
  15. 19 5月, 2011 2 次提交
    • I
      perf stat: Add more cache-miss percentage printouts · c3305257
      Ingo Molnar 提交于
      Print out the cache-miss percentage as well if the cache refs were
      collected, for all the generic cache event types.
      
      Before:
      
         11,103,723,230 dTLB-loads                #  622.471 M/sec                    ( +-  0.30% )
             87,065,337 dTLB-load-misses          #    4.881 M/sec                    ( +-  0.90% )
      
      After:
      
         11,353,713,242 dTLB-loads                #  626.020 M/sec                    ( +-  0.35% )
            113,393,472 dTLB-load-misses          #    1.00% of all dTLB cache hits   ( +-  0.49% )
      
      Also ASCII color highlight too high percentages, them when it's executed on the console.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/n/tip-lkhwxsevdbd9a8nymx0vxc3y@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      c3305257
    • I
      perf stat: Add -d -d and -d -d -d options to show more CPU events · 2cba3ffb
      Ingo Molnar 提交于
      Print even more detailed statistics if requested via perf stat -d:
      
             -d:          detailed events, L1 and LLC data cache
          -d -d:     more detailed events, dTLB and iTLB events
       -d -d -d:     very detailed events, adding prefetch events
      
      Full output looks like this now:
      
       Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
      
             1703.674707 task-clock                #    8.709 CPUs utilized            ( +-  4.19% )
                  49,068 context-switches          #    0.029 M/sec                    ( +- 16.66% )
                   8,303 CPU-migrations            #    0.005 M/sec                    ( +- 24.90% )
                  17,397 page-faults               #    0.010 M/sec                    ( +-  0.46% )
           2,345,389,239 cycles                    #    1.377 GHz                      ( +-  4.61% ) [55.90%]
           1,884,503,527 stalled-cycles-frontend   #   80.35% frontend cycles idle     ( +-  5.67% ) [50.39%]
             743,919,737 stalled-cycles-backend    #   31.72% backend  cycles idle     ( +-  8.75% ) [49.91%]
           1,314,416,379 instructions              #    0.56  insns per cycle
                                                   #    1.43  stalled cycles per insn  ( +-  2.53% ) [60.87%]
             272,592,567 branches                  #  160.003 M/sec                    ( +-  1.74% ) [56.56%]
               3,794,846 branch-misses             #    1.39% of all branches          ( +-  6.59% ) [58.50%]
             449,982,778 L1-dcache-loads           #  264.125 M/sec                    ( +-  2.47% ) [49.88%]
              22,404,961 L1-dcache-load-misses     #    4.98% of all L1-dcache hits    ( +-  6.08% ) [55.05%]
               6,204,750 LLC-loads                 #    3.642 M/sec                    ( +-  8.91% ) [43.75%]
               1,837,411 LLC-load-misses           #    1.078 M/sec                    ( +-  7.27% ) [12.07%]
             411,440,421 L1-icache-loads           #  241.502 M/sec                    ( +-  5.60% ) [36.52%]
              27,556,832 L1-icache-load-misses     #   16.175 M/sec                    ( +-  7.46% ) [46.72%]
             464,067,627 dTLB-loads                #  272.392 M/sec                    ( +-  4.46% ) [54.17%]
              10,765,648 dTLB-load-misses          #    6.319 M/sec                    ( +-  3.18% ) [48.68%]
           1,273,080,386 iTLB-loads                #  747.256 M/sec                    ( +-  3.38% ) [47.53%]
                 117,481 iTLB-load-misses          #    0.069 M/sec                    ( +- 14.99% ) [47.01%]
               4,590,653 L1-dcache-prefetches      #    2.695 M/sec                    ( +-  4.49% ) [46.19%]
               1,712,660 L1-dcache-prefetch-misses #    1.005 M/sec                    ( +-  3.75% ) [44.82%]
      
              0.195622057  seconds time elapsed  ( +-  6.84% )
      
      Also clean up the attribute construction code to be appending, and factor
      it out into add_default_attributes().
      
      Tweak the coverage percentage printout a bit, so that it's easier to view it
      alongside the +- sttddev colum.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      2cba3ffb
  16. 30 4月, 2011 1 次提交
  17. 29 4月, 2011 5 次提交
  18. 28 4月, 2011 2 次提交
    • I
      perf stat: Fix compatibility behavior · ede70290
      Ingo Molnar 提交于
      Instead of failing on an unknown event, when new perf stat is run on
      older kernels:
      
        $ ./perf stat true
        Error: open_counter returned with 22 (Invalid argument). /bin/dmesg
        may provide additional information.
      
        Fatal: Not all events could be opened.
      
      Just ignore EINVAL and ENOSYS, we'll print the results as not counted:
      
       Performance counter stats for 'true':
      
                0.239483 task-clock               #    0.493 CPUs utilized
                       0 context-switches         #    0.000 M/sec
                       0 CPU-migrations           #    0.000 M/sec
                      86 page-faults              #    0.359 M/sec
                 704,766 cycles                   #    2.943 GHz
           <not counted> stalled-cycles
                 381,961 instructions             #    0.54  insns per cycle
                  69,626 branches                 #  290.735 M/sec
                   4,594 branch-misses            #    6.60% of all branches
      
              0.000485883  seconds time elapsed
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio5hjpn3dsrm@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      ede70290
    • I
      perf stat: Add --sync/-S option · f9cef0a9
      Ingo Molnar 提交于
      --sync will tell perf stat to run sync() before starting a command.
      
      This allows IO-heavy tests to be used with --repeat, without one
      iteration impacting the other.
      
      Elapsed time will stabilize for example:
      
        before:        3.971525714  seconds time elapsed  ( +-  8.56% )
        after:         3.211098537  seconds time elapsed  ( +-  1.52% )
      
      So measurements will be more accurate.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn1dsrm@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      f9cef0a9
  19. 27 4月, 2011 9 次提交