1. 31 May, 2012 1 commit
    • perf stat: Initialize default events wrt exclude_{guest,host} · 79695e1b
      Arnaldo Carvalho de Melo committed
      When no event is specified the tools use perf_evlist__add_default(), that will
      call event_attr_init to initialize the KVM exclusion bits.
      
      When the change was made to the tools so that by default guest samples would be
      excluded, the changes were made just to the parsing routines and to
      perf_evlist__add_default(), not to perf_evlist__add_attrs, that is used so far
      just by perf stat to add multiple events, according to the level of detail
      specified.
      
      Recently the tools were changed to reconstruct the event name from all the
      details in perf_event_attr, not just from .type and .config, but taking into
      account all the feature bits (.exclude_{guest,host,user,kernel,etc},
      .precise_ip, etc).
      
      That is when we noticed that the default for perf stat wasn't the one for the
      rest of the tools, i.e. the .exclude_guest bit wasn't being set.
      
      I.e. the default, which doesn't call event_attr_init, was showing the :HG
      modifier:
      
        $ perf stat usleep 1
      
         Performance counter stats for 'usleep 1':
      
                  0.942119 task-clock                #    0.454 CPUs utilized
                         1 context-switches          #    0.001 M/sec
                         0 CPU-migrations            #    0.000 K/sec
                       126 page-faults               #    0.134 M/sec
                   693,193 cycles:HG                 #    0.736 GHz                     [40.11%]
                   407,461 stalled-cycles-frontend:HG #   58.78% frontend cycles idle    [72.29%]
                   365,403 stalled-cycles-backend:HG #   52.71% backend  cycles idle
                   465,982 instructions:HG           #    0.67  insns per cycle
                                                     #    0.87  stalled cycles per insn
                    89,760 branches:HG               #   95.275 M/sec
                     6,178 branch-misses:HG          #    6.88% of all branches
      
               0.002077228 seconds time elapsed
      
      While if one explicitly specifies the same events, the parsing code is called
      and thus event_attr_init is called:
      
        $ perf stat -e task-clock,context-switches,migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses usleep 1
      
         Performance counter stats for 'usleep 1':
      
                  1.040349 task-clock                #    0.500 CPUs utilized
                         2 context-switches          #    0.002 M/sec
                         0 CPU-migrations            #    0.000 K/sec
                       127 page-faults               #    0.122 M/sec
                   587,966 cycles                    #    0.565 GHz                     [13.18%]
                   459,167 stalled-cycles-frontend   #   78.09% frontend cycles idle
                   390,249 stalled-cycles-backend    #   66.37% backend  cycles idle
                   504,006 instructions              #    0.86  insns per cycle
                                                     #    0.91  stalled cycles per insn
                    96,455 branches                  #   92.714 M/sec
                     6,522 branch-misses             #    6.76% of all branches         [96.12%]
      
               0.002078681 seconds time elapsed
      
      Fix it by introducing a perf_evlist__add_default_attrs method that will call
      event_attr_init on all the perf_event_attr entries before adding the events.
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-4eysr236r0pgiyum9epwxw7s@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
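The fix described above can be pictured with a small model. This is only a sketch, not the perf source: `PerfEventAttr`, `add_default_attrs` and `display_name` are hypothetical stand-ins, and the naming rule (list the sides being counted, drop the suffix when it matches the host-only default) is a simplifying assumption chosen to reproduce the `:HG` symptom shown in the commit message.

```python
class PerfEventAttr:
    """Hypothetical stand-in for the perf_event_attr fields discussed here."""

    def __init__(self, name, exclude_guest=0, exclude_host=0):
        self.name = name
        self.exclude_guest = exclude_guest
        self.exclude_host = exclude_host


def event_attr_init(attr):
    # Assumed tool-wide default: exclude guest samples, keep host samples.
    attr.exclude_guest = 1
    attr.exclude_host = 0


def add_default_attrs(evlist, attrs):
    # Initialize every entry before adding it, so no default path is skipped.
    for attr in attrs:
        event_attr_init(attr)
        evlist.append(attr)
    return evlist


def display_name(attr):
    # Simplified naming rule (assumption): list the sides being counted and
    # omit the suffix when it equals the host-only default.
    sides = ("" if attr.exclude_host else "H") + ("" if attr.exclude_guest else "G")
    return attr.name if sides in ("", "H") else f"{attr.name}:{sides}"
```

In this model an attr that never went through `event_attr_init` is displayed as `cycles:HG`, while one added through `add_default_attrs` is displayed as plain `cycles`.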
  2. 16 May, 2012 1 commit
  3. 10 May, 2012 1 commit
    • perf stat: handle ENXIO error for perf_event_open · 20d23aaa
      David Ahern committed
      perf stat on PPC currently fails to run:
      
      $ perf stat -- sleep 1
        Error: open_counter returned with 6 (No such device or address). /bin/dmesg may provide additional information.
      
        Fatal: Not all events could be opened.
      
      The problem is that until 2.6.37 (behavior changed with commit b0a873eb)
      perf on PPC returns ENXIO when hw_perf_event_init() fails. With this
      patch we get the expected behavior:
      
      $ perf stat -v -- sleep 1
      cycles event is not supported by the kernel.
      stalled-cycles-frontend event is not supported by the kernel.
      stalled-cycles-backend event is not supported by the kernel.
      instructions event is not supported by the kernel.
      branches event is not supported by the kernel.
      branch-misses event is not supported by the kernel.
      
      ...
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1336490956-57145-1-git-send-email-dsahern@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
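Several commits in this log turn a hard failure of perf_event_open into a soft "not supported" result for particular errno values. A minimal sketch of that classification follows. Collecting the values into one set is an assumption made for illustration (the log only names them commit by commit), and `classify_open_error` and `verbose_message` are hypothetical helpers, not perf functions.

```python
import errno

# Errno values that commit messages in this log describe as meaning
# "this event is not supported" rather than a fatal error.
# Grouping them in a single set is an illustrative assumption.
SOFT_FAIL_ERRNOS = {
    errno.ENOENT,
    errno.EOPNOTSUPP,
    errno.ENXIO,
    errno.EINVAL,
    errno.ENOSYS,
}


def classify_open_error(err_no):
    """Return 'not supported' for soft failures, 'fatal' for anything else."""
    return "not supported" if err_no in SOFT_FAIL_ERRNOS else "fatal"


def verbose_message(event_name, err_no):
    # Mirrors the wording of the -v output quoted in the commit message.
    if classify_open_error(err_no) == "not supported":
        return f"{event_name} event is not supported by the kernel."
    return None
```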
  4. 09 May, 2012 1 commit
    • perf stat: handle ENXIO error for perf_event_open · 979987a5
      David Ahern committed
      perf stat on PPC currently fails to run:
      
      $ perf stat -- sleep 1
        Error: open_counter returned with 6 (No such device or address). /bin/dmesg may provide additional information.
      
        Fatal: Not all events could be opened.
      
      The problem is that until 2.6.37 (behavior changed with commit b0a873eb)
      perf on PPC returns ENXIO when hw_perf_event_init() fails. With this
      patch we get the expected behavior:
      
      $ perf stat -v -- sleep 1
      cycles event is not supported by the kernel.
      stalled-cycles-frontend event is not supported by the kernel.
      stalled-cycles-backend event is not supported by the kernel.
      instructions event is not supported by the kernel.
      branches event is not supported by the kernel.
      branch-misses event is not supported by the kernel.
      
      ...
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1336490956-57145-1-git-send-email-dsahern@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 08 May, 2012 2 commits
  6. 03 May, 2012 2 commits
  7. 02 May, 2012 1 commit
    • perf stat: Fix case where guest/host monitoring is not supported by kernel · 5622c07b
      Stephane Eranian committed
      By default, perf stat sets exclude_guest = 1. But when you run perf on a
      kernel which does not support host/guest filtering, you get an
      error saying the event is unsupported. This comes from the fact that
      when the perf_event_attr struct passed by the user is larger than the
      one known to the kernel, there is a safety check which ensures that all
      unknown bits are zero. But here, exclude_guest is 1 (part of the unknown
      bits) and thus the perf_event_open() syscall returns EINVAL.
      
      To my surprise, running perf record on the same kernel did not exhibit
      the problem. The reason is that perf record handles the problem by
      catching the error and retrying with guest/host excludes set to zero.
      For some reason, this was not done with perf stat. This patch fixes this
      problem.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@amd.com>
      Link: http://lkml.kernel.org/r/20120427124538.GA7230@quad
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
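The catch-and-retry behavior described in this commit message can be sketched as follows. The syscall is replaced by a hypothetical `fake_perf_event_open` that mimics the kernel's "unknown bits must be zero" check, the attr is modeled as a plain dict, and `open_counter` is an illustrative name. None of this is the actual perf code.

```python
import errno


def fake_perf_event_open(attr, kernel_knows_guest_bits):
    # Stand-in for the syscall: an older kernel rejects any set bit that
    # lies in the part of the struct it does not know about.
    if not kernel_knows_guest_bits and (attr["exclude_guest"] or attr["exclude_host"]):
        raise OSError(errno.EINVAL, "Invalid argument")
    return 3  # pretend file descriptor


def open_counter(attr, kernel_knows_guest_bits):
    try:
        return fake_perf_event_open(attr, kernel_knows_guest_bits)
    except OSError as err:
        has_exclude_bits = attr["exclude_guest"] or attr["exclude_host"]
        if err.errno != errno.EINVAL or not has_exclude_bits:
            raise
        # Retry once with the guest/host exclusion bits cleared.
        attr["exclude_guest"] = 0
        attr["exclude_host"] = 0
        return fake_perf_event_open(attr, kernel_knows_guest_bits)
```

On a kernel that knows the bits the first call succeeds and the attr is left untouched. On one that does not, the retry succeeds with both bits cleared.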
  8. 12 April, 2012 1 commit
  9. 17 March, 2012 1 commit
    • perf stat: Fix event grouping on forked task · 4c19ea45
      Namhyung Kim committed
      When an event group is enabled for a forked task (i.e. no target task was
      specified), all events were disabled and marked ->enable_on_exec.
      However, they are not counted at all, since only the group leader is
      actually enabled on exec. So the result looked like below:
      
       $ ./perf stat --group -- sleep 1
      
       Performance counter stats for 'sleep 1':
      
                0.554926 task-clock                #    0.001 CPUs utilized
           <not counted> context-switches
           <not counted> CPU-migrations
           <not counted> page-faults
           <not counted> cycles
         <not supported> stalled-cycles-frontend
         <not supported> stalled-cycles-backend
           <not counted> instructions
           <not counted> branches
           <not counted> branch-misses
      
             1.001228093 seconds time elapsed
      
      Fix it by disabling the group leader only.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1331887340-32448-1-git-send-email-namhyung.kim@lge.com
      Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
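A rough model of "disable the group leader only": each event is a dict, the first entry of the list is taken to be the group leader, and `configure_forked_group` is a hypothetical helper name. That `enable_on_exec` is also set only on the leader is an assumption of this sketch. The commit message only says that the leader alone is disabled.

```python
def configure_forked_group(events):
    """events: list of attr dicts in group order; index 0 is the group leader."""
    for index, attr in enumerate(events):
        is_leader = index == 0
        # Only the leader starts disabled and is armed to start on exec;
        # the other members stay enabled and follow the leader's state.
        attr["disabled"] = 1 if is_leader else 0
        attr["enable_on_exec"] = 1 if is_leader else 0
    return events
```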
  10. 14 February, 2012 1 commit
  11. 07 February, 2012 2 commits
  12. 25 January, 2012 1 commit
  13. 04 January, 2012 1 commit
  14. 06 December, 2011 1 commit
    • perf stat: Failure with "Operation not supported" · 38f6ae1e
      Anton Blanchard committed
      perf stat is failing on PowerPC:
      
        Error: open_counter returned with 95 (Operation not supported). /bin/dmesg may provide additional information.
      
        Fatal: Not all events could be opened.
      
      commit 370faf1d (perf stat: Fail softly on unsupported events)
      added a check for failure returning ENOENT, but the POWER backend
      returns EOPNOTSUPP. It looks like alpha, blackfin and mips do the
      same.
      
      With the patch applied, things work as expected:
      
       Performance counter stats for '/bin/true':
      
                0.362176 task-clock                #    0.623 CPUs utilized
                       0 context-switches          #    0.000 M/sec
                       0 CPU-migrations            #    0.000 M/sec
                      28 page-faults               #    0.077 M/sec
               1,677,020 cycles                    #    4.630 GHz
         <not supported> stalled-cycles-frontend
         <not supported> stalled-cycles-backend
                 431,220 instructions              #    0.26  insns per cycle
                 101,889 branches                  #  281.325 M/sec
                   4,145 branch-misses             #    4.07% of all branches
      
             0.000581361 seconds time elapsed
      
      Cc: <stable@kernel.org> # 3.0+
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20111202093833.5fef7226@kryten
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  15. 29 November, 2011 1 commit
  16. 28 November, 2011 1 commit
  17. 26 October, 2011 1 commit
  18. 30 September, 2011 6 commits
  19. 18 August, 2011 2 commits
  20. 21 July, 2011 1 commit
  21. 01 July, 2011 1 commit
    • perf stat: Add noise output for csv mode · 3ae9a34d
      Zhengyu He committed
      Previously, when perf stat was asked to output the statistics in
      csv mode, no noise information was printed.
      
      For example right now we output this --repeat information:
      
       ./perf stat -r3 -x, sleep 1
       1.164789,task-clock
       8,context-switches
       0,CPU-migrations
       219,page-faults
       3337800,cycles
      
      With this patch, the output will be appended with an additional
      entry for the noise value:
      
       ./perf stat -r3 -x, sleep 1
       1.164789,task-clock,3.75%
       8,context-switches,75.00%
       0,CPU-migrations,100.00%
       219,page-faults,0.00%
       3337800,cycles,3.36%
      Signed-off-by: Zhengyu He <zhengyuh@google.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/1308861942-4945-1-git-send-email-zhengyuh@google.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
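One plausible way to compute the appended noise column is sketched below. It assumes the noise is the standard deviation of the mean across the repeated runs, expressed as a percentage of the mean. The commit message does not spell out the formula, so treat this as an illustration. `noise_percent` and `csv_line` are hypothetical names.

```python
import math


def noise_percent(samples):
    """Relative standard deviation of the mean, in percent (assumed formula)."""
    n = len(samples)
    mean = sum(samples) / n
    if n < 2 or mean == 0:
        return 0.0
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    stddev_of_mean = math.sqrt(variance / n)
    return 100.0 * stddev_of_mean / mean


def csv_line(samples, name, sep=","):
    # value, event name, then the extra noise field added by the patch
    mean = sum(samples) / len(samples)
    return f"{mean:g}{sep}{name}{sep}{noise_percent(samples):.2f}%"
```

For three identical runs, for example `csv_line([219, 219, 219], "page-faults")`, the noise field comes out as `0.00%`.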
  22. 03 June, 2011 1 commit
    • perf stat: clarify unsupported events from uncounted events · 2cee77c4
      David Ahern committed
      perf stat continues running even if the event list contains counters
      that are not supported. The resulting output then contains <not counted>
      for those events, which makes it unclear which events are supported
      but not counted and which are not supported at all.
      
      Before:
      
      perf stat -ddd -- sleep 1
      
            Performance counter stats for 'sleep 1':
      
                0.571283 task-clock                #    0.001 CPUs utilized
                       1 context-switches          #    0.002 M/sec
                       0 CPU-migrations            #    0.000 M/sec
                     157 page-faults               #    0.275 M/sec
               1,037,707 cycles                    #    1.816 GHz
           <not counted> stalled-cycles-frontend
           <not counted> stalled-cycles-backend
                 654,499 instructions              #    0.63  insns per cycle
                 136,129 branches                  #  238.286 M/sec
           <not counted> branch-misses
           <not counted> L1-dcache-loads
           <not counted> L1-dcache-load-misses
           <not counted> LLC-loads
           <not counted> LLC-load-misses
           <not counted> L1-icache-loads
           <not counted> L1-icache-load-misses
           <not counted> dTLB-loads
           <not counted> dTLB-load-misses
           <not counted> iTLB-loads
           <not counted> iTLB-load-misses
           <not counted> L1-dcache-prefetches
           <not counted> L1-dcache-prefetch-misses
      
             1.001004836 seconds time elapsed
      
      After:
      
      perf stat -ddd -- sleep 1
      
       Performance counter stats for 'sleep 1':
      
                1.350326 task-clock                #    0.001 CPUs utilized
                       2 context-switches          #    0.001 M/sec
                       0 CPU-migrations            #    0.000 M/sec
                     157 page-faults               #    0.116 M/sec
                  11,986 cycles                    #    0.009 GHz
         <not supported> stalled-cycles-frontend
         <not supported> stalled-cycles-backend
                 496,986 instructions              #   41.46  insns per cycle
                 138,065 branches                  #  102.246 M/sec
                   7,245 branch-misses             #    5.25% of all branches
           <not counted> L1-dcache-loads
           <not counted> L1-dcache-load-misses
           <not counted> LLC-loads
           <not counted> LLC-load-misses
           <not counted> L1-icache-loads
           <not counted> L1-icache-load-misses
           <not counted> dTLB-loads
           <not counted> dTLB-load-misses
           <not counted> iTLB-loads
           <not counted> iTLB-load-misses
           <not counted> L1-dcache-prefetches
         <not supported> L1-dcache-prefetch-misses
      
             1.002397333 seconds time elapsed
      
      v1->v2:
      changed supported type from int to bool
      
      v2->v3:
      fixed vertical alignment of new struct element
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1306767359-13221-1-git-send-email-dsahern@gmail.com
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
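The distinction introduced here can be modeled with a per-event boolean, as the v1->v2 note suggests. In the sketch below `format_counter` is a hypothetical helper, `value is None` stands for "opened but never counted", and the column width is an arbitrary choice for the example.

```python
def format_counter(name, supported, value):
    """Render one output row, separating unsupported from uncounted events."""
    if not supported:
        text = "<not supported>"  # the kernel rejected the event at open time
    elif value is None:
        text = "<not counted>"    # the event opened but produced no count
    else:
        text = f"{value:,}"
    return f"{text:>18} {name}"
```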
  23. 19 May, 2011 2 commits
    • perf stat: Add more cache-miss percentage printouts · c3305257
      Ingo Molnar committed
      Print out the cache-miss percentage as well if the cache refs were
      collected, for all the generic cache event types.
      
      Before:
      
         11,103,723,230 dTLB-loads                #  622.471 M/sec                    ( +-  0.30% )
             87,065,337 dTLB-load-misses          #    4.881 M/sec                    ( +-  0.90% )
      
      After:
      
         11,353,713,242 dTLB-loads                #  626.020 M/sec                    ( +-  0.35% )
            113,393,472 dTLB-load-misses          #    1.00% of all dTLB cache hits   ( +-  0.49% )
      
      Also highlight too-high percentages in ASCII color when it's executed on the console.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/n/tip-lkhwxsevdbd9a8nymx0vxc3y@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
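The percentage in the "After" output is the miss count divided by the reference count. A small sketch, with `miss_ratio_line` as a hypothetical helper and the wording of the annotation copied from the dTLB line above:

```python
def miss_ratio_line(misses, refs, cache_name):
    """Return the miss-percentage annotation, or None if refs were not collected."""
    if refs == 0:
        return None
    ratio = 100.0 * misses / refs
    return f"{ratio:.2f}% of all {cache_name} cache hits"
```

With the dTLB figures from the "After" output, `miss_ratio_line(113393472, 11353713242, "dTLB")` gives `1.00% of all dTLB cache hits`.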
    • perf stat: Add -d -d and -d -d -d options to show more CPU events · 2cba3ffb
      Ingo Molnar committed
      Print even more detailed statistics if requested via perf stat -d:
      
             -d:          detailed events, L1 and LLC data cache
          -d -d:     more detailed events, dTLB and iTLB events
       -d -d -d:     very detailed events, adding prefetch events
      
      Full output looks like this now:
      
       Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
      
             1703.674707 task-clock                #    8.709 CPUs utilized            ( +-  4.19% )
                  49,068 context-switches          #    0.029 M/sec                    ( +- 16.66% )
                   8,303 CPU-migrations            #    0.005 M/sec                    ( +- 24.90% )
                  17,397 page-faults               #    0.010 M/sec                    ( +-  0.46% )
           2,345,389,239 cycles                    #    1.377 GHz                      ( +-  4.61% ) [55.90%]
           1,884,503,527 stalled-cycles-frontend   #   80.35% frontend cycles idle     ( +-  5.67% ) [50.39%]
             743,919,737 stalled-cycles-backend    #   31.72% backend  cycles idle     ( +-  8.75% ) [49.91%]
           1,314,416,379 instructions              #    0.56  insns per cycle
                                                   #    1.43  stalled cycles per insn  ( +-  2.53% ) [60.87%]
             272,592,567 branches                  #  160.003 M/sec                    ( +-  1.74% ) [56.56%]
               3,794,846 branch-misses             #    1.39% of all branches          ( +-  6.59% ) [58.50%]
             449,982,778 L1-dcache-loads           #  264.125 M/sec                    ( +-  2.47% ) [49.88%]
              22,404,961 L1-dcache-load-misses     #    4.98% of all L1-dcache hits    ( +-  6.08% ) [55.05%]
               6,204,750 LLC-loads                 #    3.642 M/sec                    ( +-  8.91% ) [43.75%]
               1,837,411 LLC-load-misses           #    1.078 M/sec                    ( +-  7.27% ) [12.07%]
             411,440,421 L1-icache-loads           #  241.502 M/sec                    ( +-  5.60% ) [36.52%]
              27,556,832 L1-icache-load-misses     #   16.175 M/sec                    ( +-  7.46% ) [46.72%]
             464,067,627 dTLB-loads                #  272.392 M/sec                    ( +-  4.46% ) [54.17%]
              10,765,648 dTLB-load-misses          #    6.319 M/sec                    ( +-  3.18% ) [48.68%]
           1,273,080,386 iTLB-loads                #  747.256 M/sec                    ( +-  3.38% ) [47.53%]
                 117,481 iTLB-load-misses          #    0.069 M/sec                    ( +- 14.99% ) [47.01%]
               4,590,653 L1-dcache-prefetches      #    2.695 M/sec                    ( +-  4.49% ) [46.19%]
               1,712,660 L1-dcache-prefetch-misses #    1.005 M/sec                    ( +-  3.75% ) [44.82%]
      
              0.195622057  seconds time elapsed  ( +-  6.84% )
      
      Also clean up the attribute construction code to be appending, and factor
      it out into add_default_attributes().
      
      Tweak the coverage percentage printout a bit, so that it's easier to view it
      alongside the +- stddev column.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
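The "appending" construction mentioned above can be sketched as a function that starts from the default list and appends one block per detail level. The event names are taken from the sample output. Which level the L1-icache events belong to is an assumption here (the option summary does not mention them), and the Python function only borrows the name `add_default_attributes` for illustration.

```python
DEFAULT_EVENTS = [
    "task-clock", "context-switches", "CPU-migrations", "page-faults",
    "cycles", "stalled-cycles-frontend", "stalled-cycles-backend",
    "instructions", "branches", "branch-misses",
]
# -d: L1 and LLC data cache
DETAILED = ["L1-dcache-loads", "L1-dcache-load-misses", "LLC-loads", "LLC-load-misses"]
# -d -d: dTLB and iTLB events (placing L1-icache here is an assumption)
VERY_DETAILED = [
    "L1-icache-loads", "L1-icache-load-misses",
    "dTLB-loads", "dTLB-load-misses", "iTLB-loads", "iTLB-load-misses",
]
# -d -d -d: prefetch events
VERY_VERY_DETAILED = ["L1-dcache-prefetches", "L1-dcache-prefetch-misses"]


def add_default_attributes(detail_level):
    """Build the event list by appending one block per -d given."""
    attrs = list(DEFAULT_EVENTS)
    if detail_level >= 1:
        attrs += DETAILED
    if detail_level >= 2:
        attrs += VERY_DETAILED
    if detail_level >= 3:
        attrs += VERY_VERY_DETAILED
    return attrs
```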
  24. 30 April, 2011 1 commit
  25. 29 April, 2011 5 commits
  26. 28 April, 2011 1 commit
    • perf stat: Fix compatibility behavior · ede70290
      Ingo Molnar committed
      Instead of failing on an unknown event, when new perf stat is run on
      older kernels:
      
        $ ./perf stat true
        Error: open_counter returned with 22 (Invalid argument). /bin/dmesg
        may provide additional information.
      
        Fatal: Not all events could be opened.
      
      Just ignore EINVAL and ENOSYS, we'll print the results as not counted:
      
       Performance counter stats for 'true':
      
                0.239483 task-clock               #    0.493 CPUs utilized
                       0 context-switches         #    0.000 M/sec
                       0 CPU-migrations           #    0.000 M/sec
                      86 page-faults              #    0.359 M/sec
                 704,766 cycles                   #    2.943 GHz
           <not counted> stalled-cycles
                 381,961 instructions             #    0.54  insns per cycle
                  69,626 branches                 #  290.735 M/sec
                   4,594 branch-misses            #    6.60% of all branches
      
              0.000485883  seconds time elapsed
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio5hjpn3dsrm@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>