1. 29 Apr, 2011 4 commits
    •
      perf stat: Analyze front-end and back-end stall counts · d3d1e86d
      Committed by Ingo Molnar
      Sample output:
      
       Performance counter stats for './loop_1b':
      
              873.691065 task-clock               #    1.000 CPUs utilized
                       1 context-switches         #    0.000 M/sec
                       1 CPU-migrations           #    0.000 M/sec
                      96 page-faults              #    0.000 M/sec
           2,012,637,222 cycles                   #    2.304 GHz                      (66.58%)
           1,001,397,911 stalled-cycles-frontend  #   49.76% frontend cycles idle     (66.58%)
               7,523,398 stalled-cycles-backend   #    0.37%  backend cycles idle     (66.76%)
           2,004,551,046 instructions             #    1.00  insns per cycle
                                                  #    0.50  stalled cycles per insn  (66.80%)
           1,001,304,992 branches                 # 1146.063 M/sec                    (66.76%)
                  39,453 branch-misses            #    0.00% of all branches          (66.64%)
      
              0.874046121  seconds time elapsed
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-7y40wib8n003io7hjpn1dsrm@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
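The ratios in the right-hand column of the sample output can be reproduced from the raw counts. The following is a minimal sketch in Python, not perf's own code: the function name `stall_metrics` is hypothetical, and the choice of the larger stall count for "stalled cycles per insn" is an assumption that happens to match the sample numbers.

```python
def stall_metrics(cycles, stalled_frontend, stalled_backend, instructions):
    """Recompute the derived ratios shown next to the raw counts."""
    return {
        # Share of all cycles in which the front end was stalled
        "frontend_idle_pct": 100.0 * stalled_frontend / cycles,
        # Share of all cycles in which the back end was stalled
        "backend_idle_pct": 100.0 * stalled_backend / cycles,
        "insns_per_cycle": instructions / cycles,
        # Assumption: the larger of the two stall counts is used here
        "stalled_cycles_per_insn": max(stalled_frontend, stalled_backend) / instructions,
    }


# Raw counts copied from the './loop_1b' sample output
metrics = stall_metrics(
    cycles=2_012_637_222,
    stalled_frontend=1_001_397_911,
    stalled_backend=7_523_398,
    instructions=2_004_551_046,
)
for name, value in metrics.items():
    print(f"{name:>24}: {value:.2f}")
```

Rounded to two decimals, these values agree with the 49.76%, 0.37%, 1.00 and 0.50 figures in the sample output.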
    •
      perf tools: Add front-end and back-end stalled cycles support · 129c04cb
      Committed by Ingo Molnar
      Update perf tooling to deal with front-end and back-end stalled cycles events.
      
      Add both events to the default 'perf stat' output.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-7y40wib8n002io7hjpn1dsrm@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf, x86: Add new stalled cycles events for Intel and AMD CPUs · 91fc4cc0
      Committed by Ingo Molnar
      Extend the Intel and AMD event definitions with generic front-end and
      back-end stall events.
      
      ( These are only approximations - suggestions are welcome for better events. )
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-7y40wib8n001io7hjpn1dsrm@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf events: Add generic front-end and back-end stalled cycle event definitions · 8f622422
      Committed by Ingo Molnar
      Add two generic hardware events: front-end and back-end stalled cycles.
      
      These events measure conditions when the CPU is executing code but its
      capabilities are not fully utilized. Understanding such situations and
      analyzing them is an important sub-task of code optimization workflows.
      
      Both events limit performance: most front-end stalls tend to be caused
      by branch mispredictions or instruction fetch cache misses, while back-end
      stalls can be caused by various resource shortages or inefficient
      instruction scheduling.
      
      Front-end stalls are the more important ones: code cannot run fast
      if the instruction stream does not keep up.
      
      An over-utilized back-end can cause front-end stalls and thus
      has to be watched as well.
      
      The exact composition depends heavily on program logic and
      instruction mix.
      
      We use the terms 'stall', 'front-end' and 'back-end' loosely and
      try to use the best available events from specific CPUs that
      approximate these concepts.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-7y40wib8n000io7hjpn1dsrm@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
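For orientation, the two new generic events sit alongside the existing generic hardware events. The sketch below mirrors that list as a Python enum. The numeric values are given as recalled from `enum perf_hw_id` in `linux/perf_event.h` and should be checked against the header of the kernel in use; the `PerfHwId` class and `EVENT_NAMES` table are hypothetical helpers, not part of perf.

```python
from enum import IntEnum


class PerfHwId(IntEnum):
    """Generic hardware event ids (values assumed to match linux/perf_event.h)."""
    CPU_CYCLES = 0
    INSTRUCTIONS = 1
    CACHE_REFERENCES = 2
    CACHE_MISSES = 3
    BRANCH_INSTRUCTIONS = 4
    BRANCH_MISSES = 5
    BUS_CYCLES = 6
    STALLED_CYCLES_FRONTEND = 7  # new: front-end stalled cycles
    STALLED_CYCLES_BACKEND = 8   # new: back-end stalled cycles


# Event names as they appear in 'perf stat' output
EVENT_NAMES = {
    "stalled-cycles-frontend": PerfHwId.STALLED_CYCLES_FRONTEND,
    "stalled-cycles-backend": PerfHwId.STALLED_CYCLES_BACKEND,
}

for name, hw_id in EVENT_NAMES.items():
    print(f"{name} -> {hw_id.name} ({int(hw_id)})")
```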
  2. 28 Apr, 2011 3 commits
    •
      perf stat: Fix compatibility behavior · ede70290
      Committed by Ingo Molnar
      Instead of failing on an unknown event, when new perf stat is run on
      older kernels:
      
        $ ./perf stat true
        Error: open_counter returned with 22 (Invalid argument). /bin/dmesg
        may provide additional information.
      
        Fatal: Not all events could be opened.
      
      Just ignore EINVAL and ENOSYS and print the results as not counted:
      
       Performance counter stats for 'true':
      
                0.239483 task-clock               #    0.493 CPUs utilized
                       0 context-switches         #    0.000 M/sec
                       0 CPU-migrations           #    0.000 M/sec
                      86 page-faults              #    0.359 M/sec
                 704,766 cycles                   #    2.943 GHz
           <not counted> stalled-cycles
                 381,961 instructions             #    0.54  insns per cycle
                  69,626 branches                 #  290.735 M/sec
                   4,594 branch-misses            #    6.60% of all branches
      
              0.000485883  seconds time elapsed
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio5hjpn3dsrm@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
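The fallback described in this commit can be sketched as follows. This is an illustration in Python rather than the actual C change in perf stat: `open_counters`, `format_count` and the `open_counter` callable are hypothetical names, and the only behavior taken from the commit message is that EINVAL and ENOSYS are tolerated and shown as `<not counted>`.

```python
import errno

# errno values taken to mean "this kernel does not know the event"
UNSUPPORTED = {errno.EINVAL, errno.ENOSYS}


def open_counters(events, open_counter):
    """Open every event; record None for unsupported ones instead of aborting.

    `open_counter` is a hypothetical callable that returns a handle
    or raises OSError with errno set.
    """
    handles = {}
    for name in events:
        try:
            handles[name] = open_counter(name)
        except OSError as exc:
            if exc.errno not in UNSUPPORTED:
                raise  # other failures (for example EACCES) still propagate
            handles[name] = None
    return handles


def format_count(handle, value):
    """Render a count the way the sample output does for missing events."""
    return "<not counted>" if handle is None else f"{value:,}"
```

With an opener that rejects `stalled-cycles` with EINVAL, the remaining events are still opened and only that one row is printed as `<not counted>`.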
    •
      perf stat: Add --sync/-S option · f9cef0a9
      Committed by Ingo Molnar
      --sync will tell perf stat to run sync() before starting a command.
      
      This allows IO-heavy tests to be used with --repeat, without one
      iteration impacting the other.
      
      Elapsed time will stabilize, for example:
      
        before:        3.971525714  seconds time elapsed  ( +-  8.56% )
        after:         3.211098537  seconds time elapsed  ( +-  1.52% )
      
      So measurements will be more accurate.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn1dsrm@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
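The idea behind --sync can be sketched as a repeat loop that flushes pending writes before each timed run, so that writeback left over from one iteration is not charged to the next. This is a rough Python illustration, not perf's implementation: `repeat_timed` and `relative_spread` are hypothetical names, and reporting the spread as the standard error of the mean relative to the mean is an assumption about what the `+-` percentage represents.

```python
import os
import statistics
import time


def repeat_timed(run, repeat, sync_first=False):
    """Time `run()` `repeat` times, optionally calling sync() before each run."""
    samples = []
    for _ in range(repeat):
        if sync_first:
            os.sync()  # flush dirty pages left over from the previous iteration
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return samples


def relative_spread(samples):
    """Standard error of the mean, as a percentage of the mean (needs >= 2 samples)."""
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / len(samples) ** 0.5
    return 100.0 * sem / mean
```

Calling `repeat_timed(workload, 10, sync_first=True)` on an IO-heavy `workload` callable and passing the result to `relative_spread` gives a figure comparable in spirit to the before/after percentages quoted above.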
    •
      perf event, x86: Use better stalled cycles metric · 8a850cad
      Committed by Ingo Molnar
      Use the UOPS_EXECUTED.*,c=1,i=1 event on Intel CPUs - it is a rather
      good indicator of CPU execution stalls, more sensitive and more inclusive
      than the 0xa2 resource stalls event (which does not count nearly as many
      stall types).
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn2dsrm@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
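The `c=1,i=1` modifiers set the counter mask to 1 and the invert flag, which turns "cycles with at least one uop executed" into "cycles with no uop executed". A minimal sketch of how such a raw event config is packed, following the x86 event-select register layout (event in bits 0-7, umask in bits 8-15, edge in bit 18, invert in bit 23, counter mask in bits 24-31). The helper `encode_raw_event` is hypothetical, and the event code 0xB1 with umask 0x3F is used only as an illustrative assumption, since the commit message leaves the umask unspecified.

```python
def encode_raw_event(event, umask, cmask=0, inv=False, edge=False):
    """Pack event-select fields into a raw x86 event config value."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF and 0 <= cmask <= 0xFF):
        raise ValueError("event, umask and cmask must each fit in 8 bits")
    config = event | (umask << 8) | (cmask << 24)
    if edge:
        config |= 1 << 18  # count rising edges instead of cycles
    if inv:
        config |= 1 << 23  # invert the counter-mask comparison
    return config


# Illustrative values only: event 0xB1 and umask 0x3F are assumptions
stall_config = encode_raw_event(0xB1, 0x3F, cmask=1, inv=True)
print(hex(stall_config))  # prints 0x1803fb1
```

A value built this way could then be passed to perf as a raw event, for example in the `rNNN` form, on hardware where the chosen event code and umask are valid.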
  3. 27 Apr, 2011 18 commits
  4. 26 Apr, 2011 11 commits
  5. 24 Apr, 2011 4 commits