1. 20 11月, 2010 1 次提交
    • S
      perf stat: Add no-aggregation mode to -a · f5b4a9c3
      Stephane Eranian 提交于
      This patch adds a new -A option to perf stat. If specified then perf stat does
      not aggregate counts across all monitored CPUs in system-wide mode, i.e., when
      using -a. This option is not supported in per-thread mode.
      
      Being able to get a per-cpu breakdown is useful to detect imbalances between
      CPUs when running a uniform workload than spans all monitored CPUs.
      
      The second version corrects the missing cpumap[] support, so that it works when
      the -C option is used.
      
      The third version fixes a missing cpumap[] in print_counter() and removes a
      stray patch in builtin-trace.c.
      
      Examples on a 4-way system:
      
      # perf stat -a   -e cycles,instructions -- sleep 1
       Performance counter stats for 'sleep 1':
               9592808135  cycles
               3490380006  instructions             #      0.364 IPC
              1.001584632  seconds time elapsed
      
      # perf stat -a -A -e cycles,instructions -- sleep 1
       Performance counter stats for 'sleep 1':
      CPU0            2398163767  cycles
      CPU1            2398180817  cycles
      CPU2            2398217115  cycles
      CPU3            2398247483  cycles
      CPU0             872282046  instructions             #      0.364 IPC
      CPU1             873481776  instructions             #      0.364 IPC
      CPU2             872638127  instructions             #      0.364 IPC
      CPU3             872437789  instructions             #      0.364 IPC
              1.001556052  seconds time elapsed
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@amd.com>
      LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com>
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f5b4a9c3
  2. 05 6月, 2010 1 次提交
    • S
      perf tools: Add the ability to specify list of cpus to monitor · c45c6ea2
      Stephane Eranian 提交于
      This patch adds a -C option to stat, record, top to designate a list of CPUs to
      monitor. CPUs can be specified as a comma-separated list or ranges, no space
      allowed.
      
      Examples:
      $ perf record -a -C0-1,4-7 sleep 1
      $ perf top -C0-4
      $ perf stat -a -C1,2,3,4 sleep 1
      
      With perf record in per-thread mode with inherit mode on, samples are collected
      only when the thread runs on the designated CPUs.
      
      The -C option does not turn on system-wide mode automatically.
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <4bff9496.d345d80a.41fe.7b00@mx.google.com>
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c45c6ea2
  3. 19 5月, 2010 1 次提交
    • S
      perf stat: add perf stat -B to pretty print large numbers · 5af52b51
      Stephane Eranian 提交于
      It is hard to read very large numbers so provide an option to perf stat
      to separate thousands using a separator. The patch leverages the locale
      support of stdio. You need to set your LC_NUMERIC appropriately, for
      instance LC_NUMERIC=en_US.UTF8. You need to pass -B to activate this
      feature. This way existing scripts parsing the output do not need to be
      changed. Here is an example.
      
      $ perf stat noploop 2
      noploop for 2 seconds
      
       Performance counter stats for 'noploop 2':
      
              1998.347031  task-clock-msecs         #      0.998 CPUs
                       61  context-switches         #      0.000 M/sec
                        0  CPU-migrations           #      0.000 M/sec
                      118  page-faults              #      0.000 M/sec
            4,138,410,900  cycles                   #   2070.917 M/sec  (scaled from 70.01%)
            2,062,650,268  instructions             #      0.498 IPC    (scaled from 70.01%)
            2,057,653,466  branches                 #   1029.678 M/sec  (scaled from 70.01%)
                   40,267  branch-misses            #      0.002 %      (scaled from 30.04%)
            2,055,961,348  cache-references         #   1028.831 M/sec  (scaled from 30.03%)
                   53,725  cache-misses             #      0.027 M/sec  (scaled from 30.02%)
      
              2.001393933  seconds time elapsed
      
      $ perf stat -B  noploop 2
      noploop for 2 seconds
      
       Performance counter stats for 'noploop 2':
      
              1998.297883  task-clock-msecs         #      0.998 CPUs
                       59  context-switches         #      0.000 M/sec
                        0  CPU-migrations           #      0.000 M/sec
                      119  page-faults              #      0.000 M/sec
            4,131,380,160  cycles                   #   2067.450 M/sec  (scaled from 70.01%)
            2,059,096,507  instructions             #      0.498 IPC    (scaled from 70.01%)
            2,054,681,303  branches                 #   1028.216 M/sec  (scaled from 70.01%)
                   25,650  branch-misses            #      0.001 %      (scaled from 30.05%)
            2,056,283,014  cache-references         #   1029.017 M/sec  (scaled from 30.03%)
                   47,097  cache-misses             #      0.024 M/sec  (scaled from 30.02%)
      
              2.001391016  seconds time elapsed
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <4bf28fe8.914ed80a.01ca.fffff5f5@mx.google.com>
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5af52b51
  4. 14 5月, 2010 1 次提交
    • S
      perf tools: change event inheritance logic in stat and record · 2e6cdf99
      Stephane Eranian 提交于
      By default, event inheritance across fork and pthread_create was on but the -i
      option of stat and record, which enabled inheritance, led to believe it was off
      by default.
      
      This patch fixes this logic by inverting the meaning of the -i option.  By
      default inheritance is on whether you attach to a process (-p), a thread (-t)
      or start a process. If you pass -i, then you turn off inheritance. Turning off
      inheritance if you don't need it, helps limit perf resource usage as well.
      
      The patch also fixes perf stat -t xxxx and perf record -t xxxx which did not
      start the counters.
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <4bea9d2f.d60ce30a.0b5b.08e1@mx.google.com>
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2e6cdf99
  5. 14 4月, 2010 1 次提交
    • I
      perf: Fix endianness argument compatibility with OPT_BOOLEAN() and introduce OPT_INCR() · c0555642
      Ian Munsie 提交于
      Parsing an option from the command line with OPT_BOOLEAN on a
      bool data type would not work on a big-endian machine due to the
      manner in which the boolean was being cast into an int and
      incremented. For example, running 'perf probe --list' on a
      PowerPC machine would fail to properly set the list_events bool
      and would therefore print out the usage information and
      terminate.
      
      This patch makes OPT_BOOLEAN work as expected with a bool
      datatype. For cases where the original OPT_BOOLEAN was
      intentionally being used to increment an int each time it was
      passed in on the command line, this patch introduces OPT_INCR
      with the old behaviour of OPT_BOOLEAN (the verbose variable is
      currently the only such example of this).
      
      I have reviewed every use of OPT_BOOLEAN to verify that a true
      C99 bool was passed. Where integers were used, I verified that
      they were only being used for boolean logic and changed them to
      bools to ensure that they would not be mistakenly used as ints.
      The major exception was the verbose variable which now uses
      OPT_INCR instead of OPT_BOOLEAN.
      Signed-off-by: NIan Munsie <imunsie@au.ibm.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: <stable@kernel.org> # NOTE: wont apply to .3[34].x cleanly, please backport
      Cc: Git development list <git@vger.kernel.org>
      Cc: Ian Munsie <imunsie@au1.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Eric B Munson <ebmunson@us.ibm.com>
      Cc: Valdis.Kletnieks@vt.edu
      Cc: WANG Cong <amwang@redhat.com>
      Cc: Thiago Farina <tfransosi@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1271147857-11604-1-git-send-email-imunsie@au.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c0555642
  6. 23 3月, 2010 1 次提交
    • A
      perf stat: Better report failure to collect system wide stats · 084ab9f8
      Arnaldo Carvalho de Melo 提交于
      Before:
      
      [acme@doppio linux-2.6-tip]$ perf stat -a sleep 1s
      
       Performance counter stats for 'sleep 1s':
      
        <not counted>  task-clock-msecs
        <not counted>  context-switches
        <not counted>  CPU-migrations
        <not counted>  page-faults
        <not counted>  cycles
        <not counted>  instructions
        <not counted>  branches
        <not counted>  branch-misses
        <not counted>  cache-references
        <not counted>  cache-misses
      
          1.016998463  seconds time elapsed
      
      [acme@doppio linux-2.6-tip]$
      
      Now:
      
      [acme@doppio linux-2.6-tip]$ perf stat -a sleep 1s
      No permission to collect system-wide stats.
      Consider tweaking /proc/sys/kernel/perf_event_paranoid.
      [acme@doppio linux-2.6-tip]$
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1269274229-20442-4-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      084ab9f8
  7. 18 3月, 2010 2 次提交
    • Z
      perf events: Change perf parameter --pid to process-wide collection instead of thread-wide · d6d901c2
      Zhang, Yanmin 提交于
      Parameter --pid (or -p) of perf currently means a thread-wide
      collection. For exmaple, if a process whose id is 8888 has 10
      threads, 'perf top -p 8888' just collects the main thread
      statistics. That's misleading. Users are used to attach a whole
      process when debugging a process by gdb. To follow normal usage
      style, the patch change --pid to process-wide collection and add
      --tid (-t) to mean a thread-wide collection.
      
      Usage example is:
      
       # perf top -p 8888
       # perf record -p 8888 -f sleep 10
       # perf stat -p 8888 -f sleep 10
      
      Above commands collect the statistics of all threads of process
      8888.
      Signed-off-by: NZhang Yanmin <yanmin_zhang@linux.intel.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Sheng Yang <sheng@linux.intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: zhiteng.huang@intel.com
      Cc: Zachary Amsden <zamsden@redhat.com>
      LKML-Reference: <1268922965-14774-3-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d6d901c2
    • Z
      perf stat: Enable counters when collecting process-wide or system-wide data · 6be2850e
      Zhang, Yanmin 提交于
      Command 'perf stat' doesn't enable counters when collecting an
      existing (by -p) process or system-wide statistics. Fix the
      issue.
      
      Change the condition of fork/exec subcommand. If there is a
      subcommand parameter, perf always forks/execs it. The usage
      example is:
      
       # perf stat -a sleep 10
      
      So this command could collect statistics for 10 seconds
      precisely. User still could stop it by CTRL+C. Without the new
      capability, user could only use CTRL+C to stop it without
      precise time clock.
      
      Another issue is 'perf stat -a' consumes 100% time of a full
      single logical cpu. It has a bad impact on running workload.
      
      Fix it by adding a sleep(1) in the while(!done) loop in function
      run_perf_stat.
      Signed-off-by: NZhang Yanmin <yanmin_zhang@linux.intel.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Sheng Yang <sheng@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Zachary Amsden <zamsden@redhat.com>
      Cc: <zhiteng.huang@intel.com>
      LKML-Reference: <1268922965-14774-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6be2850e
  8. 11 3月, 2010 1 次提交
    • P
      perf tools: Fix sparse CPU numbering related bugs · a12b51c4
      Paul Mackerras 提交于
      At present, the perf subcommands that do system-wide monitoring
      (perf stat, perf record and perf top) don't work properly unless
      the online cpus are numbered 0, 1, ..., N-1.  These tools ask
      for the number of online cpus with sysconf(_SC_NPROCESSORS_ONLN)
      and then try to create events for cpus 0, 1, ..., N-1.
      
      This creates problems for systems where the online cpus are
      numbered sparsely.  For example, a POWER6 system in
      single-threaded mode (i.e. only running 1 hardware thread per
      core) will have only even-numbered cpus online.
      
      This fixes the problem by reading the /sys/devices/system/cpu/online
      file to find out which cpus are online.  The code that does that is in
      tools/perf/util/cpumap.[ch], and consists of a read_cpu_map()
      function that sets up a cpumap[] array and returns the number of
      online cpus.  If /sys/devices/system/cpu/online can't be read or
      can't be parsed successfully, it falls back to using sysconf to
      ask how many cpus are online and sets up an identity map in cpumap[].
      
      The perf record, perf stat and perf top code then calls
      read_cpu_map() in the system-wide monitoring case (instead of
      sysconf) and uses cpumap[] to get the cpu numbers to pass to
      perf_event_open.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      LKML-Reference: <20100310093609.GA3959@brick.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a12b51c4
  9. 13 1月, 2010 1 次提交
  10. 15 11月, 2009 1 次提交
    • L
      perf stat: Do not print ratio when task-clock event is not counted · 7255fe2a
      Lucas De Marchi 提交于
      The ratio between the number of events and the time elapsed makes
      sense only if task-clock event is counted. Otherwise it will be
      simply a (confusing)
      
      	#      0.000 M/sec
      
      This patch outputs the ratio only if task-clock event is counted.
      Some test examples of before and after:
      
      Before:
      
       [lucas@skywalker linux.trees.git]$ sudo perf stat -e branch-misses -a -- sleep 1
      
      	 Performance counter stats for 'sleep 1':
      
      		1367818  branch-misses            #      0.000 M/sec
      
      	    1.001494325  seconds time elapsed
      
      After (without task-clock):
      
       [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -a -- sleep 1
      
      	 Performance counter stats for 'sleep 1':
      
      		1135044  branch-misses
      
      	    1.001370775  seconds time elapsed
      
      After (with task-clock):
      
       [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -e task-clock -a -- sleep 1
      
      	 Performance counter stats for 'sleep 1':
      
      		1070111  branch-misses            #      0.534 M/sec
      	    2002.730893  task-clock-msecs         #      1.999 CPUs
      
      	    1.001640292  seconds time elapsed
      Signed-off-by: NLucas De Marchi <lucas.de.marchi@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <20091115140507.GB21561@skywalker.lan>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7255fe2a
  11. 19 10月, 2009 4 次提交
    • I
      perf stat: Count branches first · dd86e72a
      Ingo Molnar 提交于
      Count branches first, cache-misses second. The reason is that
      on x86 branches are not counted by all counters on all CPUs.
      
      Before:
      
       Performance counter stats for 'ls':
      
             0.756653  task-clock-msecs         #      0.802 CPUs
                    0  context-switches         #      0.000 M/sec
                    0  CPU-migrations           #      0.000 M/sec
                  250  page-faults              #      0.330 M/sec
              2375725  cycles                   #   3139.781 M/sec
              1628129  instructions             #      0.685 IPC
                19643  cache-references         #     25.960 M/sec
                 4608  cache-misses             #      6.090 M/sec
               342532  branches                 #    452.694 M/sec
        <not counted>  branch-misses
      
          0.000943356  seconds time elapsed
      
      After:
      
       Performance counter stats for 'ls':
      
             1.056734  task-clock-msecs         #      0.859 CPUs
                    0  context-switches         #      0.000 M/sec
                    0  CPU-migrations           #      0.000 M/sec
                  259  page-faults              #      0.245 M/sec
              3345932  cycles                   #   3166.295 M/sec
              3074090  instructions             #      0.919 IPC
               616928  branches                 #    583.806 M/sec
                39279  branch-misses            #      6.367 %
                21312  cache-references         #     20.168 M/sec
                 3661  cache-misses             #      3.464 M/sec
      
          0.001230551  seconds time elapsed
      
      (also prettify the printout of branch misses, in case it's
       getting scaled.)
      
      Cc: Tim Blechmann <tim@klingt.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4ADC3975.8050109@klingt.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ---
       tools/perf/builtin-stat.c |    2 ++
       1 files changed, 2 insertions(+), 0 deletions(-)
      
      diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
      index c373683..95a55ea 100644
      --- a/tools/perf/builtin-stat.c
      +++ b/tools/perf/builtin-stat.c
      @@ -59,6 +59,8 @@ static struct perf_event_attr default_attrs[] = {
         { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS	},
         { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
         { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES	},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES	},
      
       };
      ---
       tools/perf/builtin-stat.c |   20 ++++++++++----------
       1 files changed, 10 insertions(+), 10 deletions(-)
      
      diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
      index 95a55ea..90e0a26 100644
      --- a/tools/perf/builtin-stat.c
      +++ b/tools/perf/builtin-stat.c
      @@ -50,17 +50,17 @@
      
       static struct perf_event_attr default_attrs[] = {
      
      -  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK	},
      -  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES},
      -  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS	},
      -  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS	},
      -
      -  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES	},
      -  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS	},
      -  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
      -  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES	},
      -  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
      -  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES	},
      +  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK		},
      +  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES	},
      +  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS		},
      +  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS		},
      +
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES		},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS		},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES	},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES		},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS	},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES		},
      
       };
      dd86e72a
    • I
      perf stat: Re-align the default_attrs[] array · 56aab464
      Ingo Molnar 提交于
      Clean up the array definition to be vertically aligned.
      
      No functional effects.
      
      Cc: Tim Blechmann <tim@klingt.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4ADC3975.8050109@klingt.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ---
       tools/perf/builtin-stat.c |    2 ++
       1 files changed, 2 insertions(+), 0 deletions(-)
      
      diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
      index c373683..95a55ea 100644
      --- a/tools/perf/builtin-stat.c
      +++ b/tools/perf/builtin-stat.c
      @@ -59,6 +59,8 @@ static struct perf_event_attr default_attrs[] = {
         { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS	},
         { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
         { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES	},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES	},
      
       };
      56aab464
    • T
      perf stat: Add branch performance events to default output · 12133aff
      Tim Blechmann 提交于
      Adds performance event information about branches
      and branch misses to the default output of perf stat.
      Signed-off-by: NTim Blechmann <tim@klingt.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4ADC3975.8050109@klingt.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      12133aff
    • A
      perf stat: Add branch performance metric · 11018201
      Anton Blanchard 提交于
      When we count both branches and branch-misses it is useful to
      print out the percentage of branch-misses:
      
       # perf stat -e branches -e branch-misses /bin/true
      
       Performance counter stats for '/bin/true':
      
               401684  branches                 #      0.000 M/sec
                23301  branch-misses            #      5.801 %
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Cc: paulus@samba.org
      Cc: a.p.zijlstra@chello.nl
      LKML-Reference: <20091018112923.GQ4808@kryten>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      11018201
  12. 05 10月, 2009 1 次提交
  13. 22 9月, 2009 1 次提交
    • I
      perf stat: Fix zero total printouts · c7f7fea3
      Ingo Molnar 提交于
      Before:
      
                 0  sched:sched_switch #        nan M/sec
      
      After:
      
                 0  sched:sched_switch #      0.000 M/sec
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c7f7fea3
  14. 21 9月, 2009 1 次提交
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar 提交于
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdd6c482
  15. 05 9月, 2009 1 次提交
  16. 04 9月, 2009 4 次提交
  17. 17 8月, 2009 1 次提交
    • F
      perf tools: Librarize trace_event() helper · 8f28827a
      Frederic Weisbecker 提交于
      Librarize trace_event() helper so that perf trace can use it
      too. Also clean up the debug.h includes a bit.
      
      It's not good to have it included in perf.h because it doesn't
      make it flexible against other headers it may need (headers
      that can also depend on perf.h and then create a recursive
      header dependency).
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <1250453149-664-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8f28827a
  18. 12 8月, 2009 1 次提交
    • F
      perf tools: Factorize high level dso helpers · cd84c2ac
      Frederic Weisbecker 提交于
      Factorize multiple definitions of high level dso helpers into the
      symbol source file.
      
      The side effect is a general export of the verbose and eprintf
      debugging helpers into a new file dedicated to debugging purposes.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      cd84c2ac
  19. 09 8月, 2009 1 次提交
  20. 23 7月, 2009 1 次提交
  21. 02 7月, 2009 1 次提交
    • F
      perf stat: Handle pipe read failures in perf stat · a92bef0f
      Frederic Weisbecker 提交于
      Building builtin-stat.c reports the following errors:
      
      cc1: warnings being treated as errors
      builtin-stat.c: In function ‘run_perf_stat’:
      builtin-stat.c:242: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
      builtin-stat.c:255: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
      make: *** [builtin-stat.o] Erreur 1
      
      This patch handles the possible pipe read failures.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246474930-6088-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a92bef0f
  22. 01 7月, 2009 2 次提交
  23. 30 6月, 2009 3 次提交
    • P
      perf_counter: Provide a way to enable counters on exec · 57e7986e
      Paul Mackerras 提交于
      This provides a way to mark a counter to be enabled on the next
      exec. This is useful for measuring the total activity of a
      program without including overhead from the process that
      launches it.
      
      This also changes the perf stat command to use this new
      facility.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <19017.43927.838745.689203@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      57e7986e
    • P
      perf_counter tools: Reduce perf stat measurement overhead/skew · 051ae7f7
      Paul Mackerras 提交于
      Vince Weaver reported a 'perf stat' measurement overhead in the
      count of retired instructions, which can amount to a +6000
      instructions inflated count in the reported count.
      
      At present, perf stat creates its counters on the perf process.  Thus
      the counters count the fork and various other activity in both the
      parent and child, such as the resolver overhead for resolving PLT
      entries for any libc functions that haven't been called before, such
      as execvp.
      
      This reduces the overhead by creating the counters on the child process
      after the fork, using a couple of pipes to synchronize so that the
      child process waits until the parent has created the counters before
      doing the exec.  To eliminate the PLT resolution overhead on calling
      execvp, this does a dummy execvp first which will always fail.
      
      With this, the overhead of executing a program goes down from over
      4800 instructions to about 90 instructions on powerpc (32-bit).
      This was measured with a statically-linked program written in
      assembler which only does the 3 instructions needed to call _exit(0).
      
      Before:
      
      $ perf stat -e 0:1:u ./three
      
       Performance counter stats for './three':
      
                 4858  instructions
      
          0.001274523  seconds time elapsed
      
      After:
      
      $ perf stat -e 0:1:u ./three
      
       Performance counter stats for './three':
      
                   92  instructions
      
          0.000468153  seconds time elapsed
      Reported-by: NVince Weaver <vince@deater.net>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <19016.41425.814043.870352@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      051ae7f7
    • I
      perf stat: Use percentages for scaling output · 210ad39f
      Ingo Molnar 提交于
      Peter expressed a strong preference for percentage based
      display of scaled values - so revert to that from the
      recently introduced multiplication-factor unit.
      Reported-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      210ad39f
  24. 28 6月, 2009 2 次提交
    • J
      perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run · c3043569
      Jaswinder Singh Rajput 提交于
      Set attrs and nr_counters if no event is selected and !null_run.
      
      Setting of attrs should depend on number of counters,
      so we need to memcpy only for sizeof(default_attrs)
      
      Also set nr_counters as ARRAY_SIZE(default_attrs) in place of
      hardcoded value.
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1246126749.32198.16.camel@hpdv5.satnam>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c3043569
    • J
      perf stat: Improve output · 6e750a8f
      Jaswinder Singh Rajput 提交于
      Increase size for event name to handle bigger names like
      'L1-d$-prefetch-misses'
      
      Changed scaled counters from percentage to a multiplicative
      factor because the latter is more expressive.
      
      Also aligned the scaling factor, otherwise sometimes it looks
      like:
      
                  384  iTLB-load-misses           (4.74x scaled)
               452029  branch-loads               (8.00x scaled)
                 5892  branch-load-misses         (20.39x scaled)
               972315  iTLB-loads                 (3.24x scaled)
      
      Before:
               150708  L1-d$-stores          (scaled from 23.57%)
               428804  L1-d$-prefetches      (scaled from 23.47%)
               314446  L1-d$-prefetch-misses  (scaled from 23.42%)
            252626137  L1-i$-loads           (scaled from 23.24%)
              5297550  dTLB-load-misses      (scaled from 23.96%)
            106992392  branch-loads          (scaled from 23.67%)
              5239561  branch-load-misses    (scaled from 23.43%)
      
      After:
              1731713  L1-d$-loads               (  14.25x scaled)
                44241  L1-d$-prefetches          (   3.88x scaled)
                21076  L1-d$-prefetch-misses     (   3.40x scaled)
              5789421  L1-i$-loads               (   3.78x scaled)
                29645  dTLB-load-misses          (   2.95x scaled)
               461474  branch-loads              (   6.52x scaled)
                 7493  branch-load-misses        (  26.57x scaled)
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1246051927.2988.10.camel@hpdv5.satnam>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6e750a8f
  25. 27 6月, 2009 2 次提交
    • I
      perf stat: Fix multi-run stats · 566747e6
      Ingo Molnar 提交于
      In multi-run (-r/--repeat) printouts, print out the noise of
      the wall-clock average as well.
      
      Also, fix a bug in printing out scaled counters: if it was not
      scaled then we should not update the average with -1.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      566747e6
    • I
      perf stat: Add -n/--null option to run without counters · 0cfb7a13
      Ingo Molnar 提交于
      Allow a no-counters run. This can be useful to measure just
      elapsed wall-clock time - or to assess the raw overhead of perf
      stat itself, without running any counters.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0cfb7a13
  26. 24 6月, 2009 2 次提交
  27. 20 6月, 2009 1 次提交
    • P
      perf_counter tools: Define and use our own u64, s64 etc. definitions · 9cffa8d5
      Paul Mackerras 提交于
      On 64-bit powerpc, __u64 is defined to be unsigned long rather than
      unsigned long long.  This causes compiler warnings every time we
      print a __u64 value with %Lx.
      
      Rather than changing __u64, we define our own u64 to be unsigned long
      long on all architectures, and similarly s64 as signed long long.
      For consistency we also define u32, s32, u16, s16, u8 and s8.  These
      definitions are put in a new header, types.h, because these definitions
      are needed in util/string.h and util/symbol.h.
      
      The main change here is the mechanical change of __[us]{64,32,16,8}
      to remove the "__".  The other changes are:
      
      * Create types.h
      * Include types.h in perf.h, util/string.h and util/symbol.h
      * Add types.h to the LIB_H definition in Makefile
      * Added (u64) casts in process_overflow_event() and print_sym_table()
        to kill two remaining warnings.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: benh@kernel.crashing.org
      LKML-Reference: <19003.33494.495844.956580@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9cffa8d5