1. 19 5月, 2010 1 次提交
    • S
      perf stat: add perf stat -B to pretty print large numbers · 5af52b51
      Stephane Eranian 提交于
      It is hard to read very large numbers so provide an option to perf stat
      to separate thousands using a separator. The patch leverages the locale
      support of stdio. You need to set your LC_NUMERIC appropriately, for
      instance LC_NUMERIC=en_US.UTF8. You need to pass -B to activate this
      feature. This way existing scripts parsing the output do not need to be
      changed. Here is an example.
      
      $ perf stat noploop 2
      noploop for 2 seconds
      
       Performance counter stats for 'noploop 2':
      
              1998.347031  task-clock-msecs         #      0.998 CPUs
                       61  context-switches         #      0.000 M/sec
                        0  CPU-migrations           #      0.000 M/sec
                      118  page-faults              #      0.000 M/sec
            4,138,410,900  cycles                   #   2070.917 M/sec  (scaled from 70.01%)
            2,062,650,268  instructions             #      0.498 IPC    (scaled from 70.01%)
            2,057,653,466  branches                 #   1029.678 M/sec  (scaled from 70.01%)
                   40,267  branch-misses            #      0.002 %      (scaled from 30.04%)
            2,055,961,348  cache-references         #   1028.831 M/sec  (scaled from 30.03%)
                   53,725  cache-misses             #      0.027 M/sec  (scaled from 30.02%)
      
              2.001393933  seconds time elapsed
      
      $ perf stat -B  noploop 2
      noploop for 2 seconds
      
       Performance counter stats for 'noploop 2':
      
              1998.297883  task-clock-msecs         #      0.998 CPUs
                       59  context-switches         #      0.000 M/sec
                        0  CPU-migrations           #      0.000 M/sec
                      119  page-faults              #      0.000 M/sec
            4,131,380,160  cycles                   #   2067.450 M/sec  (scaled from 70.01%)
            2,059,096,507  instructions             #      0.498 IPC    (scaled from 70.01%)
            2,054,681,303  branches                 #   1028.216 M/sec  (scaled from 70.01%)
                   25,650  branch-misses            #      0.001 %      (scaled from 30.05%)
            2,056,283,014  cache-references         #   1029.017 M/sec  (scaled from 30.03%)
                   47,097  cache-misses             #      0.024 M/sec  (scaled from 30.02%)
      
              2.001391016  seconds time elapsed
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <4bf28fe8.914ed80a.01ca.fffff5f5@mx.google.com>
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5af52b51
  2. 14 5月, 2010 1 次提交
    • S
      perf tools: change event inheritance logic in stat and record · 2e6cdf99
      Stephane Eranian 提交于
      By default, event inheritance across fork and pthread_create was on but the -i
      option of stat and record, which enabled inheritance, led to believe it was off
      by default.
      
      This patch fixes this logic by inverting the meaning of the -i option.  By
      default inheritance is on whether you attach to a process (-p), a thread (-t)
      or start a process. If you pass -i, then you turn off inheritance. Turning off
      inheritance if you don't need it, helps limit perf resource usage as well.
      
      The patch also fixes perf stat -t xxxx and perf record -t xxxx which did not
      start the counters.
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <4bea9d2f.d60ce30a.0b5b.08e1@mx.google.com>
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2e6cdf99
  3. 14 4月, 2010 1 次提交
    • I
      perf: Fix endianness argument compatibility with OPT_BOOLEAN() and introduce OPT_INCR() · c0555642
      Ian Munsie 提交于
      Parsing an option from the command line with OPT_BOOLEAN on a
      bool data type would not work on a big-endian machine due to the
      manner in which the boolean was being cast into an int and
      incremented. For example, running 'perf probe --list' on a
      PowerPC machine would fail to properly set the list_events bool
      and would therefore print out the usage information and
      terminate.
      
      This patch makes OPT_BOOLEAN work as expected with a bool
      datatype. For cases where the original OPT_BOOLEAN was
      intentionally being used to increment an int each time it was
      passed in on the command line, this patch introduces OPT_INCR
      with the old behaviour of OPT_BOOLEAN (the verbose variable is
      currently the only such example of this).
      
      I have reviewed every use of OPT_BOOLEAN to verify that a true
      C99 bool was passed. Where integers were used, I verified that
      they were only being used for boolean logic and changed them to
      bools to ensure that they would not be mistakenly used as ints.
      The major exception was the verbose variable which now uses
      OPT_INCR instead of OPT_BOOLEAN.
      Signed-off-by: NIan Munsie <imunsie@au.ibm.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: <stable@kernel.org> # NOTE: wont apply to .3[34].x cleanly, please backport
      Cc: Git development list <git@vger.kernel.org>
      Cc: Ian Munsie <imunsie@au1.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Eric B Munson <ebmunson@us.ibm.com>
      Cc: Valdis.Kletnieks@vt.edu
      Cc: WANG Cong <amwang@redhat.com>
      Cc: Thiago Farina <tfransosi@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1271147857-11604-1-git-send-email-imunsie@au.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c0555642
  4. 23 3月, 2010 1 次提交
    • A
      perf stat: Better report failure to collect system wide stats · 084ab9f8
      Arnaldo Carvalho de Melo 提交于
      Before:
      
      [acme@doppio linux-2.6-tip]$ perf stat -a sleep 1s
      
       Performance counter stats for 'sleep 1s':
      
        <not counted>  task-clock-msecs
        <not counted>  context-switches
        <not counted>  CPU-migrations
        <not counted>  page-faults
        <not counted>  cycles
        <not counted>  instructions
        <not counted>  branches
        <not counted>  branch-misses
        <not counted>  cache-references
        <not counted>  cache-misses
      
          1.016998463  seconds time elapsed
      
      [acme@doppio linux-2.6-tip]$
      
      Now:
      
      [acme@doppio linux-2.6-tip]$ perf stat -a sleep 1s
      No permission to collect system-wide stats.
      Consider tweaking /proc/sys/kernel/perf_event_paranoid.
      [acme@doppio linux-2.6-tip]$
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1269274229-20442-4-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      084ab9f8
  5. 18 3月, 2010 2 次提交
    • Z
      perf events: Change perf parameter --pid to process-wide collection instead of thread-wide · d6d901c2
      Zhang, Yanmin 提交于
      Parameter --pid (or -p) of perf currently means a thread-wide
      collection. For exmaple, if a process whose id is 8888 has 10
      threads, 'perf top -p 8888' just collects the main thread
      statistics. That's misleading. Users are used to attach a whole
      process when debugging a process by gdb. To follow normal usage
      style, the patch change --pid to process-wide collection and add
      --tid (-t) to mean a thread-wide collection.
      
      Usage example is:
      
       # perf top -p 8888
       # perf record -p 8888 -f sleep 10
       # perf stat -p 8888 -f sleep 10
      
      Above commands collect the statistics of all threads of process
      8888.
      Signed-off-by: NZhang Yanmin <yanmin_zhang@linux.intel.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Sheng Yang <sheng@linux.intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: zhiteng.huang@intel.com
      Cc: Zachary Amsden <zamsden@redhat.com>
      LKML-Reference: <1268922965-14774-3-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d6d901c2
    • Z
      perf stat: Enable counters when collecting process-wide or system-wide data · 6be2850e
      Zhang, Yanmin 提交于
      Command 'perf stat' doesn't enable counters when collecting an
      existing (by -p) process or system-wide statistics. Fix the
      issue.
      
      Change the condition of fork/exec subcommand. If there is a
      subcommand parameter, perf always forks/execs it. The usage
      example is:
      
       # perf stat -a sleep 10
      
      So this command could collect statistics for 10 seconds
      precisely. User still could stop it by CTRL+C. Without the new
      capability, user could only use CTRL+C to stop it without
      precise time clock.
      
      Another issue is 'perf stat -a' consumes 100% time of a full
      single logical cpu. It has a bad impact on running workload.
      
      Fix it by adding a sleep(1) in the while(!done) loop in function
      run_perf_stat.
      Signed-off-by: NZhang Yanmin <yanmin_zhang@linux.intel.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Sheng Yang <sheng@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Zachary Amsden <zamsden@redhat.com>
      Cc: <zhiteng.huang@intel.com>
      LKML-Reference: <1268922965-14774-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6be2850e
  6. 11 3月, 2010 1 次提交
    • P
      perf tools: Fix sparse CPU numbering related bugs · a12b51c4
      Paul Mackerras 提交于
      At present, the perf subcommands that do system-wide monitoring
      (perf stat, perf record and perf top) don't work properly unless
      the online cpus are numbered 0, 1, ..., N-1.  These tools ask
      for the number of online cpus with sysconf(_SC_NPROCESSORS_ONLN)
      and then try to create events for cpus 0, 1, ..., N-1.
      
      This creates problems for systems where the online cpus are
      numbered sparsely.  For example, a POWER6 system in
      single-threaded mode (i.e. only running 1 hardware thread per
      core) will have only even-numbered cpus online.
      
      This fixes the problem by reading the /sys/devices/system/cpu/online
      file to find out which cpus are online.  The code that does that is in
      tools/perf/util/cpumap.[ch], and consists of a read_cpu_map()
      function that sets up a cpumap[] array and returns the number of
      online cpus.  If /sys/devices/system/cpu/online can't be read or
      can't be parsed successfully, it falls back to using sysconf to
      ask how many cpus are online and sets up an identity map in cpumap[].
      
      The perf record, perf stat and perf top code then calls
      read_cpu_map() in the system-wide monitoring case (instead of
      sysconf) and uses cpumap[] to get the cpu numbers to pass to
      perf_event_open.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      LKML-Reference: <20100310093609.GA3959@brick.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a12b51c4
  7. 13 1月, 2010 1 次提交
  8. 15 11月, 2009 1 次提交
    • L
      perf stat: Do not print ratio when task-clock event is not counted · 7255fe2a
      Lucas De Marchi 提交于
      The ratio between the number of events and the time elapsed makes
      sense only if task-clock event is counted. Otherwise it will be
      simply a (confusing)
      
      	#      0.000 M/sec
      
      This patch outputs the ratio only if task-clock event is counted.
      Some test examples of before and after:
      
      Before:
      
       [lucas@skywalker linux.trees.git]$ sudo perf stat -e branch-misses -a -- sleep 1
      
      	 Performance counter stats for 'sleep 1':
      
      		1367818  branch-misses            #      0.000 M/sec
      
      	    1.001494325  seconds time elapsed
      
      After (without task-clock):
      
       [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -a -- sleep 1
      
      	 Performance counter stats for 'sleep 1':
      
      		1135044  branch-misses
      
      	    1.001370775  seconds time elapsed
      
      After (with task-clock):
      
       [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -e task-clock -a -- sleep 1
      
      	 Performance counter stats for 'sleep 1':
      
      		1070111  branch-misses            #      0.534 M/sec
      	    2002.730893  task-clock-msecs         #      1.999 CPUs
      
      	    1.001640292  seconds time elapsed
      Signed-off-by: NLucas De Marchi <lucas.de.marchi@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <20091115140507.GB21561@skywalker.lan>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7255fe2a
  9. 19 10月, 2009 4 次提交
    • I
      perf stat: Count branches first · dd86e72a
      Ingo Molnar 提交于
      Count branches first, cache-misses second. The reason is that
      on x86 branches are not counted by all counters on all CPUs.
      
      Before:
      
       Performance counter stats for 'ls':
      
             0.756653  task-clock-msecs         #      0.802 CPUs
                    0  context-switches         #      0.000 M/sec
                    0  CPU-migrations           #      0.000 M/sec
                  250  page-faults              #      0.330 M/sec
              2375725  cycles                   #   3139.781 M/sec
              1628129  instructions             #      0.685 IPC
                19643  cache-references         #     25.960 M/sec
                 4608  cache-misses             #      6.090 M/sec
               342532  branches                 #    452.694 M/sec
        <not counted>  branch-misses
      
          0.000943356  seconds time elapsed
      
      After:
      
       Performance counter stats for 'ls':
      
             1.056734  task-clock-msecs         #      0.859 CPUs
                    0  context-switches         #      0.000 M/sec
                    0  CPU-migrations           #      0.000 M/sec
                  259  page-faults              #      0.245 M/sec
              3345932  cycles                   #   3166.295 M/sec
              3074090  instructions             #      0.919 IPC
               616928  branches                 #    583.806 M/sec
                39279  branch-misses            #      6.367 %
                21312  cache-references         #     20.168 M/sec
                 3661  cache-misses             #      3.464 M/sec
      
          0.001230551  seconds time elapsed
      
      (also prettify the printout of branch misses, in case it's
       getting scaled.)
      
      Cc: Tim Blechmann <tim@klingt.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4ADC3975.8050109@klingt.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ---
       tools/perf/builtin-stat.c |    2 ++
       1 files changed, 2 insertions(+), 0 deletions(-)
      
      diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
      index c373683..95a55ea 100644
      --- a/tools/perf/builtin-stat.c
      +++ b/tools/perf/builtin-stat.c
      @@ -59,6 +59,8 @@ static struct perf_event_attr default_attrs[] = {
         { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS	},
         { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
         { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES	},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES	},
      
       };
      ---
       tools/perf/builtin-stat.c |   20 ++++++++++----------
       1 files changed, 10 insertions(+), 10 deletions(-)
      
      diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
      index 95a55ea..90e0a26 100644
      --- a/tools/perf/builtin-stat.c
      +++ b/tools/perf/builtin-stat.c
      @@ -50,17 +50,17 @@
      
       static struct perf_event_attr default_attrs[] = {
      
      -  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK	},
      -  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES},
      -  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS	},
      -  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS	},
      -
      -  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES	},
      -  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS	},
      -  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
      -  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES	},
      -  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
      -  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES	},
      +  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK		},
      +  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES	},
      +  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS		},
      +  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS		},
      +
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES		},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS		},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES	},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES		},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS	},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES		},
      
       };
      dd86e72a
    • I
      perf stat: Re-align the default_attrs[] array · 56aab464
      Ingo Molnar 提交于
      Clean up the array definition to be vertically aligned.
      
      No functional effects.
      
      Cc: Tim Blechmann <tim@klingt.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4ADC3975.8050109@klingt.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ---
       tools/perf/builtin-stat.c |    2 ++
       1 files changed, 2 insertions(+), 0 deletions(-)
      
      diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
      index c373683..95a55ea 100644
      --- a/tools/perf/builtin-stat.c
      +++ b/tools/perf/builtin-stat.c
      @@ -59,6 +59,8 @@ static struct perf_event_attr default_attrs[] = {
         { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS	},
         { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
         { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES	},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
      +  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES	},
      
       };
      56aab464
    • T
      perf stat: Add branch performance events to default output · 12133aff
      Tim Blechmann 提交于
      Adds performance event information about branches
      and branch misses to the default output of perf stat.
      Signed-off-by: NTim Blechmann <tim@klingt.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4ADC3975.8050109@klingt.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      12133aff
    • A
      perf stat: Add branch performance metric · 11018201
      Anton Blanchard 提交于
      When we count both branches and branch-misses it is useful to
      print out the percentage of branch-misses:
      
       # perf stat -e branches -e branch-misses /bin/true
      
       Performance counter stats for '/bin/true':
      
               401684  branches                 #      0.000 M/sec
                23301  branch-misses            #      5.801 %
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Cc: paulus@samba.org
      Cc: a.p.zijlstra@chello.nl
      LKML-Reference: <20091018112923.GQ4808@kryten>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      11018201
  10. 05 10月, 2009 1 次提交
  11. 22 9月, 2009 1 次提交
    • I
      perf stat: Fix zero total printouts · c7f7fea3
      Ingo Molnar 提交于
      Before:
      
                 0  sched:sched_switch #        nan M/sec
      
      After:
      
                 0  sched:sched_switch #      0.000 M/sec
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c7f7fea3
  12. 21 9月, 2009 1 次提交
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar 提交于
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdd6c482
  13. 05 9月, 2009 1 次提交
  14. 04 9月, 2009 4 次提交
  15. 17 8月, 2009 1 次提交
    • F
      perf tools: Librarize trace_event() helper · 8f28827a
      Frederic Weisbecker 提交于
      Librarize trace_event() helper so that perf trace can use it
      too. Also clean up the debug.h includes a bit.
      
      It's not good to have it included in perf.h because it doesn't
      make it flexible against other headers it may need (headers
      that can also depend on perf.h and then create a recursive
      header dependency).
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <1250453149-664-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8f28827a
  16. 12 8月, 2009 1 次提交
    • F
      perf tools: Factorize high level dso helpers · cd84c2ac
      Frederic Weisbecker 提交于
      Factorize multiple definitions of high level dso helpers into the
      symbol source file.
      
      The side effect is a general export of the verbose and eprintf
      debugging helpers into a new file dedicated to debugging purposes.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      cd84c2ac
  17. 09 8月, 2009 1 次提交
  18. 23 7月, 2009 1 次提交
  19. 02 7月, 2009 1 次提交
    • F
      perf stat: Handle pipe read failures in perf stat · a92bef0f
      Frederic Weisbecker 提交于
      Building builtin-stat.c reports the following errors:
      
      cc1: warnings being treated as errors
      builtin-stat.c: In function ‘run_perf_stat’:
      builtin-stat.c:242: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
      builtin-stat.c:255: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
      make: *** [builtin-stat.o] Erreur 1
      
      This patch handles the possible pipe read failures.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246474930-6088-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a92bef0f
  20. 01 7月, 2009 2 次提交
  21. 30 6月, 2009 3 次提交
    • P
      perf_counter: Provide a way to enable counters on exec · 57e7986e
      Paul Mackerras 提交于
      This provides a way to mark a counter to be enabled on the next
      exec. This is useful for measuring the total activity of a
      program without including overhead from the process that
      launches it.
      
      This also changes the perf stat command to use this new
      facility.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <19017.43927.838745.689203@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      57e7986e
    • P
      perf_counter tools: Reduce perf stat measurement overhead/skew · 051ae7f7
      Paul Mackerras 提交于
      Vince Weaver reported a 'perf stat' measurement overhead in the
      count of retired instructions, which can amount to a +6000
      instructions inflated count in the reported count.
      
      At present, perf stat creates its counters on the perf process.  Thus
      the counters count the fork and various other activity in both the
      parent and child, such as the resolver overhead for resolving PLT
      entries for any libc functions that haven't been called before, such
      as execvp.
      
      This reduces the overhead by creating the counters on the child process
      after the fork, using a couple of pipes to synchronize so that the
      child process waits until the parent has created the counters before
      doing the exec.  To eliminate the PLT resolution overhead on calling
      execvp, this does a dummy execvp first which will always fail.
      
      With this, the overhead of executing a program goes down from over
      4800 instructions to about 90 instructions on powerpc (32-bit).
      This was measured with a statically-linked program written in
      assembler which only does the 3 instructions needed to call _exit(0).
      
      Before:
      
      $ perf stat -e 0:1:u ./three
      
       Performance counter stats for './three':
      
                 4858  instructions
      
          0.001274523  seconds time elapsed
      
      After:
      
      $ perf stat -e 0:1:u ./three
      
       Performance counter stats for './three':
      
                   92  instructions
      
          0.000468153  seconds time elapsed
      Reported-by: NVince Weaver <vince@deater.net>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <19016.41425.814043.870352@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      051ae7f7
    • I
      perf stat: Use percentages for scaling output · 210ad39f
      Ingo Molnar 提交于
      Peter expressed a strong preference for percentage based
      display of scaled values - so revert to that from the
      recently introduced multiplication-factor unit.
      Reported-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      210ad39f
  22. 28 6月, 2009 2 次提交
    • J
      perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run · c3043569
      Jaswinder Singh Rajput 提交于
      Set attrs and nr_counters if no event is selected and !null_run.
      
      Setting of attrs should depend on number of counters,
      so we need to memcpy only for sizeof(default_attrs)
      
      Also set nr_counters as ARRAY_SIZE(default_attrs) in place of
      hardcoded value.
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1246126749.32198.16.camel@hpdv5.satnam>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c3043569
    • J
      perf stat: Improve output · 6e750a8f
      Jaswinder Singh Rajput 提交于
      Increase size for event name to handle bigger names like
      'L1-d$-prefetch-misses'
      
      Changed scaled counters from percentage to a multiplicative
      factor because the latter is more expressive.
      
      Also aligned the scaling factor, otherwise sometimes it looks
      like:
      
                  384  iTLB-load-misses           (4.74x scaled)
               452029  branch-loads               (8.00x scaled)
                 5892  branch-load-misses         (20.39x scaled)
               972315  iTLB-loads                 (3.24x scaled)
      
      Before:
               150708  L1-d$-stores          (scaled from 23.57%)
               428804  L1-d$-prefetches      (scaled from 23.47%)
               314446  L1-d$-prefetch-misses  (scaled from 23.42%)
            252626137  L1-i$-loads           (scaled from 23.24%)
              5297550  dTLB-load-misses      (scaled from 23.96%)
            106992392  branch-loads          (scaled from 23.67%)
              5239561  branch-load-misses    (scaled from 23.43%)
      
      After:
              1731713  L1-d$-loads               (  14.25x scaled)
                44241  L1-d$-prefetches          (   3.88x scaled)
                21076  L1-d$-prefetch-misses     (   3.40x scaled)
              5789421  L1-i$-loads               (   3.78x scaled)
                29645  dTLB-load-misses          (   2.95x scaled)
               461474  branch-loads              (   6.52x scaled)
                 7493  branch-load-misses        (  26.57x scaled)
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1246051927.2988.10.camel@hpdv5.satnam>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6e750a8f
  23. 27 6月, 2009 2 次提交
    • I
      perf stat: Fix multi-run stats · 566747e6
      Ingo Molnar 提交于
      In multi-run (-r/--repeat) printouts, print out the noise of
      the wall-clock average as well.
      
      Also, fix a bug in printing out scaled counters: if it was not
      scaled then we should not update the average with -1.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      566747e6
    • I
      perf stat: Add -n/--null option to run without counters · 0cfb7a13
      Ingo Molnar 提交于
      Allow a no-counters run. This can be useful to measure just
      elapsed wall-clock time - or to assess the raw overhead of perf
      stat itself, without running any counters.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0cfb7a13
  24. 24 6月, 2009 2 次提交
  25. 20 6月, 2009 1 次提交
    • P
      perf_counter tools: Define and use our own u64, s64 etc. definitions · 9cffa8d5
      Paul Mackerras 提交于
      On 64-bit powerpc, __u64 is defined to be unsigned long rather than
      unsigned long long.  This causes compiler warnings every time we
      print a __u64 value with %Lx.
      
      Rather than changing __u64, we define our own u64 to be unsigned long
      long on all architectures, and similarly s64 as signed long long.
      For consistency we also define u32, s32, u16, s16, u8 and s8.  These
      definitions are put in a new header, types.h, because these definitions
      are needed in util/string.h and util/symbol.h.
      
      The main change here is the mechanical change of __[us]{64,32,16,8}
      to remove the "__".  The other changes are:
      
      * Create types.h
      * Include types.h in perf.h, util/string.h and util/symbol.h
      * Add types.h to the LIB_H definition in Makefile
      * Added (u64) casts in process_overflow_event() and print_sym_table()
        to kill two remaining warnings.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: benh@kernel.crashing.org
      LKML-Reference: <19003.33494.495844.956580@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9cffa8d5
  26. 13 6月, 2009 2 次提交
    • I
      perf stat: Enable raw data to be printed · ef281a19
      Ingo Molnar 提交于
      If -vv (very verbose) is specified, print out raw data
      in the following format:
      
      $ perf stat -vv -r 3 ./loop_1b_instructions
      
      [ perf stat: executing run #1 ... ]
      [ perf stat: executing run #2 ... ]
      [ perf stat: executing run #3 ... ]
      
      debug:              runtime[0]: 235871872
      debug:             walltime[0]: 236646752
      debug:       runtime_cycles[0]: 755150182
      debug:            counter/0[0]: 235871872
      debug:            counter/1[0]: 235871872
      debug:            counter/2[0]: 235871872
      debug:               scaled[0]: 0
      debug:            counter/0[1]: 2
      debug:            counter/1[1]: 235870662
      debug:            counter/2[1]: 235870662
      debug:               scaled[1]: 0
      debug:            counter/0[2]: 1
      debug:            counter/1[2]: 235870437
      debug:            counter/2[2]: 235870437
      debug:               scaled[2]: 0
      debug:            counter/0[3]: 140
      debug:            counter/1[3]: 235870298
      debug:            counter/2[3]: 235870298
      debug:               scaled[3]: 0
      debug:            counter/0[4]: 755150182
      debug:            counter/1[4]: 235870145
      debug:            counter/2[4]: 235870145
      debug:               scaled[4]: 0
      debug:            counter/0[5]: 1001411258
      debug:            counter/1[5]: 235868838
      debug:            counter/2[5]: 235868838
      debug:               scaled[5]: 0
      debug:            counter/0[6]: 27897
      debug:            counter/1[6]: 235868560
      debug:            counter/2[6]: 235868560
      debug:               scaled[6]: 0
      debug:            counter/0[7]: 2910
      debug:            counter/1[7]: 235868151
      debug:            counter/2[7]: 235868151
      debug:               scaled[7]: 0
      debug:              runtime[0]: 235980257
      debug:             walltime[0]: 236770942
      debug:       runtime_cycles[0]: 755114546
      debug:            counter/0[0]: 235980257
      debug:            counter/1[0]: 235980257
      debug:            counter/2[0]: 235980257
      debug:               scaled[0]: 0
      debug:            counter/0[1]: 3
      debug:            counter/1[1]: 235980049
      debug:            counter/2[1]: 235980049
      debug:               scaled[1]: 0
      debug:            counter/0[2]: 1
      debug:            counter/1[2]: 235979907
      debug:            counter/2[2]: 235979907
      debug:               scaled[2]: 0
      debug:            counter/0[3]: 135
      debug:            counter/1[3]: 235979780
      debug:            counter/2[3]: 235979780
      debug:               scaled[3]: 0
      debug:            counter/0[4]: 755114546
      debug:            counter/1[4]: 235979652
      debug:            counter/2[4]: 235979652
      debug:               scaled[4]: 0
      debug:            counter/0[5]: 1001439771
      debug:            counter/1[5]: 235979304
      debug:            counter/2[5]: 235979304
      debug:               scaled[5]: 0
      debug:            counter/0[6]: 23723
      debug:            counter/1[6]: 235979050
      debug:            counter/2[6]: 235979050
      debug:               scaled[6]: 0
      debug:            counter/0[7]: 2213
      debug:            counter/1[7]: 235978820
      debug:            counter/2[7]: 235978820
      debug:               scaled[7]: 0
      debug:              runtime[0]: 235888002
      debug:             walltime[0]: 236700533
      debug:       runtime_cycles[0]: 754881504
      debug:            counter/0[0]: 235888002
      debug:            counter/1[0]: 235888002
      debug:            counter/2[0]: 235888002
      debug:               scaled[0]: 0
      debug:            counter/0[1]: 2
      debug:            counter/1[1]: 235887793
      debug:            counter/2[1]: 235887793
      debug:               scaled[1]: 0
      debug:            counter/0[2]: 1
      debug:            counter/1[2]: 235887645
      debug:            counter/2[2]: 235887645
      debug:               scaled[2]: 0
      debug:            counter/0[3]: 135
      debug:            counter/1[3]: 235887499
      debug:            counter/2[3]: 235887499
      debug:               scaled[3]: 0
      debug:            counter/0[4]: 754881504
      debug:            counter/1[4]: 235887368
      debug:            counter/2[4]: 235887368
      debug:               scaled[4]: 0
      debug:            counter/0[5]: 1001401731
      debug:            counter/1[5]: 235887024
      debug:            counter/2[5]: 235887024
      debug:               scaled[5]: 0
      debug:            counter/0[6]: 24212
      debug:            counter/1[6]: 235886786
      debug:            counter/2[6]: 235886786
      debug:               scaled[6]: 0
      debug:            counter/0[7]: 1824
      debug:            counter/1[7]: 235886560
      debug:            counter/2[7]: 235886560
      debug:               scaled[7]: 0
      
       Performance counter stats for '/home/mingo/loop_1b_instructions' (3 runs):
      
           235.913377  task-clock-msecs     #      0.997 CPUs    ( +-   0.011% )
                    2  context-switches     #      0.000 M/sec   ( +-   0.000% )
                    1  CPU-migrations       #      0.000 M/sec   ( +-   0.000% )
                  136  page-faults          #      0.001 M/sec   ( +-   0.730% )
            755048744  cycles               #   3200.534 M/sec   ( +-   0.009% )
           1001417586  instructions         #      1.326 IPC     ( +-   0.001% )
                25277  cache-references     #      0.107 M/sec   ( +-   3.988% )
                 2315  cache-misses         #      0.010 M/sec   ( +-   9.845% )
      
          0.236706075  seconds time elapsed.
      
      This allows the summary stats to be validated.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ef281a19
    • I
      perf stat: Add feature to run and measure a command multiple times · 42202dd5
      Ingo Molnar 提交于
      Add the --repeat <n> feature to perf stat, which repeats a given
      command up to a 100 times, collects the stats and calculates an
      average and a stddev.
      
      For example, the following oneliner 'perf stat' command runs hackbench
      5 times and prints a tabulated result of all metrics, with averages
      and noise levels (in percentage) printed:
      
       aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10
       Time: 0.117
       Time: 0.108
       Time: 0.089
       Time: 0.088
       Time: 0.100
      
       Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
      
          1243.989586  task-clock-msecs     #     10.460 CPUs    ( +-   4.720% )
                47706  context-switches     #      0.038 M/sec   ( +-  19.706% )
                  387  CPU-migrations       #      0.000 M/sec   ( +-   3.608% )
                17793  page-faults          #      0.014 M/sec   ( +-   0.354% )
           3770941606  cycles               #   3031.329 M/sec   ( +-   4.621% )
           1566372416  instructions         #      0.415 IPC     ( +-   2.703% )
             16783421  cache-references     #     13.492 M/sec   ( +-   5.202% )
              7128590  cache-misses         #      5.730 M/sec   ( +-   7.420% )
      
          0.118924455  seconds time elapsed.
      
      The goal of this feature is to allow the reliance on these accurate
      statistics and to know how many times a command has to be repeated
      for the noise to go down to an acceptable level.
      
      (The -v option can be used to see a line printed out as each run progresses.)
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      42202dd5