1. 19 May, 2010 11 commits
    • perf probe: Fix some error exit paths · b448c4b6
      Arnaldo Carvalho de Melo committed
      Some error exit paths could leave file descriptors open and leak memory.
      Also stop using xmalloc; use malloc and handle allocation failures just
      like the other error cases in the routine that used it.
      
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Remove some unused functions · a41794cd
      Arnaldo Carvalho de Melo committed
      Without the bloated cplus_demangle from binutils, i.e. building with:
      
      $ make NO_DEMANGLE=1 O=~acme/git/build/perf -j3 -C tools/perf/ install
      
      Before:
      
         text	   data	    bss	    dec	    hex	filename
       471851	  29280	4025056	4526187	 45106b	/home/acme/bin/perf
      
      After:
      
      [acme@doppio linux-2.6-tip]$ size ~/bin/perf
         text	   data	    bss	    dec	    hex	filename
       446886	  29232	4008576	4484694	 446e56	/home/acme/bin/perf
      
      So it's a 5.3% size reduction in code, but the interesting part is in the git
      diff --stat output:
      
       19 files changed, 20 insertions(+), 1909 deletions(-)
      
      If we ever need some of the things we got from git but weren't using, we just
      have to go to the git repo and get fresh, up-to-date source code bits.
      
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: add perf stat -B to pretty print large numbers · 5af52b51
      Stephane Eranian committed
      It is hard to read very large numbers so provide an option to perf stat
      to separate thousands using a separator. The patch leverages the locale
      support of stdio. You need to set your LC_NUMERIC appropriately, for
      instance LC_NUMERIC=en_US.UTF8. You need to pass -B to activate this
      feature. This way existing scripts parsing the output do not need to be
      changed. Here is an example.
      
      $ perf stat noploop 2
      noploop for 2 seconds
      
       Performance counter stats for 'noploop 2':
      
              1998.347031  task-clock-msecs         #      0.998 CPUs
                       61  context-switches         #      0.000 M/sec
                        0  CPU-migrations           #      0.000 M/sec
                      118  page-faults              #      0.000 M/sec
            4,138,410,900  cycles                   #   2070.917 M/sec  (scaled from 70.01%)
            2,062,650,268  instructions             #      0.498 IPC    (scaled from 70.01%)
            2,057,653,466  branches                 #   1029.678 M/sec  (scaled from 70.01%)
                   40,267  branch-misses            #      0.002 %      (scaled from 30.04%)
            2,055,961,348  cache-references         #   1028.831 M/sec  (scaled from 30.03%)
                   53,725  cache-misses             #      0.027 M/sec  (scaled from 30.02%)
      
              2.001393933  seconds time elapsed
      
      $ perf stat -B  noploop 2
      noploop for 2 seconds
      
       Performance counter stats for 'noploop 2':
      
              1998.297883  task-clock-msecs         #      0.998 CPUs
                       59  context-switches         #      0.000 M/sec
                        0  CPU-migrations           #      0.000 M/sec
                      119  page-faults              #      0.000 M/sec
            4,131,380,160  cycles                   #   2067.450 M/sec  (scaled from 70.01%)
            2,059,096,507  instructions             #      0.498 IPC    (scaled from 70.01%)
            2,054,681,303  branches                 #   1028.216 M/sec  (scaled from 70.01%)
                   25,650  branch-misses            #      0.001 %      (scaled from 30.05%)
            2,056,283,014  cache-references         #   1029.017 M/sec  (scaled from 30.03%)
                   47,097  cache-misses             #      0.024 M/sec  (scaled from 30.02%)
      
              2.001391016  seconds time elapsed
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <4bf28fe8.914ed80a.01ca.fffff5f5@mx.google.com>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf, sparc: Implement group scheduling transactional APIs · a13c3afd
      Lin Ming committed
      Convert to the transactional PMU API and remove the duplication of
      group_sched_in().
      
      [cross build only]
      Signed-off-by: Lin Ming <ming.m.lin@intel.com>
      Acked-by: David Miller <davem@davemloft.net>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1272002193.5707.65.camel@minggr.sh.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Optimize perf_output_*() by avoiding local_xchg() · 6d1acfd5
      Peter Zijlstra committed
      Since the x86 XCHG instruction implies LOCK, avoid using it by
      switching to a sequence count instead.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Optimize the hotpath by converting the perf output buffer to local_t · fa588151
      Peter Zijlstra committed
      Since there is now only a single writer, we can use
      local_t instead and avoid all these LOCK-prefixed instructions.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Optimize the perf_output() path by removing IRQ-disables · ef60777c
      Peter Zijlstra committed
      Since we can now assume there is only a single writer
      to each buffer, we can remove the per-cpu lock and
      use a simple nest count to the same effect.
      
      This removes the need to disable IRQs.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Disallow mmap() on per-task inherited events · c7920614
      Peter Zijlstra committed
      Since we have had working per-task-per-cpu events for
      a while now, disallow mmap() on per-task inherited
      events. Those were a performance problem
      anyway, and doing away with them allows us to optimize
      the buffer somewhat by assuming there is only a
      single writer.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Optimize buffer placement by allocating buffers NUMA aware · a19d35c1
      Peter Zijlstra committed
      Ensure buffers bound to a CPU live on the right NUMA node.
      Suggested-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1274114880.5605.5236.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Fix errors path in perf_output_begin() · 00d1d0b0
      Stephane Eranian committed
      In case the sampling buffer has no "payload" pages,
      nr_pages is 0. The problem is that the error path in
      perf_output_begin() skips to a label which assumes
      perf_output_lock() has been issued which is not the
      case. That triggers a WARN_ON() in
      perf_output_unlock().
      
      This patch fixes the problem by skipping
      perf_output_unlock() in case data->nr_pages is 0.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4bf13674.014fd80a.6c82.ffffb20c@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf/ftrace: Optimize perf/tracepoint interaction for single events · 4f41c013
      Peter Zijlstra committed
      When we've got but a single event per tracepoint,
      there is no reason to try and multiplex it, so don't.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: Ingo Molnar <mingo@elte.hu>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 18 May, 2010 11 commits
  3. 17 May, 2010 5 commits
    • perf tui: Add workaround for slang < 2.1.4 · dc4ff193
      Arnaldo Carvalho de Melo committed
      Older versions of the slang library didn't use the 'const' specifier,
      causing problems with modern compilers of this kind:
      
      util/newt.c:252: error: passing argument 1 of ‘SLsmg_printf’ discards
      qualifiers from pointer target type
      
      Fix it by using some wrappers that, when needed, cast the affected
      parameters back to plain (char *).
      Reported-by: Lin Ming <ming.m.lin@intel.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <20100517145421.GD29052@ghostprotocols.net>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf record: Fix bug mismatch with -c option definition · 3de29cab
      Stephane Eranian committed
      The -c option defines the user requested sampling period. It was implemented
      using an unsigned int variable but the type of the option was OPT_LONG. Thus,
      the option parser was overwriting memory belonging to other variables, namely
      the mmap_pages leading to a zero page sampling buffer. The bug was exposed only
      when compiling at -O0, probably because the compiler was padding variables at
      higher optimization levels.
      
      This patch fixes this problem by declaring user_interval as u64. This also
      avoids wrap-around issues for large period on 32-bit systems.
      
      Committer note:
      
      Made it use OPT_U64(user_interval) after implementing OPT_U64 in the
      previous patch.
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <4bf11ae9.e88cd80a.06b0.ffffa8e3@mx.google.com>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf options: Introduce OPT_U64 · 6ba85cea
      Arnaldo Carvalho de Melo committed
      We have things like user_interval (-c/--count) in 'perf record' that
      need this.
      
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tui: Add help window to show key associations · a9a4ab74
      Arnaldo Carvalho de Melo committed
      Suggested-by: Ingo Molnar <mingo@elte.hu>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tui: Make <- exit menus too · a308f3a8
      Arnaldo Carvalho de Melo committed
      In fact it is now added to the hot key list when newt_form__new is used,
      allowing us to remove the explicit assignment in all its users.
      
      The visible change is that <- will exit the menu that pops up when -> is
      pressed (and Enter when callchains are not being used).
      Suggested-by: Ingo Molnar <mingo@elte.hu>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 16 May, 2010 4 commits
  5. 15 May, 2010 5 commits
    • I
    • x86, perf: P4 PMU - fix counters management logic · 1ff3d7d7
      Cyrill Gorcunov committed
      Jaswinder reported this #GP:
      
       |
       | Message from syslogd@ht at May 14 09:39:32 ...
       | kernel:[  314.908612] EIP: [<c100ccca>]
       | x86_perf_event_set_period+0x19d/0x1b2 SS:ESP 0068:edac3d70
       |
      
      Ming has narrowed it down to a comparison issue
      between arguments with different sizes and
      signs. As a result, the event index reached a wrong
      value, which in turn led to a GP fault.
      
      At the same time it was found that p4_next_cntr
      has broken logic and should return the counter
      index only if it was not yet borrowed for
      another event.
      Reported-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
      Reported-by: Lin Ming <ming.m.lin@intel.com>
      Bisected-by: Lin Ming <ming.m.lin@intel.com>
      Tested-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100514190815.GG13509@lenovo>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf newt: Make <- zoom out filters · 3e1bbdc3
      Arnaldo Carvalho de Melo committed
      After we use the filters to zoom into DSOs or threads, we can use <-
      (left arrow) to zoom out from the last filter applied.
      
      It is still possible to zoom out of order by using the popup menu.
      
      With this we now have the zoom out operation on the browsing fast path,
      by allowing fast navigation using just the four arrows and the enter key
      to expand/collapse callchains.
      Suggested-by: Ingo Molnar <mingo@elte.hu>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report: Report number of events, not samples · c82ee828
      Arnaldo Carvalho de Melo committed
      Number of samples is meaningless after we switched to auto-freq, so
      report the number of events, i.e. not the sum of the different periods,
      but the number of PERF_RECORD_SAMPLE records emitted by the kernel.
      
      While doing this I noticed that naming the sum of all the
      event periods "count" can be confusing, so rename it to .period, just like in
      struct sample.data, so that we become more consistent.
      
      This helps with the next step, that was to record in struct hist_entry
      the number of sample events for each instance, we need that because we
      use it to generate the number of events when applying filters to the
      tree of hist entries like it is being done in the TUI report browser.
      Suggested-by: Ingo Molnar <mingo@elte.hu>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf hist: Clarify events_stats fields usage · cee75ac7
      Arnaldo Carvalho de Melo committed
      The events_stats.total field is too generic, rename it to .total_period,
      and also add a comment explaining that it is the sum of all the .period
      fields in samples, that is needed because we use auto-freq to avoid
      sampling artifacts.
      
      Ditto for events_stats.lost, that is the sum of all lost_event.lost
      fields, i.e. the number of events the kernel dropped.
      
      Looking at the users, builtin-sched.c can make use of these fields and
      stop doing it again.
      
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 14 May, 2010 4 commits