1. 16 7月, 2011 4 次提交
  2. 15 7月, 2011 1 次提交
  3. 05 7月, 2011 1 次提交
    • A
      perf report/annotate/script: Add option to specify a CPU range · 5d67be97
      Anton Blanchard 提交于
      Add an option to perf report/annotate/script to specify which
      CPUs to operate on. This enables us to take a single system wide
      profile and analyse each CPU (or group of CPUs) in isolation.
      
      This was useful when profiling a multiprocess workload where the
      bottleneck was on one CPU but this was hidden in the overall
      profile. Per process and per thread breakdowns didn't help
      because multiple processes were running on each CPU and no
      single process consumed an entire CPU.
      
      The patch converts the list of CPUs returned by cpu_map__new
      into a bitmap for fast lookup. I wanted to use -C to be
      consistent with perf top/record/stat, but unfortunately perf
      report already uses -C <comms>.
      
       v2: Incorporate suggestions from David Ahern:
      	- Added -c to perf script
      	- Check that SAMPLE_CPU is set when -c is used
      	- Update documentation
      
       v3: Create perf_session__cpu_bitmap()
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Link: http://lkml.kernel.org/r/20110704215750.11647eb9@krytenSigned-off-by: NIngo Molnar <mingo@elte.hu>
      5d67be97
  4. 30 6月, 2011 5 次提交
    • F
      perf tools: Allow sort dimensions to be registered more than once · fd8ea212
      Frederic Weisbecker 提交于
      So that the parent sort dimension can be registered twice: once
      if we add it as an explicit sort dimension (-s parent) and twice
      if we request a parent filter (-p foo).
      
      We'll have only one parent sort dimension in the end but this
      allows to override the default parent filter with we gave in "-p"
      option. The goal of this is to prepare to allow the use of
      "-s parent" and "-p foo" at the same time, ie: sort by filtered
      parent.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Sam Liao <phyomh@gmail.com>
      fd8ea212
    • F
      perf tools: Don't display ignored entries on stdio ui · e84d2122
      Frederic Weisbecker 提交于
      As for newt ui, don't display entries that have been marked
      as ignored.
      
      The practical current effect of this is to make parent
      filtering really working. Before, entries that were ignored
      were given a null parent but were still displayed. This
      resulted in some weird effects:
      
       # Overhead      Command      Shared Object        Symbol
       # ........  ...........  .................  ............
       #
      ^A
                         |
                         --- __lock_acquire
                            |
                            |--95.97%-- lock_acquire
                            |          |
                            |          |--30.75%-- _raw_spin_lock
      
      Discard these from the stdio display.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Sam Liao <phyomh@gmail.com>
      e84d2122
    • F
      perf tools: Remove sort print helpers declarations · 2fd701bc
      Frederic Weisbecker 提交于
      These are probably some old leftovers.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Sam Liao <phyomh@gmail.com>
      2fd701bc
    • F
      perf tools: Make sort operations static · 872a878f
      Frederic Weisbecker 提交于
      These don't need to be globally visible.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Sam Liao <phyomh@gmail.com>
      872a878f
    • S
      perf tools: Add inverted call graph report support. · d797fdc5
      Sam Liao 提交于
      Add "caller/callee" option to support inverted butterfly report,
      in the inverted report (with caller option), the call graph start
      from the callee's ancestor. Users can use such view to catch system's
      performance bottleneck from a sysprof like view. Using this option
      with specified sort order like pid gives us high level view of call
      graph statistics.
      
      Also add "-G" alias for inverted call graph.
      Signed-off-by: NSam Liao <phyomh@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      d797fdc5
  5. 16 6月, 2011 1 次提交
  6. 15 6月, 2011 1 次提交
    • S
      rcu: Use softirq to address performance regression · 09223371
      Shaohua Li 提交于
      Commit a26ac245(rcu: move TREE_RCU from softirq to kthread)
      introduced performance regression. In an AIM7 test, this commit degraded
      performance by about 40%.
      
      The commit runs rcu callbacks in a kthread instead of softirq. We observed
      high rate of context switch which is caused by this. Out test system has
      64 CPUs and HZ is 1000, so we saw more than 64k context switch per second
      which is caused by RCU's per-CPU kthread.  A trace showed that most of
      the time the RCU per-CPU kthread doesn't actually handle any callbacks,
      but instead just does a very small amount of work handling grace periods.
      This means that RCU's per-CPU kthreads are making the scheduler do quite
      a bit of work in order to allow a very small amount of RCU-related
      processing to be done.
      
      Alex Shi's analysis determined that this slowdown is due to lock
      contention within the scheduler.  Unfortunately, as Peter Zijlstra points
      out, the scheduler's real-time semantics require global action, which
      means that this contention is inherent in real-time scheduling.  (Yes,
      perhaps someone will come up with a workaround -- otherwise, -rt is not
      going to do well on large SMP systems -- but this patch will work around
      this issue in the meantime.  And "the meantime" might well be forever.)
      
      This patch therefore re-introduces softirq processing to RCU, but only
      for core RCU work.  RCU callbacks are still executed in kthread context,
      so that only a small amount of RCU work runs in softirq context in the
      common case.  This should minimize ksoftirqd execution, allowing us to
      skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Tested-by: N"Alex,Shi" <alex.shi@intel.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      09223371
  7. 10 6月, 2011 1 次提交
  8. 03 6月, 2011 10 次提交
  9. 02 6月, 2011 3 次提交
  10. 28 5月, 2011 1 次提交
    • D
      perf events: initialize fd array to -1 instead of 0 · 4af4c955
      David Ahern 提交于
      perf_evsel__alloc_fd allocates an array of file descriptors with the
      memory initialized to 0. The array has dimensions for cpus and threads.
      
      Later, __perf_evsel__open calls sys_perf_event_open for each cpu and thread
      dimensions. If the open fails for any of the cpus or threads then the fd's
      for this event are closed and the fd entry in the array is set to -1. Now,
      if the first attempt fails for the event (e.g., the event is not supported)
      the remaining dimensions (cpu > 0 and thread > 0) are not touched and left
      at the initialized value of 0.
      
      builtin-stat catches ENOENT and ENOSYS failures and allows the command to
      continue. The end result is that stat attempts to read from an fd of 0 which
      of course is stdin and so the command hangs until you type ctrl-D.
      
      Resolve by initializing the array to -1 since an fd < 0 is already
      handled.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1306511914-8016-1-git-send-email-dsahern@gmail.comSigned-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4af4c955
  11. 26 5月, 2011 2 次提交
    • A
      perf tools: Fix build on older systems · 75911c9b
      Arnaldo Carvalho de Melo 提交于
      Where /usr/include/linux/const.h is not present, e.g. RHEL5.
      Reported-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Link: http://lkml.kernel.org/n/tip-ypcw2mu0w7dl1rrc6ncz3pee@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      75911c9b
    • A
      perf symbols: Handle /proc/sys/kernel/kptr_restrict · ec80fde7
      Arnaldo Carvalho de Melo 提交于
      Perf uses /proc/modules to figure out where kernel modules are loaded.
      
      With the advent of kptr_restrict, non root users get zeroes for all module
      start addresses.
      
      So check if kptr_restrict is non zero and don't generate the syntethic
      PERF_RECORD_MMAP events for them.
      
      Warn the user about it in perf record and in perf report.
      
      In perf report the reference relocation symbol being zero means that
      kptr_restrict was set, thus /proc/kallsyms has only zeroed addresses, so don't
      use it to fixup symbol addresses when using a valid kallsyms (in the buildid
      cache) or vmlinux (in the vmlinux path) build-id located automatically or
      specified by the user.
      
      Provide an explanation about it in 'perf report' if kernel samples were taken,
      checking if a suitable vmlinux or kallsyms was found/specified.
      
      Restricted /proc/kallsyms don't go to the buildid cache anymore.
      
      Example:
      
       [acme@emilia ~]$ perf record -F 100000 sleep 1
      
       WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted, check
       /proc/sys/kernel/kptr_restrict.
      
       Samples in kernel functions may not be resolved if a suitable vmlinux file is
       not found in the buildid cache or in the vmlinux path.
      
       Samples in kernel modules won't be resolved at all.
      
       If some relocation was applied (e.g. kexec) symbols may be misresolved even
       with a suitable vmlinux or kallsyms file.
      
       [ perf record: Woken up 1 times to write data ]
       [ perf record: Captured and wrote 0.005 MB perf.data (~231 samples) ]
       [acme@emilia ~]$
      
       [acme@emilia ~]$ perf report --stdio
       Kernel address maps (/proc/{kallsyms,modules}) were restricted,
       check /proc/sys/kernel/kptr_restrict before running 'perf record'.
      
       If some relocation was applied (e.g. kexec) symbols may be misresolved.
      
       Samples in kernel modules can't be resolved as well.
      
       # Events: 13  cycles
       #
       # Overhead  Command      Shared Object                 Symbol
       # ........  .......  .................  .....................
       #
          20.24%    sleep  [kernel.kallsyms]  [k] page_fault
          20.04%    sleep  [kernel.kallsyms]  [k] filemap_fault
          19.78%    sleep  [kernel.kallsyms]  [k] __lru_cache_add
          19.69%    sleep  ld-2.12.so         [.] memcpy
          14.71%    sleep  [kernel.kallsyms]  [k] dput
           4.70%    sleep  [kernel.kallsyms]  [k] flush_signal_handlers
           0.73%    sleep  [kernel.kallsyms]  [k] perf_event_comm
           0.11%    sleep  [kernel.kallsyms]  [k] native_write_msr_safe
      
       #
       # (For a higher level overview, try: perf report --sort comm,dso)
       #
       [acme@emilia ~]$
      
      This is because it found a suitable vmlinux (build-id checked) in
      /lib/modules/2.6.39-rc7+/build/vmlinux (use -v in perf report to see the long
      file name).
      
      If we remove that file from the vmlinux path:
      
       [root@emilia ~]# mv /lib/modules/2.6.39-rc7+/build/vmlinux \
      		     /lib/modules/2.6.39-rc7+/build/vmlinux.OFF
       [acme@emilia ~]$ perf report --stdio
       [kernel.kallsyms] with build id 57298cdbe0131f6871667ec0eaab4804dcf6f562
       not found, continuing without symbols
      
       Kernel address maps (/proc/{kallsyms,modules}) were restricted, check
       /proc/sys/kernel/kptr_restrict before running 'perf record'.
      
       As no suitable kallsyms nor vmlinux was found, kernel samples can't be
       resolved.
      
       Samples in kernel modules can't be resolved as well.
      
       # Events: 13  cycles
       #
       # Overhead  Command      Shared Object  Symbol
       # ........  .......  .................  ......
       #
          80.31%    sleep  [kernel.kallsyms]  [k] 0xffffffff8103425a
          19.69%    sleep  ld-2.12.so         [.] memcpy
      
       #
       # (For a higher level overview, try: perf report --sort comm,dso)
       #
       [acme@emilia ~]$
      Reported-by: NStephane Eranian <eranian@google.com>
      Suggested-by: NDavid Miller <davem@davemloft.net>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Kees Cook <kees.cook@canonical.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Link: http://lkml.kernel.org/n/tip-mt512joaxxbhhp1odop04yit@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ec80fde7
  12. 24 5月, 2011 1 次提交
  13. 23 5月, 2011 2 次提交
  14. 22 5月, 2011 6 次提交
  15. 21 5月, 2011 1 次提交
    • L
      sanitize <linux/prefetch.h> usage · 268bb0ce
      Linus Torvalds 提交于
      Commit e66eed65 ("list: remove prefetching from regular list
      iterators") removed the include of prefetch.h from list.h, which
      uncovered several cases that had apparently relied on that rather
      obscure header file dependency.
      
      So this fixes things up a bit, using
      
         grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]')
         grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]')
      
      to guide us in finding files that either need <linux/prefetch.h>
      inclusion, or have it despite not needing it.
      
      There are more of them around (mostly network drivers), but this gets
      many core ones.
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      268bb0ce