1. 12 8月, 2011 7 次提交
    • M
      perf probe: Avoid searching variables in intermediate scopes · f182e3e1
      Masami Hiramatsu 提交于
      Fix variable searching logic to search one in inner than local scope or
      global(CU) scope. In the other words, skip searching in intermediate
      scopes.
      
      e.g., in the following code,
      
      int var1;
      
      void inline infunc(int i)
      {
          i++;   <--- [A]
      }
      
      void func(void)
      {
         int var1, var2;
         infunc(var2);
      }
      
      At [A], "var1" should point the global variable "var1", however, if user
      mis-typed as "var2", variable search should be failed. However, current
      logic searches variable infunc() scope, global scope, and then func()
      scope. Thus, it can find "var2" variable in func() scope. This may not
      be what user expects.
      
      So, it would better not search outer scopes except outermost (compile
      unit) scope which contains only global variables, when it failed to find
      given variable in local scope.
      
      E.g.
      
      Without this:
      $ perf probe -V pre_schedule --externs > without.vars
      
      With this:
      $ perf probe -V pre_schedule --externs > with.vars
      
      Check the diff:
      $ diff without.vars with.vars
      88d87
      <               int     cpu
      133d131
      <               long unsigned int*      switch_count
      
      These vars are actually in the scope of schedule(), the caller of
      pre_schedule().
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yrl.pp-manager.tt@hitachi.com
      Link: http://lkml.kernel.org/r/20110811110305.19900.94374.stgit@fedora15Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f182e3e1
    • M
      perf probe: Fix to search local variables in appropriate scope · 221d0611
      Masami Hiramatsu 提交于
      Fix perf probe to search local variables in appropriate local inlined
      function scope. For example, pre_schedule() has only 2 local variables,
      as below;
      
      $ perf probe -L pre_schedule
      <pre_schedule@/home/mhiramat/ksrc/linux-2.6/kernel/sched.c:0>
            0  static inline void pre_schedule(struct rq *rq, struct task_struct *prev)
               {
            2         if (prev->sched_class->pre_schedule)
            3                 prev->sched_class->pre_schedule(rq, prev);
               }
      
      However, current perf probe shows 4 local variables on pre_schedule(),
      because it searches variables in the caller(schedule()) scope.
      
      $ perf probe -V pre_schedule
      Available variables at pre_schedule
              @<schedule+445>
                      int     cpu
                      long unsigned int*      switch_count
                      struct rq*      rq
                      struct task_struct*     prev
      
      This patch fixes this issue by searching variables in the local scope of
      the instance of inlined function. Here is the result.
      
      $ perf probe -V pre_schedule
      Available variables at pre_schedule
              @<schedule+445>
                      struct rq*      rq
                      struct task_struct*     prev
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yrl.pp-manager.tt@hitachi.com
      Link: http://lkml.kernel.org/r/20110811110259.19900.85664.stgit@fedora15Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      221d0611
    • M
      perf probe: Fix to walk all inline instances · 36c0c588
      Masami Hiramatsu 提交于
      Fix line-range collector to walk all instances of inlined function,
      because some execution paths can be optimized out depending on the
      function argument of instances.
      
      E.g.)
      inline_func (arg) {
      	if (arg)
      		do_something;
      	else
      		do_another;
      }
      
      func_A() {
      	inline_func(1)
      }
      
      func_B() {
      	inline_func(0)
      }
      
      In this case, func_A may have only do_something code and func_B may have
      only do_another.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Masami Hiramatsu <masami.hiramatsu@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yrl.pp-manager.tt@hitachi.com
      Link: http://lkml.kernel.org/r/20110811110247.19900.93702.stgit@fedora15Signed-off-by: NMasami Hiramatsu <masami.hiramatsu@gmail.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      36c0c588
    • M
      perf probe: Fix to search nested inlined functions in CU · b0e9cb28
      Masami Hiramatsu 提交于
      Fix perf probe to walk through the lines of all nested inlined function
      call sites and declared lines when a whole CU is passed to the line
      walker.
      
      The die_walk_lines() can have two different type of DIEs, subprogram (or
      inlined-subroutine) DIE and CU DIE.
      
      If a caller passes a subprogram DIE, this means that the walker walk on
      lines of given subprogram. In this case, it just needs to search on
      direct children of DIE tree for finding call-site information of inlined
      function which directly called from given subprogram.
      
      On the other hand, if a caller passes a CU DIE to the walker, this means
      that the walker have to walk on all lines in the source files included
      in given CU DIE. In this case, it has to search whole DIE trees of all
      subprograms to find the call-site information of all nested inlined
      functions.
      
      Without this patch:
      
      $ perf probe --line kernel/cpu.c:151-157
      </home/mhiramat/ksrc/linux-2.6/kernel/cpu.c:151>
      
               static int cpu_notify(unsigned long val, void *v)
               {
          154         return __cpu_notify(val, v, -1, NULL);
               }
      
      With this:
      $ perf probe --line kernel/cpu.c:151-157
      </home/mhiramat/ksrc/linux-2.6/kernel/cpu.c:151>
      
          152  static int cpu_notify(unsigned long val, void *v)
               {
          154         return __cpu_notify(val, v, -1, NULL);
               }
      
      As you can see, --line option with source line range shows the declared
      lines as probe-able.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yrl.pp-manager.tt@hitachi.com
      Link: http://lkml.kernel.org/r/20110811110241.19900.34994.stgit@fedora15Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b0e9cb28
    • M
      perf probe: Fix line walker to check CU correctly · a128405c
      Masami Hiramatsu 提交于
      Fix line walker to check whether a given DIE is CU or not.
      
      Actually this function accepts CU, subprogram and inlined_subroutine
      DIEs.
      
      Without this fix, perf probe always fails to analyze lines on inlined
      functions;
      
      $ perf probe -L pre_schedule
      Debuginfo analysis failed. (-2)
        Error: Failed to show lines. (-2)
      
      This fixes that bug, as below.
      
      $ perf probe -L pre_schedule
      <pre_schedule@/home/mhiramat/ksrc/linux-2.6/kernel/sched.c:0>
            0  static inline void pre_schedule(struct rq *rq, struct task_struct *prev
               {
            2         if (prev->sched_class->pre_schedule)
            3                 prev->sched_class->pre_schedule(rq, prev);
               }
      
               /* rq->lock is NOT held, but preemption is disabled */
      
      Changes from v1:
       - Update against current tip tree.(Fix dwarf-aux.c)
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Masami Hiramatsu <masami.hiramatsu@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yrl.pp-manager.tt@hitachi.com
      Link: http://lkml.kernel.org/r/20110811110235.19900.20614.stgit@fedora15Signed-off-by: NMasami Hiramatsu <masami.hiramatsu@gmail.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a128405c
    • M
      perf probe: Fix a memory leak for scopes array · 8afa2a70
      Masami Hiramatsu 提交于
      Fix a memory leak for scopes array when it finds a variable in the
      global scope.
      Reviewed-by: NPekka Enberg <penberg@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yrl.pp-manager.tt@hitachi.com
      Link: http://lkml.kernel.org/r/20110811110229.19900.63019.stgit@fedora15Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8afa2a70
    • V
      perf: fix temporary file ownership check · e9b52ef2
      Vasiliy Kulikov 提交于
      A file in /tmp/ might be a symlink, so lstat() should be used instead of
      stat().
      Acked-by: NPekka Enberg <penberg@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20110811205537.GA22864@albatrosSigned-off-by: NVasiliy Kulikov <segoon@openwall.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e9b52ef2
  2. 11 8月, 2011 1 次提交
    • J
      perf report: Use properly build_id kernel binaries · f57b05ed
      Jiri Olsa 提交于
      If we bring the recorded perf data together with kernel binary from another
      machine using:
      
      	on server A:
      	perf archive
      
      	on server B:
      	tar xjvf perf.data.tar.bz2 -C ~/.debug
      
      the build_id kernel dso is not properly recognized during the "perf report"
      command on server B.
      
      The reason is, that build_id dsos are added during the session initialization,
      while the kernel maps are created during the sample event processing.
      
      The machine__create_kernel_maps functions ends up creating new dso object for
      kernel, but it does not check if we already have one added by build_id
      processing.
      
      Also the build_id reading ABI quirk added in commit:
      
       - commit b2511481
         perf build-id: Add quirk to deal with perf.data file format breakage
      
      populates the "struct build_id_event::pid" with 0, which
      is later interpreted as DEFAULT_GUEST_KERNEL_ID.
      
      This is not always correct, so it's better to guess the pid
      value based on the "struct build_id_event::header::misc" value.
      
      - Tested with data generated on x86 kernel version v2.6.34
        and reported back on x86_64 current kernel.
      - Not tested for guest kernel case.
      
      Note the problem stays for PERF_RECORD_MMAP events recorded by perf that
      does not use proper pid (HOST_KERNEL_ID/DEFAULT_GUEST_KERNEL_ID). They are
      misinterpreted within the current perf code. Probably there's not much we
      can do about that.
      
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Link: http://lkml.kernel.org/r/20110601194346.GB1934@jolsa.brq.redhat.comSigned-off-by: NJiri Olsa <jolsa@redhat.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f57b05ed
  3. 10 8月, 2011 2 次提交
  4. 09 8月, 2011 1 次提交
  5. 08 8月, 2011 3 次提交
  6. 26 7月, 2011 1 次提交
  7. 25 7月, 2011 1 次提交
  8. 22 7月, 2011 1 次提交
  9. 21 7月, 2011 4 次提交
  10. 16 7月, 2011 7 次提交
  11. 15 7月, 2011 1 次提交
  12. 05 7月, 2011 1 次提交
    • A
      perf report/annotate/script: Add option to specify a CPU range · 5d67be97
      Anton Blanchard 提交于
      Add an option to perf report/annotate/script to specify which
      CPUs to operate on. This enables us to take a single system wide
      profile and analyse each CPU (or group of CPUs) in isolation.
      
      This was useful when profiling a multiprocess workload where the
      bottleneck was on one CPU but this was hidden in the overall
      profile. Per process and per thread breakdowns didn't help
      because multiple processes were running on each CPU and no
      single process consumed an entire CPU.
      
      The patch converts the list of CPUs returned by cpu_map__new
      into a bitmap for fast lookup. I wanted to use -C to be
      consistent with perf top/record/stat, but unfortunately perf
      report already uses -C <comms>.
      
       v2: Incorporate suggestions from David Ahern:
      	- Added -c to perf script
      	- Check that SAMPLE_CPU is set when -c is used
      	- Update documentation
      
       v3: Create perf_session__cpu_bitmap()
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Link: http://lkml.kernel.org/r/20110704215750.11647eb9@krytenSigned-off-by: NIngo Molnar <mingo@elte.hu>
      5d67be97
  13. 30 6月, 2011 5 次提交
    • F
      perf tools: Allow sort dimensions to be registered more than once · fd8ea212
      Frederic Weisbecker 提交于
      So that the parent sort dimension can be registered twice: once
      if we add it as an explicit sort dimension (-s parent) and twice
      if we request a parent filter (-p foo).
      
      We'll have only one parent sort dimension in the end but this
      allows to override the default parent filter with we gave in "-p"
      option. The goal of this is to prepare to allow the use of
      "-s parent" and "-p foo" at the same time, ie: sort by filtered
      parent.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Sam Liao <phyomh@gmail.com>
      fd8ea212
    • F
      perf tools: Don't display ignored entries on stdio ui · e84d2122
      Frederic Weisbecker 提交于
      As for newt ui, don't display entries that have been marked
      as ignored.
      
      The practical current effect of this is to make parent
      filtering really working. Before, entries that were ignored
      were given a null parent but were still displayed. This
      resulted in some weird effects:
      
       # Overhead      Command      Shared Object        Symbol
       # ........  ...........  .................  ............
       #
      ^A
                         |
                         --- __lock_acquire
                            |
                            |--95.97%-- lock_acquire
                            |          |
                            |          |--30.75%-- _raw_spin_lock
      
      Discard these from the stdio display.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Sam Liao <phyomh@gmail.com>
      e84d2122
    • F
      perf tools: Remove sort print helpers declarations · 2fd701bc
      Frederic Weisbecker 提交于
      These are probably some old leftovers.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Sam Liao <phyomh@gmail.com>
      2fd701bc
    • F
      perf tools: Make sort operations static · 872a878f
      Frederic Weisbecker 提交于
      These don't need to be globally visible.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Sam Liao <phyomh@gmail.com>
      872a878f
    • S
      perf tools: Add inverted call graph report support. · d797fdc5
      Sam Liao 提交于
      Add "caller/callee" option to support inverted butterfly report,
      in the inverted report (with caller option), the call graph start
      from the callee's ancestor. Users can use such view to catch system's
      performance bottleneck from a sysprof like view. Using this option
      with specified sort order like pid gives us high level view of call
      graph statistics.
      
      Also add "-G" alias for inverted call graph.
      Signed-off-by: NSam Liao <phyomh@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      d797fdc5
  14. 16 6月, 2011 1 次提交
  15. 15 6月, 2011 1 次提交
    • S
      rcu: Use softirq to address performance regression · 09223371
      Shaohua Li 提交于
      Commit a26ac245(rcu: move TREE_RCU from softirq to kthread)
      introduced performance regression. In an AIM7 test, this commit degraded
      performance by about 40%.
      
      The commit runs rcu callbacks in a kthread instead of softirq. We observed
      high rate of context switch which is caused by this. Out test system has
      64 CPUs and HZ is 1000, so we saw more than 64k context switch per second
      which is caused by RCU's per-CPU kthread.  A trace showed that most of
      the time the RCU per-CPU kthread doesn't actually handle any callbacks,
      but instead just does a very small amount of work handling grace periods.
      This means that RCU's per-CPU kthreads are making the scheduler do quite
      a bit of work in order to allow a very small amount of RCU-related
      processing to be done.
      
      Alex Shi's analysis determined that this slowdown is due to lock
      contention within the scheduler.  Unfortunately, as Peter Zijlstra points
      out, the scheduler's real-time semantics require global action, which
      means that this contention is inherent in real-time scheduling.  (Yes,
      perhaps someone will come up with a workaround -- otherwise, -rt is not
      going to do well on large SMP systems -- but this patch will work around
      this issue in the meantime.  And "the meantime" might well be forever.)
      
      This patch therefore re-introduces softirq processing to RCU, but only
      for core RCU work.  RCU callbacks are still executed in kthread context,
      so that only a small amount of RCU work runs in softirq context in the
      common case.  This should minimize ksoftirqd execution, allowing us to
      skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Tested-by: N"Alex,Shi" <alex.shi@intel.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      09223371
  16. 10 6月, 2011 1 次提交
  17. 03 6月, 2011 2 次提交