1. 24 Sep, 2011 9 commits
    • D
      perf tool: Fix endianness handling of u32 data in samples · 936be503
      David Ahern committed
      Currently, analyzing PPC data files on x86 the cpu field is always 0 and
      the tid and pid are backwards. For example, analyzing a PPC file on PPC
      the pid/tid fields show:
      
              rsyslogd  1210/1212
      
      and analyzing the same PPC file using an x86 perf binary shows:
      
              rsyslogd  1212/1210
      
      The problem is that the swap_op method for samples is
      perf_event__all64_swap which assumes all elements in the sample_data
      struct are u64s. cpu, tid and pid are u32s and need to be handled
      individually. Given that the swap is done before the sample is parsed,
      the simplest solution is to undo the 64-bit swap of those elements when
      the sample is parsed and do the proper swap.
      
      The RAW data field is generic and perf cannot have programmatic knowledge
      of how to treat that data. Instead a warning is given to the user.
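
The undo-and-reswap step described above can be sketched as follows. This is a minimal illustration, not the perf source: the helper names `swap32`, `swap64` and `fix_u32_pair` and the union name `u64_swap` are hypothetical.

```c
#include <stdint.h>

/* Byte-swap helpers (hypothetical names, not perf's own). */
static uint32_t swap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

static uint64_t swap64(uint64_t v)
{
	return ((uint64_t)swap32((uint32_t)v) << 32) | swap32((uint32_t)(v >> 32));
}

/* A 64-bit slot that really holds two u32 fields, e.g. pid and tid. */
union u64_swap {
	uint64_t val64;
	uint32_t val32[2];
};

/*
 * 'slot' has already been through a blanket 64-bit swap.
 * Undo that swap, then swap each 32-bit half on its own.
 */
static void fix_u32_pair(uint64_t slot, uint32_t *first, uint32_t *second)
{
	union u64_swap u;

	u.val64 = swap64(slot);        /* back to the bytes as stored in the file */
	*first  = swap32(u.val32[0]);  /* proper per-field swap */
	*second = swap32(u.val32[1]);
}
```

Without the undo step, reading the two halves straight after the 64-bit swap yields the fields in reversed order, which matches the 1212/1210 symptom shown above.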
      
      Thanks to Anton Blanchard for providing a data file for a multi-CPU
      PPC system so I could verify the fix for the CPU fields.
      
      v3 -> v4:
      - fixed use of WARN_ONCE
      
      v2 -> v3:
      - used WARN_ONCE for message regarding raw data
      - removed struct wrapper around union
      - fixed whitespace issues
      
      v1 -> v2:
      - added a union for undoing the byte-swap on u64 and redoing swap on
        u32's to address compiler errors (see git commit 65014ab3)
      
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1315321946-16993-1-git-send-email-dsahern@gmail.com
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf sort: Fix symbol sort output by separating unresolved samples by type · 6bb8f311
      Anton Blanchard committed
      I took a profile that suggested 60% of total CPU time was in the
      hypervisor:
      
      ...
          60.20%  [H] 0x33d43c
           4.43%  [k] ._spin_lock_irqsave
           1.07%  [k] ._spin_lock
      
      Using perf stat to get the user/kernel/hypervisor breakdown contradicted
      this.
      
      The problem is we merge all unresolved samples into the one unknown
      bucket. If we add a comparison by sample type to sort__sym_cmp we get the
      real picture:
      
      ...
          57.11%  [.] 0x80fbf63c
           4.43%  [k] ._spin_lock_irqsave
           1.07%  [k] ._spin_lock
           0.65%  [H] 0x33d43c
      
      So it was almost all userspace, not hypervisor as the initial profile
      suggested.
      
      I found another issue while adding this. Symbol sorting sometimes shows
      multiple entries for the unknown bucket:
      
      ...
          16.65%  [.] 0x6cd3a8
           7.25%  [.] 0x422460
           5.37%  [.] yylex
           4.79%  [.] malloc
           4.78%  [.] _int_malloc
           4.03%  [.] _int_free
           3.95%  [.] hash_source_code_string
           2.82%  [.] 0x532908
           2.64%  [.] 0x36b538
           0.94%  [H] 0x8000000000e132a4
           0.82%  [H] 0x800000000000e8b0
      
      This happens because we aren't consistent with our sorting. On
      one hand we check to see if both symbols match and for two unresolved
      samples sym is NULL so we match:
      
              if (left->ms.sym == right->ms.sym)
                      return 0;
      
      On the other hand we use sample IP for unresolved samples when
      comparing against a symbol:
      
             ip_l = left->ms.sym ? left->ms.sym->start : left->ip;
             ip_r = right->ms.sym ? right->ms.sym->start : right->ip;
      
      This means unresolved samples end up spread across the rbtree and we
      can't merge them all.
      
      If we use cmp_null all unresolved samples will end up in the one bucket
      and the output makes more sense:
      
      ...
          39.12%  [.] 0x36b538
           5.37%  [.] yylex
           4.79%  [.] malloc
           4.78%  [.] _int_malloc
           4.03%  [.] _int_free
           3.95%  [.] hash_source_code_string
           2.26%  [H] 0x800000000000e8b0
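
A consistent comparator along the lines described above might look like this. It is a simplified sketch: `struct sym`, `struct entry` and their fields are illustrative stand-ins rather than perf's real types, and `cmp_null` is written here from its description only.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for perf's types; field names are illustrative. */
struct sym { uint64_t start; };
struct entry {
	const struct sym *sym;   /* NULL when the sample is unresolved */
	char level;              /* '.', 'k', 'H', ... */
	uint64_t ip;
};

/* Only meaningful when at least one side is NULL: NULLs are equal and sort first. */
static int cmp_null(const void *l, const void *r)
{
	if (!l && !r)
		return 0;
	return l ? 1 : -1;
}

static int sym_cmp(const struct entry *l, const struct entry *r)
{
	/* Keep user, kernel and hypervisor samples in separate buckets. */
	if (l->level != r->level)
		return (l->level > r->level) - (l->level < r->level);

	/* Any unresolved side: never fall back to the raw sample IP. */
	if (!l->sym || !r->sym)
		return cmp_null(l->sym, r->sym);

	if (l->sym == r->sym)
		return 0;
	return (l->sym->start > r->sym->start) - (l->sym->start < r->sym->start);
}
```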

      Acked-by: Eric B Munson <emunson@mgebm.net>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ian Munsie <imunsie@au1.ibm.com>
      Link: http://lkml.kernel.org/r/20110831115145.4f598ab2@kryten
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf symbols: Synthesize anonymous mmap events · 6a0e55d8
      Anton Blanchard committed
      perf_event__synthesize_mmap_events does not create anonymous mmap events
      even though the kernel does. As a result an already running application
      with dynamically created code will not get profiled - all samples end up
      in the unknown bucket.
      
      This patch skips any entries with '[' in the name to avoid adding events
      for special regions (eg the vsyscall page). All other executable mmaps
      are assumed to be anonymous and an event is synthesized.
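
The filtering rule can be illustrated with a small classifier for one `/proc/<pid>/maps` entry. This is a sketch under assumptions: the function name `classify_map` and its return codes are hypothetical, and the real code parses the whole line rather than pre-split fields.

```c
#include <string.h>

/*
 * Decide how to treat one /proc/<pid>/maps entry.
 * 'perms' is the permission column (e.g. "r-xp"), 'name' the optional path.
 * Returns 0 to skip, 1 for a file-backed mapping, 2 for an anonymous one.
 */
static int classify_map(const char *perms, const char *name)
{
	if (strlen(perms) < 3 || perms[2] != 'x')
		return 0;               /* only executable mappings matter */
	if (strchr(name, '['))
		return 0;               /* special region such as [vsyscall] */
	return name[0] == '/' ? 1 : 2;  /* no path: assume anonymous */
}
```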

      Acked-by: Pekka Enberg <penberg@kernel.org>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Link: http://lkml.kernel.org/r/20110830091506.60b51fe8@kryten
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • D
      perf record: Create events initially disabled and enable after init · 764e16a3
      David Ahern committed
      perf-record currently creates events enabled. When doing a system wide
      collection (-a arg) this causes data collection for perf's
      initialization activities, e.g. perf_event__synthesize_threads().

      For some events (e.g., context switch S/W event or tracepoints like
      syscalls) perf's initialization causes a lot of events to be captured,
      frequently generating "Check IO/CPU overload!" warnings on larger
      systems (e.g., 2 socket, quad core, hyperthreading).
      
      perf's initialization phase can be skipped by creating events
      disabled and then enabling them once the initialization is done.
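
In terms of the kernel interface, the idea is to set `disabled` in `struct perf_event_attr` and later issue `PERF_EVENT_IOC_ENABLE` on the event file descriptor. The sketch below shows just those two pieces; the function names are hypothetical and the choice of the context-switch software event is only an example.

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>

/* Fill in a software context-switch counter that starts out disabled. */
static void init_disabled_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_CONTEXT_SWITCHES;
	attr->disabled = 1;   /* nothing is recorded until the event is enabled */
}

/* Call once the tool's own setup work has finished; returns -1 on error. */
static int enable_event(int fd)
{
	return ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
}
```

The descriptor passed to `enable_event` would come from the `perf_event_open` system call, which is omitted here because it needs privileges that vary between systems.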
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1314289075-14706-1-git-send-email-dsahern@gmail.com
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf symbols: Add some heuristics for choosing the best duplicate symbol · 694bf407
      Anton Blanchard committed
      Try and pick the best symbol based on a few heuristics:
      
      -  Prefer a non weak symbol over a weak one
      -  Prefer a global symbol over a non global one
      -  Prefer a symbol with fewer underscores (idea taken from kallsyms.c)
      -  If all else fails, choose the symbol with the longest name
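
The heuristics listed above can be read as a chain of tie-breakers. The following sketch applies them in the listed order; `struct dup_sym` and the function names are hypothetical, and counting only leading underscores is an assumption, since the message does not say which underscores are counted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative symbol record; perf's struct symbol differs. */
struct dup_sym {
	const char *name;
	bool weak;
	bool global;
};

/* Assumption: only leading underscores are counted. */
static size_t leading_underscores(const char *s)
{
	size_t n = 0;

	while (s[n] == '_')
		n++;
	return n;
}

/* Return the preferred of two symbols that share an address. */
static const struct dup_sym *choose_best(const struct dup_sym *a,
					 const struct dup_sym *b)
{
	size_t ua, ub;

	if (a->weak != b->weak)
		return a->weak ? b : a;          /* non-weak wins */
	if (a->global != b->global)
		return a->global ? a : b;        /* global wins */

	ua = leading_underscores(a->name);
	ub = leading_underscores(b->name);
	if (ua != ub)
		return ua < ub ? a : b;          /* fewer underscores wins */

	/* Last resort: the longer name. */
	return strlen(a->name) >= strlen(b->name) ? a : b;
}
```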
      
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110824065243.161953371@samba.org
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf symbols: Preserve symbol scope when parsing /proc/kallsyms · 31877908
      Anton Blanchard committed
      kallsyms__parse capitalises the symbol type, so every symbol is marked
      global. Remove this and fix symbol_type__is_a to handle both local and
      global symbols.
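
In nm-style output such as /proc/kallsyms, an upper-case type letter marks a global symbol and a lower-case one a local symbol, so upper-casing the letter at parse time discards the scope. A minimal sketch of scope-preserving checks, with hypothetical function names:

```c
#include <ctype.h>
#include <stdbool.h>

/* Upper-case type letter: global symbol. Lower-case: local symbol. */
static bool type_is_global(char type)
{
	return isupper((unsigned char)type) != 0;
}

/* Match a function symbol regardless of scope: 't'/'T' or 'w'/'W'. */
static bool type_is_function(char type)
{
	char t = (char)toupper((unsigned char)type);

	return t == 'T' || t == 'W';
}
```

The case folding happens only inside the type check, so the stored letter still tells local from global.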
      
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110824065243.077125989@samba.org
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf symbols: /proc/kallsyms does not sort module symbols · 3f5a4272
      Anton Blanchard committed
      kallsyms__parse assumes that /proc/kallsyms is sorted and sets the end
      of the previous symbol to the start of the current one.
      
      Unfortunately module symbols are not sorted, eg:
      
      ffffffffa0081f30 t e1000_clean_rx_irq   [e1000e]
      ffffffffa00817a0 t e1000_alloc_rx_buffers       [e1000e]
      
      Some symbols end up with a negative length and others have a length
      larger than they should. This results in confusing perf output.
      
      We already have a function to fixup the end of zero length symbols so
      use that instead.
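
The fixup idea, leaving each symbol zero-length while parsing and filling in the ends once the symbols are in address order, can be sketched like this. perf keeps its symbols in an rbtree; the array plus `qsort` here is a stand-in, and `struct ksym` and `fixup_ends` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

struct ksym { uint64_t start, end; };

static int ksym_cmp(const void *a, const void *b)
{
	const struct ksym *l = a, *r = b;

	return (l->start > r->start) - (l->start < r->start);
}

/*
 * Sort by start address, then close each zero-length symbol at the start
 * of the symbol that follows it. The last symbol is left untouched.
 */
static void fixup_ends(struct ksym *syms, size_t n)
{
	size_t i;

	qsort(syms, n, sizeof(*syms), ksym_cmp);
	for (i = 0; i + 1 < n; i++)
		if (syms[i].end == syms[i].start)
			syms[i].end = syms[i + 1].start;
}
```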
      
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110824065242.969681349@samba.org
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf symbols: Fix ppc64 SEGV in dso__load_sym with debuginfo files · adb09184
      Anton Blanchard committed
      64bit PowerPC debuginfo files have an empty function descriptor section.
      I hit a SEGV when perf tried to use this section for symbol resolution.
      
      To fix this we need to check the section is valid and we can do this by
      checking for type SHT_PROGBITS.
      
      Cc: <stable@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Eric B Munson <emunson@mgebm.net>
      Link: http://lkml.kernel.org/r/20110824065242.895239970@samba.org
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • M
      perf probe: Fix regression of variable finder · f66fedcb
      Masami Hiramatsu committed
      Call convert_variable() whenever the previous call did not fail.

      The code only called convert_variable() when "ret" was 0. However,
      "ret" holds the return value of synthesize_perf_probe_arg(), which
      always returns a positive value on success, so perf probe never
      called convert_variable(). This causes a SEGV when we add an
      event with arguments.

      Fix this by checking that "ret" is not negative (greater than or
      equal to 0) instead.
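
The shape of the bug can be reproduced with two stand-in functions. Everything in this sketch is hypothetical (`synthesize_arg`, `convert_var`, `find_variable` and the return value 12); it only shows why testing for exactly 0 skips the second step when the first step reports success with a positive number.

```c
/* Hypothetical stand-in: returns a positive count on success, negative on error. */
static int synthesize_arg(int ok)
{
	return ok ? 12 : -1;
}

static int convert_calls;

/* Hypothetical stand-in for the conversion step. */
static int convert_var(void)
{
	convert_calls++;
	return 0;
}

static int find_variable(int ok)
{
	int ret = synthesize_arg(ok);

	if (ret >= 0)            /* a test for "ret == 0" is never true on success */
		ret = convert_var();
	return ret;
}
```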
      
      This regression has been introduced by my previous patch, f182e3e1.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yrl.pp-manager.tt@hitachi.com
      Link: http://lkml.kernel.org/r/20110820053922.3286.65805.stgit@fedora15
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 18 Aug, 2011 4 commits
    • J
      perf tools: Fix build against newer glibc · 195bcbf5
      Josh Boyer committed
      Upstream glibc commit 295e904 added a definition for __attribute_const__
      to cdefs.h.  This causes the following error when building perf:
      
      util/include/linux/compiler.h:8:0: error: "__attribute_const__"
      redefined [-Werror] /usr/include/sys/cdefs.h:226:0: note: this is the
      location of the previous definition
      
      Wrap __attribute_const__ in #ifndef as we do for __always_inline.
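
The guard pattern looks like the sketch below. The expansion given to the macro is an assumption chosen so the example compiles with GCC or Clang; the commit message does not say what perf's header expands it to. The function `square` exists only to show the macro in use.

```c
/* Only define the macro when the C library headers have not already done so. */
#ifndef __attribute_const__
# define __attribute_const__ __attribute__((__const__))
#endif

/* Example use: the result depends only on the argument. */
static int __attribute_const__ square(int x)
{
	return x * x;
}
```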
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110818113720.GL2227@zod.bos.redhat.com
      Signed-off-by: Josh Boyer <jwboyer@redhat.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • S
      perf tools: Fix error handling of unknown events · 777d1d71
      Stephane Eranian committed
      There was a problem with the parse_events() code not printing the
      correct event name when an event was unknown and starting with an 'r'.
      The source of the problem was the way raw notation was parsed.
      
      Without the patch:
      	$ perf stat -e retired_foo
      	invalid event modifier: 'tired_foo'
      
      With the patch:
      	$ perf stat -e retired_foo
      	invalid or unsupported event: 'retired_foo'
      
      This also covers the case where the name of the event was not printed at
      all when perf was linked with libpfm4.
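
One way to avoid the misleading message is to accept the raw "r" notation only when everything after the 'r' is hexadecimal, and otherwise report the whole name. The sketch below is illustrative, not perf's parser: `parse_raw_event` is a hypothetical name and event modifiers (such as a trailing ":u") are deliberately ignored.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Treat "rNNN" as a raw event only when everything after the 'r' is
 * hexadecimal; otherwise let the caller report the whole name as unknown.
 */
static bool parse_raw_event(const char *str, unsigned long long *config)
{
	const char *p = str + 1;
	char *end;

	if (str[0] != 'r' || !isxdigit((unsigned char)*p))
		return false;

	*config = strtoull(p, &end, 16);
	return *end == '\0';   /* leftover characters mean it was not a raw code */
}
```

For "retired_foo", `strtoull` consumes only the "e" and stops at "tired_foo", which is exactly the fragment the old error message printed.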
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20110723021043.GA20178@quad
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • S
      perf evlist: Fix missing event name init for default event · cc2d86b0
      Stephane Eranian committed
      When no event is given to perf record or perf top, a default event is
      initialized (cycles). However, perf_evlist__add_default() was not
      setting the symbolic name for the event. Perf top worked simply because
      it was reconstructing the name from the event code. But it should not
      have to do this. This patch initializes the evsel->name field properly.
      
      This second version improves the code flow on the non error path.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20110607161936.GA8163@quad
      Signed-off-by: Stephane Eranian <eranian@google.com>
      [committer note: Use perf_evsel__delete() instead of plain free()]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • S
      perf list: Fix exit value · 77e57297
      Stephane Eranian committed
      This patch fixes an issue with the exit value of perf list:
      
      $ perf list; echo $?
      129
      
      perf list returns an error exit code even though there is no error.
      
      There was a stray exit(129) in print_events(). This patch removes this
      exit().
      
      $ perf list; echo $?
      0
      
      $ perf list hw sw
        cpu-cycles OR cycles                               [Hardware event]
        stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
        stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
        instructions                                       [Hardware event]
        cache-references                                   [Hardware event]
        cache-misses                                       [Hardware event]
        branch-instructions OR branches                    [Hardware event]
        branch-misses                                      [Hardware event]
        bus-cycles                                         [Hardware event]
      
        cpu-clock                                          [Software event]
        task-clock                                         [Software event]
        page-faults OR faults                              [Software event]
        minor-faults                                       [Software event]
        major-faults                                       [Software event]
        context-switches OR cs                             [Software event]
        cpu-migrations OR migrations                       [Software event]
        alignment-faults                                   [Software event]
        emulation-faults                                   [Software event]
      $ echo $?
      0
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20110523123917.GA31060@quad
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 12 Aug, 2011 9 commits
    • M
      perf probe: Filter out redundant inline-instances · 3f4460a2
      Masami Hiramatsu committed
      With gcc 4.6, some concrete instances of inlined functions look redundant
      and broken, because they appear inside a concrete instance and their
      call_file and call_line are the same as the original abstract instance's
      decl_file and decl_line respectively.
      
      e.g.
       [  d1aa]    subprogram
                   external             (flag) Yes
                   name                 (strp) "add_timer"
                   decl_file            (data1) 2		;here is original
                   decl_line            (data2) 847		;line and file
                   prototyped           (flag) Yes
                   inline               (data1) inlined (1)
                   sibling              (ref4) [  d1c6]
      ...
       [ 11d84]    subprogram
                   abstract_origin      (ref4) [  d1aa]	; concrete instance
                   low_pc               (addr) .text+0x000000000000246f <add_timer>
                   high_pc              (addr) .text+0x000000000000248b <mod_timer_pending>
                   frame_base           (block1)               [   0] call_frame_cfa
                   sibling              (ref4) [ 11dd9]
       [ 11d9f]      formal_parameter
                     abstract_origin      (ref4) [  d1b9]
                     location             (data4) location list [  701b]
       [ 11da8]      inlined_subroutine
                     abstract_origin      (ref4) [  d1aa]	; redundant instance
                     low_pc               (addr) .text+0x000000000000247e <add_timer+0xf>
                     high_pc              (addr) .text+0x0000000000002480 <add_timer+0x11>
                     call_file            (data1) 2		; call line and file
                     call_line            (data2) 847		; are same as above
      
      Those redundant instances lead to unwanted results.

      e.g. perf probe finds probe points inside functions even when we
      specify a function entry, as below:
      
      $ perf probe -V add_timer
      Available variables at add_timer
              @<add_timer+0>
                      struct timer_list*      timer
              @<add_timer+15>
                      (No matched variables)
      
      So, this filters out those redundant instances based on call-site and
      decl-site information.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yrl.pp-manager.tt@hitachi.com
      Link: http://lkml.kernel.org/r/20110811110317.19900.59525.stgit@fedora15
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • M
      perf probe: Search concrete out-of-line instances · db0d2c64
      Masami Hiramatsu committed
      gcc 4.6 generates a concrete out-of-line instance when a function is
      implicitly inlined somewhere but also has its own instance. A concrete
      out-of-line instance means that the abstract origin of the function is
      referred to not only by inlined-subroutines but also by a concrete
      subprogram.

      Since the current dwarf_func_inline_instances() can find only instances
      of inlined-subroutines, this introduces a new die_walk_instances() to
      find both subprogram and inlined-subroutine instances.
      
      e.g. without this,
      Available variables at sched_group_rt_period
              @<cpu_rt_period_read_uint+9>
                      struct task_group*      tg
      
      perf probe failed to find actual subprogram instance of
      sched_group_rt_period().
      
      With this,
      
      Available variables at sched_group_rt_period
              @<cpu_rt_period_read_uint+9>
                      struct task_group*      tg
              @<sched_group_rt_period+0>
                      struct task_group*      tg
      
      Now it found the sched_group_rt_period() itself.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yrl.pp-manager.tt@hitachi.com
      Link: http://lkml.kernel.org/r/20110811110311.19900.63997.stgit@fedora15
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • M
      perf probe: Avoid searching variables in intermediate scopes · f182e3e1
      Masami Hiramatsu committed
      Fix the variable search logic to look only in the local scope (and scopes
      nested inside it) or in the global (CU) scope. In other words, skip
      searching the intermediate scopes.
      
      e.g., in the following code,
      
      int var1;
      
      void inline infunc(int i)
      {
          i++;   <--- [A]
      }
      
      void func(void)
      {
         int var1, var2;
         infunc(var2);
      }
      
      At [A], "var1" should refer to the global variable "var1". However, if
      the user mistypes it as "var2", the variable search should fail. The
      current logic searches the infunc() scope, the global scope, and then
      the func() scope, so it can find the "var2" variable in func() scope.
      This may not be what the user expects.

      So, when the given variable is not found in the local scope, it is
      better not to search outer scopes other than the outermost (compile
      unit) scope, which contains only global variables.
      
      E.g.
      
      Without this:
      $ perf probe -V pre_schedule --externs > without.vars
      
      With this:
      $ perf probe -V pre_schedule --externs > with.vars
      
      Check the diff:
      $ diff without.vars with.vars
      88d87
      <               int     cpu
      133d131
      <               long unsigned int*      switch_count
      
      These vars are actually in the scope of schedule(), the caller of
      pre_schedule().
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yrl.pp-manager.tt@hitachi.com
      Link: http://lkml.kernel.org/r/20110811110305.19900.94374.stgit@fedora15
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • M
      perf probe: Fix to search local variables in appropriate scope · 221d0611
      Masami Hiramatsu committed
      Fix perf probe to search local variables in appropriate local inlined
      function scope. For example, pre_schedule() has only 2 local variables,
      as below;
      
      $ perf probe -L pre_schedule
      <pre_schedule@/home/mhiramat/ksrc/linux-2.6/kernel/sched.c:0>
            0  static inline void pre_schedule(struct rq *rq, struct task_struct *prev)
               {
            2         if (prev->sched_class->pre_schedule)
            3                 prev->sched_class->pre_schedule(rq, prev);
               }
      
      However, current perf probe shows 4 local variables on pre_schedule(),
      because it searches variables in the caller(schedule()) scope.
      
      $ perf probe -V pre_schedule
      Available variables at pre_schedule
              @<schedule+445>
                      int     cpu
                      long unsigned int*      switch_count
                      struct rq*      rq
                      struct task_struct*     prev
      
      This patch fixes this issue by searching variables in the local scope of
      the instance of inlined function. Here is the result.
      
      $ perf probe -V pre_schedule
      Available variables at pre_schedule
              @<schedule+445>
                      struct rq*      rq
                      struct task_struct*     prev
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yrl.pp-manager.tt@hitachi.com
      Link: http://lkml.kernel.org/r/20110811110259.19900.85664.stgit@fedora15
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • M
      perf probe: Fix to walk all inline instances · 36c0c588
      Masami Hiramatsu committed
      Fix line-range collector to walk all instances of inlined function,
      because some execution paths can be optimized out depending on the
      function argument of instances.
      
      E.g.)
      inline_func (arg) {
      	if (arg)
      		do_something;
      	else
      		do_another;
      }
      
      func_A() {
      	inline_func(1)
      }
      
      func_B() {
      	inline_func(0)
      }
      
      In this case, func_A may have only do_something code and func_B may have
      only do_another.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Masami Hiramatsu <masami.hiramatsu@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yrl.pp-manager.tt@hitachi.com
      Link: http://lkml.kernel.org/r/20110811110247.19900.93702.stgit@fedora15
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • M
      perf probe: Fix to search nested inlined functions in CU · b0e9cb28
      Masami Hiramatsu committed
      Fix perf probe to walk through the lines of all nested inlined function
      call sites and declared lines when a whole CU is passed to the line
      walker.
      
      die_walk_lines() can be given two different types of DIE: a subprogram
      (or inlined-subroutine) DIE, or a CU DIE.

      If a caller passes a subprogram DIE, the walker walks the lines of
      the given subprogram. In this case, it only needs to search the
      direct children in the DIE tree to find call-site information for
      inlined functions called directly from the given subprogram.

      On the other hand, if a caller passes a CU DIE, the walker has to
      walk all lines in the source files included in the given CU DIE. In
      this case, it has to search the whole DIE trees of all subprograms
      to find the call-site information of all nested inlined functions.
      
      Without this patch:
      
      $ perf probe --line kernel/cpu.c:151-157
      </home/mhiramat/ksrc/linux-2.6/kernel/cpu.c:151>
      
               static int cpu_notify(unsigned long val, void *v)
               {
          154         return __cpu_notify(val, v, -1, NULL);
               }
      
      With this:
      $ perf probe --line kernel/cpu.c:151-157
      </home/mhiramat/ksrc/linux-2.6/kernel/cpu.c:151>
      
          152  static int cpu_notify(unsigned long val, void *v)
               {
          154         return __cpu_notify(val, v, -1, NULL);
               }
      
      As you can see, --line option with source line range shows the declared
      lines as probe-able.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yrl.pp-manager.tt@hitachi.com
      Link: http://lkml.kernel.org/r/20110811110241.19900.34994.stgit@fedora15
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • M
      perf probe: Fix line walker to check CU correctly · a128405c
      Masami Hiramatsu committed
      Fix line walker to check whether a given DIE is CU or not.
      
      Actually this function accepts CU, subprogram and inlined_subroutine
      DIEs.
      
      Without this fix, perf probe always fails to analyze lines on inlined
      functions;
      
      $ perf probe -L pre_schedule
      Debuginfo analysis failed. (-2)
        Error: Failed to show lines. (-2)
      
      This fixes that bug, as below.
      
      $ perf probe -L pre_schedule
      <pre_schedule@/home/mhiramat/ksrc/linux-2.6/kernel/sched.c:0>
            0  static inline void pre_schedule(struct rq *rq, struct task_struct *prev
               {
            2         if (prev->sched_class->pre_schedule)
            3                 prev->sched_class->pre_schedule(rq, prev);
               }
      
               /* rq->lock is NOT held, but preemption is disabled */
      
      Changes from v1:
       - Update against current tip tree.(Fix dwarf-aux.c)
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Masami Hiramatsu <masami.hiramatsu@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yrl.pp-manager.tt@hitachi.com
      Link: http://lkml.kernel.org/r/20110811110235.19900.20614.stgit@fedora15
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • M
      perf probe: Fix a memory leak for scopes array · 8afa2a70
      Masami Hiramatsu committed
      Fix a memory leak for scopes array when it finds a variable in the
      global scope.

      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yrl.pp-manager.tt@hitachi.com
      Link: http://lkml.kernel.org/r/20110811110229.19900.63019.stgit@fedora15
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • V
      perf: fix temporary file ownership check · e9b52ef2
      Vasiliy Kulikov committed
      A file in /tmp/ might be a symlink, so lstat() should be used instead of
      stat().
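
The difference matters because `stat()` follows a symlink and reports on its target, while `lstat()` reports on the link itself. A minimal ownership check in that spirit is sketched below; `owned_regular_file` is a hypothetical name, and the extra regular-file test is an addition of this sketch rather than something the commit message describes.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Accept 'path' only if the directory entry itself is a regular file owned
 * by the current user. lstat() does not follow symlinks, so a link planted
 * in /tmp by someone else is judged on its own owner, not on its target's.
 */
static bool owned_regular_file(const char *path)
{
	struct stat st;

	if (lstat(path, &st) < 0)
		return false;
	return S_ISREG(st.st_mode) && st.st_uid == geteuid();
}
```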

      Acked-by: Pekka Enberg <penberg@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20110811205537.GA22864@albatros
      Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 11 Aug, 2011 1 commit
    • J
      perf report: Use properly build_id kernel binaries · f57b05ed
      Jiri Olsa committed
      If we bring the recorded perf data together with kernel binary from another
      machine using:
      
      	on server A:
      	perf archive
      
      	on server B:
      	tar xjvf perf.data.tar.bz2 -C ~/.debug
      
      the build_id kernel dso is not properly recognized during the "perf report"
      command on server B.
      
      The reason is, that build_id dsos are added during the session initialization,
      while the kernel maps are created during the sample event processing.
      
      The machine__create_kernel_maps function ends up creating a new dso object for
      the kernel, but it does not check if we already have one added by build_id
      processing.
      
      Also the build_id reading ABI quirk added in commit:
      
       - commit b2511481
         perf build-id: Add quirk to deal with perf.data file format breakage
      
      populates the "struct build_id_event::pid" with 0, which
      is later interpreted as DEFAULT_GUEST_KERNEL_ID.
      
      This is not always correct, so it's better to guess the pid
      value based on the "struct build_id_event::header::misc" value.
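
Guessing the pid from the misc field could look roughly like the sketch below. The `PERF_RECORD_MISC_*` constants come from the kernel's `linux/perf_event.h`; the function name is hypothetical, the value 0 for DEFAULT_GUEST_KERNEL_ID follows from the text above, and the value -1 for HOST_KERNEL_ID is an assumption.

```c
#include <linux/perf_event.h>

#define HOST_KERNEL_ID           (-1)  /* assumed value */
#define DEFAULT_GUEST_KERNEL_ID  0     /* 0, as stated in the text above */

/* Pick a pid for a build-id record from the cpumode bits of header.misc. */
static int guess_build_id_pid(unsigned short misc)
{
	switch (misc & PERF_RECORD_MISC_CPUMODE_MASK) {
	case PERF_RECORD_MISC_GUEST_KERNEL:
	case PERF_RECORD_MISC_GUEST_USER:
		return DEFAULT_GUEST_KERNEL_ID;
	default:
		return HOST_KERNEL_ID;
	}
}
```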
      
      - Tested with data generated on x86 kernel version v2.6.34
        and reported back on x86_64 current kernel.
      - Not tested for guest kernel case.
      
      Note that the problem remains for PERF_RECORD_MMAP events recorded by a perf
      that does not use the proper pid (HOST_KERNEL_ID/DEFAULT_GUEST_KERNEL_ID).
      They are misinterpreted within the current perf code. Probably there's not
      much we can do about that.
      
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Link: http://lkml.kernel.org/r/20110601194346.GB1934@jolsa.brq.redhat.com
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 10 Aug, 2011 2 commits
  6. 09 Aug, 2011 1 commit
  7. 08 Aug, 2011 3 commits
  8. 26 Jul, 2011 1 commit
  9. 25 Jul, 2011 1 commit
  10. 22 Jul, 2011 1 commit
  11. 21 Jul, 2011 4 commits
  12. 16 Jul, 2011 4 commits