1. 09 8月, 2009 5 次提交
    • M
      perf top: Update man page · 83617983
      Mike Galbraith 提交于
      perf_counter tools: update perf top manual page to reflect
      current implementation.
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      83617983
    • M
      perf top: Improve interactive key handling · 091bd2e9
      Mike Galbraith 提交于
      Pressing any key which is not currently mapped to
      functionality, based on startup command line options, displays
      currently mapped keys, and prompts for input.
      
      Pressing any unmapped key at the prompt returns the user to
      display mode with variables unchanged.  eg, pressing ? <SPACE>
      <ESC> etc displays currently available keys, the value of the
      variable associated with that key, and prompts.
      
      Pressing same again aborts input.
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      091bd2e9
    • M
      perf_counter tools: Allow perf top top users to switch between weighted and... · 46ab9764
      Mike Galbraith 提交于
      perf_counter tools: Allow perf top top users to switch between weighted and individual counter display
      
      Add [w]eighted hotkey.  Pressing [w] toggles between displaying
      weighted total of all counters, and the counter selected via
      [E]vent select key.
      
      ------------------------------------------------------------------------------
         PerfTop:   90395 irqs/sec  kernel:16.1% [cache-misses/cache-references/instructions],  (all, 4 CPUs)
      ------------------------------------------------------------------------------
      
        weight     samples    pcnt         RIP          kernel function
        ______     _______   _____   ________________   _______________
      
      1275408.6      10881 -  5.3% - ffffffff81146f70 : copy_page_c
       553683.4      43569 - 21.3% - ffffffff81146f20 : clear_page_c
        74075.0       6768 -  3.3% - ffffffff81147190 : copy_user_generic_string
        40602.9       7538 -  3.7% - ffffffff81284ba2 : _spin_lock
        26882.1        965 -  0.5% - ffffffff8109d280 : file_ra_state_init
      
      [w]
      
      ------------------------------------------------------------------------------
         PerfTop:   91221 irqs/sec  kernel:14.5% [10000Hz cache-misses],  (all, 4 CPUs)
      ------------------------------------------------------------------------------
      
        weight     samples    pcnt         RIP          kernel function
        ______     _______   _____   ________________   _______________
      
                  47320.00 - 22.3% - ffffffff81146f20 : clear_page_c
                  14261.00 -  6.7% - ffffffff810992f5 : __rmqueue
                  11046.00 -  5.2% - ffffffff81146f70 : copy_page_c
                   7842.00 -  3.7% - ffffffff81284ba2 : _spin_lock
                   7234.00 -  3.4% - ffffffff810aa1d6 : unmap_vmas
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      46ab9764
    • M
      perf_counter tools: Fix/resurrect perf top annotation in a simple interactive form · 923c42c1
      Mike Galbraith 提交于
      perf top used to have annotation support, but it has bitrotted and
      removed.
      
      This patch restores that: it allows the user to select any symbol
      in kernel space for source level annotation on the fly, switch
      between event counters and alter display variables. When symbol
      details are being displayed, stopping annotation reverts to normal.
      
      known keys:
              [d]     select display delay.
              [e]     select display entries (lines).
              [E]     select annotation event counter.
              [f]     select normal display count filter.
              [F]     select annotation display count filter (percentage).
              [qQ]    quit.
              [s]     select annotation symbol and start annotation.
              [S]     stop annotation, revert to normal display.
              [z]     toggle event count zeroing.
      
      Sample:
      ------------------------------------------------------------------------------
         PerfTop:   16719 irqs/sec  kernel:78.7% [cache-misses/cache-references/instructions/cycles],  (all, 4 CPUs)
      ------------------------------------------------------------------------------
      
      Showing cache-misses for e1000_clean_rx_irq
        Events  Pcnt (>=3%)
             0  0.0%                  /* adjust length to remove Ethernet CRC */
             0  0.0%                  if (!(adapter->flags2 & FLAG2_CRC_STRIPPING))
             0  0.0%                          length -= 4;
           436  5.0%      f039:       41 f6 84 24 5c 29 00    testb  $0x1,0x295c(%r12)
             0  0.0%      f089:       8b 4d 84                mov    -0x7c(%rbp),%ecx
             0  0.0%      f08c:       48 83 ef 02             sub    $0x2,%rdi
             0  0.0%      f090:       48 83 ee 02             sub    $0x2,%rsi
           811  9.3%      f094:       f3 a4                   rep movsb %ds:(%rsi),%es:(%rdi)
             0  0.0%
             0  0.0%          while (rx_desc->status & E1000_RXD_STAT_DD) {
             0  0.0%      f114:       41 f6 47 0c 01          testb  $0x1,0xc(%r15)
          7226 82.6%      f119:       0f 85 24 fe ff ff       jne    ef43 <e1000_clean_rx_irq+0x84>
      
      Available events:
              0 cache-misses
              1 cache-references
              2 instructions
              3 cycles
      Enter details event counter: 2
      ------------------------------------------------------------------------------
         PerfTop:   15035 irqs/sec  kernel:79.0% [cache-misses/cache-references/instructions/cycles],  (all, 4 CPUs)
      ------------------------------------------------------------------------------
      
      Showing instructions for e1000_clean_rx_irq
        Events  Pcnt (>=3%)
             0  0.0%                                 int *work_done, int work_to_do)
             0  0.0%  {
           175  0.9%      eebf:       55                      push   %rbp
          1898  9.8%      eec0:       48 89 e5                mov    %rsp,%rbp
             0  0.0%
             0  0.0%          i = rx_ring->next_to_clean;
           140  0.7%      ef0a:       0f b7 41 1a             movzwl 0x1a(%rcx),%eax
           670  3.4%      ef0e:       89 45 ac                mov    %eax,-0x54(%rbp)
             0  0.0%  {
             0  0.0%          memcpy(skb->data + offset, from, len);
            91  0.5%      f07b:       49 8b b6 e8 00 00 00    mov    0xe8(%r14),%rsi
          1153  5.9%      f082:       48 8b b8 e8 00 00 00    mov    0xe8(%rax),%rdi
            42  0.2%      f089:       8b 4d 84                mov    -0x7c(%rbp),%ecx
            14  0.1%      f08c:       48 83 ef 02             sub    $0x2,%rdi
             0  0.0%      f090:       48 83 ee 02             sub    $0x2,%rsi
          1618  8.3%      f094:       f3 a4                   rep movsb %ds:(%rsi),%es:(%rdi)
             0  0.0%
             0  0.0%                  /* return some buffers to hardware, one at a time is too slow */
             0  0.0%                  if (cleaned_count >= E1000_RX_BUFFER_WRITE) {
           867  4.5%      f0e7:       83 7d b0 0f             cmpl   $0xf,-0x50(%rbp)
             0  0.0%
             0  0.0%          while (rx_desc->status & E1000_RXD_STAT_DD) {
            37  0.2%      f114:       41 f6 47 0c 01          testb  $0x1,0xc(%r15)
          4047 20.8%      f119:       0f 85 24 fe ff ff       jne    ef43 <e1000_clean_rx_irq+0x84>
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      923c42c1
    • F
      perf_counter: Fix/complete ftrace event records sampling · f413cdb8
      Frederic Weisbecker 提交于
      This patch implements the kernel side support for ftrace event
      record sampling.
      
      A new counter sampling attribute is added:
      
         PERF_SAMPLE_TP_RECORD
      
      which requests ftrace events record sampling. In this case
      if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
      fires, we emit the tracepoint binary record to the
      perfcounter event buffer, as a sample.
      
      Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
      record:
      
       perf record -f -F 1 -a -e workqueue:workqueue_execution
       perf report -D
      
       0x21e18 [0x48]: event: 9
       .
       . ... raw event: size 72 bytes
       .  0000:  09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff  ......H........
       .  0010:  0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00  ........!......
       .  0020:  2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e  +...........eve
       .  0030:  74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00  ts/1...........
       .  0040:  e0 b1 31 81 ff ff ff ff                          .......
      .
      0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33
      
      The raw ftrace binary record starts at offset 0020.
      
      Translation:
      
       struct trace_entry {
      	type		= 0x2b = 43;
      	flags		= 1;
      	preempt_count	= 2;
      	pid		= 0xa = 10;
      	tgid		= 0xa = 10;
       }
      
       thread_comm = "events/1"
       thread_pid  = 0xa = 10;
       func	    = 0xffffffff8131b1e0 = flush_to_ldisc()
      
      What will come next?
      
       - Userspace support ('perf trace'), 'flight data recorder' mode
         for perf trace, etc.
      
       - The unconditional copy from the profiling callback brings
         some costs however if someone wants no such sampling to
         occur, and needs to be fixed in the future. For that we need
         to have an instant access to the perf counter attribute.
         This is a matter of a flag to add in the struct ftrace_event.
      
       - Take care of the events recursivity! Don't ever try to record
         a lock event for example, it seems some locking is used in
         the profiling fast path and lead to a tracing recursivity.
         That will be fixed using raw spinlock or recursivity
         protection.
      
       - [...]
      
       - Profit! :-)
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f413cdb8
  2. 07 8月, 2009 2 次提交
    • P
      perf: Auto-detect libelf · 9424edc2
      Peter Zijlstra 提交于
      Adds autodetection for libelf as well, and simplifies the
      libbfd code. Furthermore, fail make with an error when libelf
      is not found and warn about the lack of libbfd.
      
      Also provide an option to build a 32bit version even though you
      might be running a 64bit kernel.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9424edc2
    • A
      perf symbol: Fix symbol parsing in certain cases: use the build-id as a symlink · 4d1e00a8
      Arnaldo Carvalho de Melo 提交于
      In some cases distros have binaries and debuginfo in weird places:
      
      [root@doppio tuna]# ls -la /usr/lib64/{xulrunner-1.9.1/xulrunner-stub,firefox-3.5.2/firefox}
      -rwxr-xr-x 1 root root 90024 2009-08-03 19:45 /usr/lib64/firefox-3.5.2/firefox
      -rwxr-xr-x 1 root root 90024 2009-08-03 18:23 /usr/lib64/xulrunner-1.9.1/xulrunner-stub
      [root@doppio tuna]# sha1sum /usr/lib64/{xulrunner-1.9.1/xulrunner-stub,firefox-3.5.2/firefox}
      19a858077d263d5de22c9c5da250d3e4396ae739  /usr/lib64/xulrunner-1.9.1/xulrunner-stub
      19a858077d263d5de22c9c5da250d3e4396ae739  /usr/lib64/firefox-3.5.2/firefox
      [root@doppio tuna]# rpm -qf /usr/lib64/{xulrunner-1.9.1/xulrunner-stub,firefox-3.5.2/firefox}
      xulrunner-1.9.1.2-1.fc11.x86_64
      firefox-3.5.2-2.fc11.x86_64
      [root@doppio tuna]# ls -la /usr/lib/debug/{usr/lib64/xulrunner-1.9.1/xulrunner-stub,usr/lib64/firefox-3.5.2/firefox}.debug
      ls: cannot access /usr/lib/debug/usr/lib64/firefox-3.5.2/firefox.debug: No such file or directory
      -rwxr-xr-x 1 root root 403608 2009-08-03 18:22 /usr/lib/debug/usr/lib64/xulrunner-1.9.1/xulrunner-stub.debug
      
      Seemingly we don't have a .symtab when we actually can find it
      if we use the .note.gnu.build-id ELF section put in place by
      some distros. Use it and find the symbols we need.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4d1e00a8
  3. 05 8月, 2009 3 次提交
  4. 04 8月, 2009 1 次提交
  5. 02 8月, 2009 3 次提交
  6. 01 8月, 2009 1 次提交
    • I
      perf_counter tools: Fix link errors with older toolchains · 2d1b6949
      Ingo Molnar 提交于
      On older distros (F8 for example) the perf build could fail
      with such missing symbols:
      
          LINK perf
      /usr/lib/gcc/x86_64-redhat-linux/4.3.2/../../../../lib64/libbfd.a(bfd.o): In function `bfd_demangle':
      (.text+0x2b3): undefined reference to `cplus_demangle'
      /usr/lib/gcc/x86_64-redhat-linux/4.3.2/../../../../lib64/libbfd.a(bfd.o): In function `bfd_demangle':
      
      Link in -liberty too.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2d1b6949
  7. 23 7月, 2009 8 次提交
    • M
      perf_counter tools: Give perf top inherit option · 0fdc7e67
      Mike Galbraith 提交于
      Currently, perf top -p only tracks the pid provided, which isn't very useful
      for watching forky loads, so give it an inherit option.
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1248165036.9795.10.camel@marge.simson.net>
      0fdc7e67
    • M
      perf_counter tools: Fix vmlinux symbol generation breakage · d20ff6bd
      Mike Galbraith 提交于
      vmlinux meets the criteria for symbol adjustment, which breaks vmlinux generated symbols.
      Fix this by exempting vmlinux.  This is a bit fragile in that someone could change the
      kernel dso's name, but currently that name is also hardwired.
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1248091298.18702.18.camel@marge.simson.net>
      d20ff6bd
    • J
      perf_counter: Detect debugfs location · 5beeded1
      Jason Baron 提交于
      If "/sys/kernel/debug" is not a debugfs mount point, search for the debugfs
      filesystem in /proc/mounts, but also allows the user to specify
      '--debugfs-dir=blah' or set the environment variable: 'PERF_DEBUGFS_DIR'
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      [ also made it probe "/debug" by default ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090721181629.GA3094@redhat.com>
      5beeded1
    • J
      perf_counter: Add tracepoint support to perf list, perf stat · f6bdafef
      Jason Baron 提交于
      Add support to 'perf list' and 'perf stat' for kernel tracepoints. The
      implementation creates a 'for_each_subsystem' and 'for_each_event' for
      easy iteration over the tracepoints.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <426129bf9fcc8ee63bb094cf736e7316a7dcd77a.1248190728.git.jbaron@redhat.com>
      f6bdafef
    • A
      perf symbol: C++ demangling · 28ac909b
      Arnaldo Carvalho de Melo 提交于
      [acme@doppio ~]$ perf report -s comm,dso,symbol -C firefox -d /usr/lib64/xulrunner-1.9.1/libxul.so | grep :: | head
           2.21%  [.] nsDeque::Push(void*)
           1.78%  [.] GraphWalker::DoWalk(nsDeque&)
           1.30%  [.] GCGraphBuilder::AddNode(void*, nsCycleCollectionParticipant*)
           1.27%  [.] XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)
           1.18%  [.] imgContainer::DrawFrameTo(gfxIImageFrame*, gfxIImageFrame*, nsRect&)
           1.13%  [.] nsDeque::PopFront()
           1.11%  [.] nsGlobalWindow::RunTimeout(nsTimeout*)
           0.97%  [.] nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&)
           0.95%  [.] nsJSEventListener::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)
           0.95%  [.] nsCOMPtr_base::~nsCOMPtr_base()
      [acme@doppio ~]$
      
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Suggested-by: NClark Williams <williams@redhat.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090720171412.GB10410@ghostprotocols.net>
      28ac909b
    • A
      perf: avoid structure size confusion by using a fixed size · dfe5a504
      Arjan van de Ven 提交于
      for some reason, this structure gets compiled as 36 bytes in some files
      (the ones that alloacte it) but 40 bytes in others (the ones that use it).
      The cause is an off_t type that gets a different size in different
      compilation units for some yet-to-be-explained reason.
      
      But the effect is disasterous; the size/offset members of the struct
      are at different offsets, and result in mostly complete garbage.
      The parser in perf is so robust that this all gets hidden, and after
      skipping an certain amount of samples, it recovers.... so this bug
      is not normally noticed.
      
      .... except when you want every sample to be exact.
      
      Fix this by just using an explicitly sized type.
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4A655917.9080504@linux.intel.com>
      dfe5a504
    • A
      perf_counter: Improve perf stat and perf record option parsing · a0541234
      Anton Blanchard 提交于
      perf stat and perf record currently look for all options on the command
      line. This can lead to some confusion:
      
      # perf stat ls -l
        Error: unknown switch `l'
      
      While we can work around this by adding '--' before the command, the git
      option parsing code can stop at the first non option:
      
      # perf stat ls -l
       Performance counter stats for 'ls -l':
      ....
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090722130412.GD9029@kryten>
      a0541234
    • P
      perf_counter: PERF_SAMPLE_ID and inherited counters · 7f453c24
      Peter Zijlstra 提交于
      Anton noted that for inherited counters the counter-id as provided by
      PERF_SAMPLE_ID isn't mappable to the id found through PERF_RECORD_ID
      because each inherited counter gets its own id.
      
      His suggestion was to always return the parent counter id, since that
      is the primary counter id as exposed. However, these inherited
      counters have a unique identifier so that events like
      PERF_EVENT_PERIOD and PERF_EVENT_THROTTLE can be specific about which
      counter gets modified, which is important when trying to normalize the
      sample streams.
      
      This patch removes PERF_EVENT_PERIOD in favour of PERF_SAMPLE_PERIOD,
      which is more useful anyway, since changing periods became a lot more
      common than initially thought -- rendering PERF_EVENT_PERIOD the less
      useful solution (also, PERF_SAMPLE_PERIOD reports the more accurate
      value, since it reports the value used to trigger the overflow,
      whereas PERF_EVENT_PERIOD simply reports the requested period changed,
      which might only take effect on the next cycle).
      
      This still leaves us PERF_EVENT_THROTTLE to consider, but since that
      _should_ be a rare occurrence, and linking it to a primary id is the
      most useful bit to diagnose the problem, we introduce a
      PERF_SAMPLE_STREAM_ID, for those few cases where the full
      reconstruction is important.
      
      [Does change the ABI a little, but I see no other way out]
      Suggested-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1248095846.15751.8781.camel@twins>
      7f453c24
  8. 18 7月, 2009 3 次提交
  9. 13 7月, 2009 1 次提交
  10. 12 7月, 2009 5 次提交
    • A
      perf report: Introduce -n/--show-nr-samples · e3d7e183
      Arnaldo Carvalho de Melo 提交于
      [acme@doppio pahole]$ perf report -ns comm,dso,symbol -d /lib64/libc-2.10.1.so -C pahole | head -17
          21.94%      32101  [.] _int_malloc
          20.10%      29402  [.] __GI_strcmp
          16.77%      24533  [.] __tsearch
          12.61%      18450  [.] malloc_consolidate
           6.42%       9394  [.] _int_free
           6.28%       9191  [.] __tfind
           4.56%       6678  [.] __GI___libc_free
           4.46%       6520  [.] _IO_vfprintf_internal
           2.59%       3786  [.] __malloc
           1.17%       1716  [.] __GI_memcpy
      [acme@doppio pahole]$
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1247325517-12272-5-git-send-email-acme@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e3d7e183
    • A
      perf_counter tools: PLT info is stripped in -debuginfo packages · a25e46c4
      Arnaldo Carvalho de Melo 提交于
      So we need to get the richer .symtab from the debuginfo
      packages but the PLT info from the original DSO where we have
      just the leaner .dynsym symtab.
      
      Example:
      
      | [acme@doppio pahole]$ perf report --sort comm,dso,symbol > before
      | [acme@doppio pahole]$ perf report --sort comm,dso,symbol > after
      | [acme@doppio pahole]$ diff -U1 before after
      | --- before	2009-07-11 11:04:22.688595741 -0300
      | +++ after	2009-07-11 11:04:33.380595676 -0300
      | @@ -80,3 +80,2 @@
      |       0.07%  pahole ./build/pahole              [.] pahole_stealer
      | -     0.06%  pahole /usr/lib64/libdw-0.141.so   [.] 0x00000000007140
      |       0.06%  pahole /usr/lib64/libdw-0.141.so   [.] __libdw_getabbrev
      | @@ -91,2 +90,3 @@
      |       0.06%  pahole [kernel]                    [k] free_hot_cold_page
      | +     0.06%  pahole /usr/lib64/libdw-0.141.so   [.] tfind@plt
      |       0.05%  pahole ./build/libdwarves.so.1.0.0 [.] ftype__add_parameter
      | @@ -242,2 +242,3 @@
      |       0.01%  pahole [kernel]                    [k] account_group_user_time
      | +     0.01%  pahole /usr/lib64/libdw-0.141.so   [.] strlen@plt
      |       0.01%  pahole ./build/pahole              [.] strcmp@plt
      | [acme@doppio pahole]$
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1247325517-12272-4-git-send-email-acme@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a25e46c4
    • A
      perf report: Make the output more compact · 021191b3
      Arnaldo Carvalho de Melo 提交于
      When we filter by column content we may end up with a column
      that has the same value for all the lines. So remove that
      column and tell its unique value on the top, as a comment.
      
      Example:
      
        [acme@doppio pahole]$  perf report --sort comm,dso,symbol -d ./build/libdwarves.so.1.0.0 -C pahole | head -15
        # dso: ./build/libdwarves.so.1.0.0
        # comm: pahole
        # Samples: 58409
        #
        # Overhead  Symbol
        # ........  ......
        #
            20.93%  [.] tag__recode_dwarf_type
            14.94%  [.] namespace__recode_dwarf_types
            10.38%  [.] cu__table_add_tag
             6.69%  [.] __die__process_tag
             5.05%  [.] die__process_function
             4.70%  [.] list__for_all_tags
             3.68%  [.] tag__init
             3.48%  [.] die__create_new_parameter
        [acme@doppio pahole]$
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1247325517-12272-3-git-send-email-acme@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      021191b3
    • A
      strlist: Introduce strlist__entry and strlist__nr_entries methods · 27d0fd41
      Arnaldo Carvalho de Melo 提交于
      The strlist__entry method allows accessing strlists like an
      array, will be used in the 'perf report' to access the first
      entry.
      
      We now keep the nr_entries so that we can check if we have just
      one entry, will be used in 'perf report' to improve the output
      by showing just at the top when we have just, say, one DSO.
      
      While at it use nr_entries to optimize strlist__is_empty by not
      using the far more costly rb_first based implementation.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1247325517-12272-2-git-send-email-acme@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      27d0fd41
    • A
      perf report: Tidy up reporting of symbols not found · 60c1baf1
      Arnaldo Carvalho de Melo 提交于
      Always printing the level info about if it is in the kernel,
      hypervisor or userspace as that is in the hist_entry.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1247325517-12272-1-git-send-email-acme@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      60c1baf1
  11. 11 7月, 2009 1 次提交
    • A
      perf report: Adjust column width to the values sampled · 52d422de
      Arnaldo Carvalho de Melo 提交于
      Auto-adjust column width of perf report output to the
      longest occuring string length.
      
      Example:
      
      [acme@doppio pahole]$  perf report --sort comm,dso,symbol | head -13
      
          12.79%   pahole  /usr/lib64/libdw-0.141.so    [.] __libdw_find_attr
           8.90%   pahole  /lib64/libc-2.10.1.so        [.] _int_malloc
           8.68%   pahole  /usr/lib64/libdw-0.141.so    [.] __libdw_form_val_len
           8.15%   pahole  /lib64/libc-2.10.1.so        [.] __GI_strcmp
           6.80%   pahole  /lib64/libc-2.10.1.so        [.] __tsearch
           5.54%   pahole  ./build/libdwarves.so.1.0.0  [.] tag__recode_dwarf_type
      [acme@doppio pahole]$
      
      [acme@doppio pahole]$  perf report --sort comm,dso,symbol -d /lib64/libc-2.10.1.so | head -10
      
          21.92%   pahole  /lib64/libc-2.10.1.so  [.] _int_malloc
          20.08%   pahole  /lib64/libc-2.10.1.so  [.] __GI_strcmp
          16.75%   pahole  /lib64/libc-2.10.1.so  [.] __tsearch
      [acme@doppio pahole]$
      
      Also add these extra options to control the new behaviour:
      
        -w, --field-width
      
      Force each column width to the provided list, for large terminal
      readability.
      
        -t, --field-separator:
      
      Use a special separator character and don't pad with spaces, replacing
      all occurances of this separator in symbol names (and other output) with
      a '.' character, that thus it's the only non valid separator.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <20090711014728.GH3452@ghostprotocols.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      52d422de
  12. 10 7月, 2009 2 次提交
    • V
      perf_counter: Add P6 PMU support · 11d1578f
      Vince Weaver 提交于
      Add basic P6 PMU support. The P6 uses the EVNTSEL0 EN bit to
      enable/disable both its counters. We use this for the
      global enable/disable, and clear all config bits (except EN)
      to disable individual counters.
      
      Actual ia32 hardware doesn't support lfence, so use a locked
      op without side-effect to implement a full barrier.
      
      perf stat and perf record seem to function correctly.
      
      [a.p.zijlstra@chello.nl: cleanups and complete the enable/disable code]
      Signed-off-by: NVince Weaver <vince@deater.net>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <Pine.LNX.4.64.0907081718450.2715@pianoman.cluster.toy>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      11d1578f
    • A
      perf_counter tools: Rename cache events to remove $ · 9590b7ba
      Anton Blanchard 提交于
      The cache events contain '$' which will hit shell variable
      expansion. To avoid confusion change this to 'cache', ie
      L1-d$-loads becomes L1-dcache-loads.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Cc: Roland Dreier <rdreier@cisco.com>
      Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <20090706120131.GB4391@kryten>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9590b7ba
  13. 05 7月, 2009 5 次提交
    • F
      perf report: Add "Fractal" mode output - support callchains with relative overhead rate · 805d127d
      Frederic Weisbecker 提交于
      The current callchain displays the overhead rates as absolute:
      relative to the total overhead.
      
      This patch provides relative overhead percentage, in which each
      branch of the callchain tree is a independant instrumentated object.
      
      This provides a 'fractal' view of the call-chain profile: each
      sub-graph looks like a profile in itself - relative to its parent.
      
      You can produce such output by using the "fractal" mode
      that you can abbreviate via f, fr, fra, frac, etc...
      
      ./perf report -s sym -c fractal
      
      Example:
      
           8.46%  [k] copy_user_generic_string
                      |
                      |--52.01%-- generic_file_aio_read
                      |          do_sync_read
                      |          vfs_read
                      |          |
                      |          |--97.20%-- sys_pread64
                      |          |          system_call_fastpath
                      |          |          pread64
                      |          |
                      |           --2.81%-- sys_read
                      |                     system_call_fastpath
                      |                     __read
                      |
                      |--39.85%-- generic_file_buffered_write
                      |          __generic_file_aio_write_nolock
                      |          generic_file_aio_write
                      |          do_sync_write
                      |          reiserfs_file_write
                      |          vfs_write
                      |          |
                      |          |--97.05%-- sys_pwrite64
                      |          |          system_call_fastpath
                      |          |          __pwrite64
                      |          |
                      |           --2.95%-- sys_write
                      |                     system_call_fastpath
                      |                     __write_nocancel
      [...]
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246772361-9960-5-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      805d127d
    • F
      perf_counter tools: callchains: Manage the cumul hits on the fly · e05b876c
      Frederic Weisbecker 提交于
      The cumul hits are the number of hits of every childs of a node
      plus the hits of the current nodes, required for percentage
      computing of a branch.
      
      Theses numbers are calculated during the sorting of the branches of
      the callchain tree using a depth first postfix traversal, so that
      cumulative hits are propagated in the right order.
      
      But if we plan to implement percentages relative to the parent and not
      absolute percentages (relative to the whole overhead), we need to know
      the cumulative hits of the parent before computing the children
      because the relative minimum acceptable number of entries (ie: minimum
      rate against the cumulative hits from the parent) is the basis to
      filter the children against a given rate.
      
      Then we need to handle the cumul hits on the fly to prepare the
      implementation of relative overhead rates.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246772361-9960-4-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e05b876c
    • F
      perf report: Change default callchain parameters · 94a8eb02
      Frederic Weisbecker 提交于
      The default callchain parameters are set to use the flat mode and never
      filter any overhead threshold of backtrace.
      
      But flat mode is boring compared to graph mode.
      Also the number of callchains may be very high if none is
      filtered.
      
      Let's change this to set the graph view and a minimum overhead of 0.5%
      as default parameters.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246772361-9960-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      94a8eb02
    • F
      perf report: Use a modifiable string for default callchain options · be903885
      Frederic Weisbecker 提交于
      If the user doesn't provide options to tune his callchain output
      (ie: if he uses -c without arguments) then the default value passed
      in the OPT_CALLBACK_DEFAULT() macro is used.
      
      But it's parsed later by strtok() which will replace comma separators
      to a zero. This may segfault as we are using a read-only string.
      
      Use a modifiable one instead, and also fix the "100%" default
      minimum threshold value by turning it into a 0 (output every callchains)
      as it was intended in the origin.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246772361-9960-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      be903885
    • F
      perf report: Warn on callchain output request from non-callchain file · 91b4eaea
      Frederic Weisbecker 提交于
      perf report segfaults while trying to handle callchains from a non
      callchain data file.
      
      Instead of a segfault, print a useful message to the user.
      Reported-by: NJens Axboe <jens.axboe@oracle.com>
      Reported-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246772361-9960-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      91b4eaea