1. 09 Aug, 2009 19 commits
    •
      perf report: Fix and improve the displaying of per-thread event counters · 8d513270
      Brice Goglin committed
      Improve and fix the handling of per-thread counter stats
      recorded via perf record -s. Previously we only displayed
      it in debug printouts (-D) and even that output was hard
      to disambiguate.
      
      I moved everything to utils/values.[ch] so that we may reuse
      it in perf stat.
      
      We get something like this now:
      
       #  PID   TID  cache-misses  cache-references
         4658  4659        495581           3238779
         4658  4662        498246           3236823
         4658  4663        499531           3243162
      
      Then it'll be easy to add --pretty=raw to display a single line per thread/event.
      
      By the way, -S was also used for --symbol... So I used -T/--threads here.
      
      perf report: Add -T/--threads to display per-thread counter values
      
       We get something like this now:
       #  PID   TID  cache-misses  cache-references
         4658  4659        495581           3238779
         4658  4662        498246           3236823
         4658  4663        499531           3243162
      
      Per-thread arrays of counter values are managed in utils/values.[ch]
      Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
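The per-thread table in the commit above could be produced from a per-thread array of counter values along these lines. This is only a sketch: `struct thread_values` and `format_row()` are hypothetical names chosen for illustration and are not taken from utils/values.[ch].

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical layout: one row per thread, one value per counter. */
struct thread_values {
	int pid;
	int tid;
	const unsigned long long *value;  /* ncounters entries */
};

/*
 * Format one table row into buf, right-aligning each counter value to the
 * width of its column header. Returns the number of characters produced.
 */
static int format_row(char *buf, size_t len, const struct thread_values *tv,
		      const int *width, int ncounters)
{
	int i, n;

	assert(buf != NULL && tv != NULL && ncounters > 0);
	n = snprintf(buf, len, "  %5d %5d", tv->pid, tv->tid);
	for (i = 0; i < ncounters && n < (int)len; i++)
		n += snprintf(buf + n, len - n, "  %*llu", width[i], tv->value[i]);
	return n;
}
```

Using the header widths as column widths (12 for "cache-misses", 16 for "cache-references") keeps the values lined up under their event names.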
    •
      perf_counter tools: Fix libbfd detection for systems with libz dependency · 183f3b08
      Mike Galbraith committed
      Due to a libz dependency in some distros' binutils packages,
      C++ demangle support isn't compiled in despite the necessary
      libraries being available.
      
      Fix this by adding a -lz link test to the dependency detection
      rules.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1249733655.6929.5.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf: "Longum est iter per praecepta, breve et efficax per exempla" · c24b5133
      Carlos R. Mafra committed
      A few examples of how 'perf' can be used, from an e-mail by
      Ingo Molnar: http://lkml.org/lkml/2009/8/4/346.
      Signed-off-by: Carlos R. Mafra <crmafra2@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Valdis.Kletnieks@vt.edu
      LKML-Reference: <20090805185334.GA4535@Pilar.aei.mpg.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf tools: callchain: Fix sum of percentages to be 100% by displaying amount... · 25446036
      Frederic Weisbecker committed
      perf tools: callchain: Fix sum of percentages to be 100% by displaying amount of ignored chains in fractal mode
      
      When we filter the callchains below a given percentage, we
      ignore them, and the end result only shows entries whose
      percentage is above the filter threshold.
      
      To users it then looks as if the percentages are imbalanced,
      as if the sum inside a profiled branch doesn't reach 100%.
      
      Since in the past there have been real perf report bugs that
      showed the same symptom, it would be nice to assure the user
      that the data is sound and trustworthy and that it all sums up
      to 100.00%.
      
      So fix this by displaying the remaining hits that have been
      filtered, with no more detail than their amount in each
      branch. Example while filtering below 50%:
      
      7.73%  [k] delay_tsc
                      |
                      |--98.22%-- __const_udelay
                      |          |
                      |          |--86.37%-- ath5k_hw_register_timeout
                      |          |          ath5k_hw_noise_floor_calibration
                      |          |          ath5k_hw_reset
                      |          |          ath5k_reset
                      |          |          ath5k_config
                      |          |          ieee80211_hw_config
                      |          |          |
                      |          |          |--88.53%-- ieee80211_scan_work
                      |          |          |          worker_thread
                      |          |          |          kthread
                      |          |          |          child_rip
                      |          |           --11.47%-- [...]
                      |           --13.63%-- [...]
                       --1.78%-- [...]
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <1249690585-9145-4-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
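The "[...]" lines in the example above account for the hits hidden by the filter. A minimal sketch of that bookkeeping follows, assuming each branch knows the hit counts of its children; `filtered_hits()` is a hypothetical helper, not a function from the perf sources.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given the hit counts of a branch's children and a
 * minimum percentage, return how many hits belong to children below the
 * threshold. Reporting that amount lets the branch still sum to 100%.
 */
static unsigned long long filtered_hits(const unsigned long long *child_hits,
					size_t nr_children, double min_percent)
{
	unsigned long long total = 0, hidden = 0;
	size_t i;

	assert(child_hits != NULL || nr_children == 0);
	for (i = 0; i < nr_children; i++)
		total += child_hits[i];
	if (total == 0)
		return 0;

	for (i = 0; i < nr_children; i++) {
		double percent = 100.0 * (double)child_hits[i] / (double)total;

		if (percent < min_percent)
			hidden += child_hits[i];
	}
	return hidden;
}
```

The percentage is taken relative to the children's total, so the shown entries plus the hidden remainder always add up to the whole branch.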
    •
      perf tools: callchain: Fix 'perf report' display to be callchain by default · b1a88349
      Frederic Weisbecker committed
      If we recorded with the -g option to capture the callchain, right
      now we require a -g option to perf report as well, and people
      reported this as an unnecessary complication: the user already
      specified -g once, so there is no need to require it a second time.
      
      So if the recording includes call-chains, display the callchain by
      default from perf report.
      
      ( The user can override this default using the "-g none" option
        to perf report. )
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <1249690585-9145-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf tools: callchain: Fix spurious 'perf report' warnings: ignore empty callchains · b0efe213
      Frederic Weisbecker committed
      When the callchain tree comes to insert an empty backtrace, it
      raises a warning about the fact that we are inserting an empty
      one. The warning is spurious: the radix tree assumes it did
      something wrong to reach this state, but it didn't. We just met
      an empty callchain that has to be ignored.
      
      This happens occasionally with certain types of call-chain
      recordings. If it happens it's a big nuisance, as perf report
      output starts with thousands of warning lines.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <1249690585-9145-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf record: Fix the -A UI for empty or non-existent perf.data · 266e0e21
      Pierre Habouzit committed
      1. Ignore the -A argument if there is no perf.data file
      2. Treat an empty file like a non-existent file.
      
      Otherwise, perf will try to read the perf.data header and fail
      with an error.
      
      Treating an empty file like a non-existent file makes sense,
      since an interrupted (as in SIGKILLed) perf could leave such
      files around, and you don't want to annoy the user with errors
      for files with no data in them.
      Signed-off-by: Pierre Habouzit <pierre.habouzit@intersec.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf util: Fix do_read() to fail on EOF instead of busy-looping · 7eac7e9e
      Pierre Habouzit committed
      While toying with perf, I've noticed that perf record can
      easily enter a busy loop when doing something as silly as:
      
          $ perf record -A ls
      
      do_read() here really wants to read a known size; if it
      cannot, it should die(), not busy-loop ;)
      
      That was the cause of the bug.
      Signed-off-by: Pierre Habouzit <pierre.habouzit@intersec.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
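The busy loop described above arises when a fixed-size read loop treats a return value of 0 from read() as "try again". A minimal sketch of the corrected pattern follows. `read_exact()` is a hypothetical stand-in that returns an error code, whereas the commit message says do_read() should die().

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a fixed-size read: returns 0 once exactly
 * `size` bytes were read, -1 on error or premature end of file.
 */
static int read_exact(int fd, void *buf, size_t size)
{
	char *p = buf;

	assert(buf != NULL || size == 0);
	while (size) {
		ssize_t ret = read(fd, p, size);

		if (ret < 0) {
			if (errno == EINTR)
				continue;  /* interrupted by a signal, retry */
			return -1;
		}
		if (ret == 0)
			return -1;         /* EOF: retrying would spin forever */
		size -= (size_t)ret;
		p += ret;
	}
	return 0;
}
```

The important line is the `ret == 0` check: at end of file read() keeps returning 0, so a loop that only tests for negative values never terminates.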
    •
      perf list: Fix the output to not include tracepoints without an id · ae07b63f
      Peter Zijlstra committed
      Stop perf list from displaying tracepoints without an id file.
      Those are special tracepoints that are not interfaced to
      perfcounters, so listing them is erroneous and passing them as
      events will produce no output.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Jason Baron <jbaron@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf stat: Fix tool option consistency: rename -S/--scale to -c/--scale · b26bc5a7
      Brice Goglin committed
      We want to use a coherent flag for -S/--stat across all tools,
      so free up -S in perf stat.
      Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf report: Add debug help for the finding of symbol bugs - show the symtab... · 94cb9e38
      Arnaldo Carvalho de Melo committed
      perf report: Add debug help for the finding of symbol bugs - show the symtab origin (DSO, build-id, kernel, etc)
      
      Used with perf report --verbose:
      
      [acme@doppio linux-2.6-tip]$ perf report -v | head -16
           5.17%  firefox  /usr/lib64/xulrunner-1.9.1/libxul.so   0x00000000005d8eee f [.] imgContainer::DrawFrameTo(gfxIImageFrame*, gfxIImageFrame*, nsRect&)
           2.56%  firefox  /lib64/libpthread-2.10.1.so            0x0000000000008e02 d [.] __pthread_mutex_lock_internal
           1.94%  firefox  /usr/lib64/xulrunner-1.9.1/libxul.so   0x0000000000d0af8f f [.] SearchTable
           1.75%  firefox  [kernel]                               0xffffffffff60013b k [.] vread_hpet
           1.63%  firefox  /lib64/libpthread-2.10.1.so            0x000000000000a404 d [.] __pthread_mutex_unlock
           1.47%  firefox  /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x00000000000482ea f [.] js_Interpret
           1.42%  firefox  /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x000000000003eda3 f [.] JS_CallTracer
           1.24%  firefox  [kernel]                               0xffffffff8102ca4a k [k] read_hpet
           1.16%  firefox  [kernel]                               0xffffffff810f3dd4 k [k] fget_light
           1.11%  firefox  /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x00000000000567ff f [.] js_TraceObject
           0.98%  firefox  /usr/lib64/firefox-3.5.2/firefox       0x000000000000dd23 b [.] arena_ralloc
      [acme@doppio linux-2.6-tip]$
      
      The new field is just after the symbol address. It is meant to
      help in figuring out symbol resolution bugs.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf report: Fix per task multi-counter stat reporting · 8f18aec5
      Peter Zijlstra committed
      Brice Goglin reported:
      
      > I can easily sort them by thread id, but I don't know how to match
      > my 4 events with each group of 4 lines.
      
      Also report the counter id and the time running/enabled
      stats (in case the counter got time-shared).
      Reported-by: Brice Goglin <Brice.Goglin@inria.fr>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf tools: Fix multi-counter stat bug caused by incorrect reading of perf.data file header · 1c222bce
      Peter Zijlstra committed
      Brice Goglin reported that only the first result from a
      multi-counter perf record --stat run is accurate; the
      rest look bogus.
      
      A silly mistake made us re-read the first attribute for
      every recorded attribute.
      Reported-by: Brice Goglin <Brice.Goglin@inria.fr>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: paulus@samba.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf tools: Fix call-chain cumul hit based sub-total (fractal mode) · 1953287b
      Frederic Weisbecker committed
      The callchain fractal mode builds the total hits of each new
      profiling branch by using the hits of the current branch's
      parent plus the hits of the children.
      
      This is wrong: the total hits of a branch should be the sum of
      the hits of all its children, and we must ignore the parent's
      hits in this scope.
      
      This patch also fixes another mistake with the hit counting.
      
      Now the rates are correct.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf top: Update man page · 83617983
      Mike Galbraith committed
      perf_counter tools: update perf top manual page to reflect
      current implementation.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf top: Improve interactive key handling · 091bd2e9
      Mike Galbraith committed
      Pressing any key which is not currently mapped to
      functionality, based on startup command line options, displays
      the currently mapped keys and prompts for input.
      
      Pressing any unmapped key at the prompt returns the user to
      display mode with variables unchanged. E.g. pressing ?, <SPACE>,
      <ESC>, etc. displays the currently available keys and the value
      of the variable associated with each key, and prompts.
      
      Pressing the same key again aborts input.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf_counter tools: Allow perf top users to switch between weighted and... · 46ab9764
      Mike Galbraith committed
      perf_counter tools: Allow perf top users to switch between weighted and individual counter display
      
      Add [w]eighted hotkey.  Pressing [w] toggles between displaying
      weighted total of all counters, and the counter selected via
      [E]vent select key.
      
      ------------------------------------------------------------------------------
         PerfTop:   90395 irqs/sec  kernel:16.1% [cache-misses/cache-references/instructions],  (all, 4 CPUs)
      ------------------------------------------------------------------------------
      
        weight     samples    pcnt         RIP          kernel function
        ______     _______   _____   ________________   _______________
      
      1275408.6      10881 -  5.3% - ffffffff81146f70 : copy_page_c
       553683.4      43569 - 21.3% - ffffffff81146f20 : clear_page_c
        74075.0       6768 -  3.3% - ffffffff81147190 : copy_user_generic_string
        40602.9       7538 -  3.7% - ffffffff81284ba2 : _spin_lock
        26882.1        965 -  0.5% - ffffffff8109d280 : file_ra_state_init
      
      [w]
      
      ------------------------------------------------------------------------------
         PerfTop:   91221 irqs/sec  kernel:14.5% [10000Hz cache-misses],  (all, 4 CPUs)
      ------------------------------------------------------------------------------
      
        weight     samples    pcnt         RIP          kernel function
        ______     _______   _____   ________________   _______________
      
                  47320.00 - 22.3% - ffffffff81146f20 : clear_page_c
                  14261.00 -  6.7% - ffffffff810992f5 : __rmqueue
                  11046.00 -  5.2% - ffffffff81146f70 : copy_page_c
                   7842.00 -  3.7% - ffffffff81284ba2 : _spin_lock
                   7234.00 -  3.4% - ffffffff810aa1d6 : unmap_vmas
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf_counter tools: Fix/resurrect perf top annotation in a simple interactive form · 923c42c1
      Mike Galbraith committed
      perf top used to have annotation support, but it bitrotted and
      was removed.
      
      This patch restores that: it allows the user to select any symbol
      in kernel space for source level annotation on the fly, switch
      between event counters and alter display variables. When symbol
      details are being displayed, stopping annotation reverts to normal.
      
      known keys:
              [d]     select display delay.
              [e]     select display entries (lines).
              [E]     select annotation event counter.
              [f]     select normal display count filter.
              [F]     select annotation display count filter (percentage).
              [qQ]    quit.
              [s]     select annotation symbol and start annotation.
              [S]     stop annotation, revert to normal display.
              [z]     toggle event count zeroing.
      
      Sample:
      ------------------------------------------------------------------------------
         PerfTop:   16719 irqs/sec  kernel:78.7% [cache-misses/cache-references/instructions/cycles],  (all, 4 CPUs)
      ------------------------------------------------------------------------------
      
      Showing cache-misses for e1000_clean_rx_irq
        Events  Pcnt (>=3%)
             0  0.0%                  /* adjust length to remove Ethernet CRC */
             0  0.0%                  if (!(adapter->flags2 & FLAG2_CRC_STRIPPING))
             0  0.0%                          length -= 4;
           436  5.0%      f039:       41 f6 84 24 5c 29 00    testb  $0x1,0x295c(%r12)
             0  0.0%      f089:       8b 4d 84                mov    -0x7c(%rbp),%ecx
             0  0.0%      f08c:       48 83 ef 02             sub    $0x2,%rdi
             0  0.0%      f090:       48 83 ee 02             sub    $0x2,%rsi
           811  9.3%      f094:       f3 a4                   rep movsb %ds:(%rsi),%es:(%rdi)
             0  0.0%
             0  0.0%          while (rx_desc->status & E1000_RXD_STAT_DD) {
             0  0.0%      f114:       41 f6 47 0c 01          testb  $0x1,0xc(%r15)
          7226 82.6%      f119:       0f 85 24 fe ff ff       jne    ef43 <e1000_clean_rx_irq+0x84>
      
      Available events:
              0 cache-misses
              1 cache-references
              2 instructions
              3 cycles
      Enter details event counter: 2
      ------------------------------------------------------------------------------
         PerfTop:   15035 irqs/sec  kernel:79.0% [cache-misses/cache-references/instructions/cycles],  (all, 4 CPUs)
      ------------------------------------------------------------------------------
      
      Showing instructions for e1000_clean_rx_irq
        Events  Pcnt (>=3%)
             0  0.0%                                 int *work_done, int work_to_do)
             0  0.0%  {
           175  0.9%      eebf:       55                      push   %rbp
          1898  9.8%      eec0:       48 89 e5                mov    %rsp,%rbp
             0  0.0%
             0  0.0%          i = rx_ring->next_to_clean;
           140  0.7%      ef0a:       0f b7 41 1a             movzwl 0x1a(%rcx),%eax
           670  3.4%      ef0e:       89 45 ac                mov    %eax,-0x54(%rbp)
             0  0.0%  {
             0  0.0%          memcpy(skb->data + offset, from, len);
            91  0.5%      f07b:       49 8b b6 e8 00 00 00    mov    0xe8(%r14),%rsi
          1153  5.9%      f082:       48 8b b8 e8 00 00 00    mov    0xe8(%rax),%rdi
            42  0.2%      f089:       8b 4d 84                mov    -0x7c(%rbp),%ecx
            14  0.1%      f08c:       48 83 ef 02             sub    $0x2,%rdi
             0  0.0%      f090:       48 83 ee 02             sub    $0x2,%rsi
          1618  8.3%      f094:       f3 a4                   rep movsb %ds:(%rsi),%es:(%rdi)
             0  0.0%
             0  0.0%                  /* return some buffers to hardware, one at a time is too slow */
             0  0.0%                  if (cleaned_count >= E1000_RX_BUFFER_WRITE) {
           867  4.5%      f0e7:       83 7d b0 0f             cmpl   $0xf,-0x50(%rbp)
             0  0.0%
             0  0.0%          while (rx_desc->status & E1000_RXD_STAT_DD) {
            37  0.2%      f114:       41 f6 47 0c 01          testb  $0x1,0xc(%r15)
          4047 20.8%      f119:       0f 85 24 fe ff ff       jne    ef43 <e1000_clean_rx_irq+0x84>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf_counter: Fix/complete ftrace event records sampling · f413cdb8
      Frederic Weisbecker committed
      This patch implements the kernel side support for ftrace event
      record sampling.
      
      A new counter sampling attribute is added:
      
         PERF_SAMPLE_TP_RECORD
      
      which requests ftrace events record sampling. In this case
      if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
      fires, we emit the tracepoint binary record to the
      perfcounter event buffer, as a sample.
      
      Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
      record:
      
       perf record -f -F 1 -a -e workqueue:workqueue_execution
       perf report -D
      
       0x21e18 [0x48]: event: 9
       .
       . ... raw event: size 72 bytes
       .  0000:  09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff  ......H........
       .  0010:  0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00  ........!......
       .  0020:  2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e  +...........eve
       .  0030:  74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00  ts/1...........
       .  0040:  e0 b1 31 81 ff ff ff ff                          .......
      .
      0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33
      
      The raw ftrace binary record starts at offset 0020.
      
      Translation:
      
       struct trace_entry {
      	type		= 0x2b = 43;
      	flags		= 1;
      	preempt_count	= 2;
      	pid		= 0xa = 10;
      	tgid		= 0xa = 10;
       }
      
       thread_comm = "events/1"
       thread_pid  = 0xa = 10;
       func	    = 0xffffffff8131b1e0 = flush_to_ldisc()
      
      What will come next?
      
       - Userspace support ('perf trace'), 'flight data recorder' mode
         for perf trace, etc.
      
       - The unconditional copy from the profiling callback has a
         cost even if nobody wants such sampling to occur; this
         needs to be fixed in the future. For that we need instant
         access to the perf counter attribute, which is a matter of
         adding a flag to struct ftrace_event.
      
       - Take care of event recursion! Don't ever try to record a
         lock event, for example: it seems some locking is used in
         the profiling fast path, which leads to tracing recursion.
         That will be fixed using a raw spinlock or recursion
         protection.
      
       - [...]
      
       - Profit! :-)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Lai Jiangshan placeholder
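The "Translation" step in the commit above can be reproduced by decoding the twelve bytes at offset 0020 of the hex dump. The sketch below assumes the field widths implied by that dump (2 + 1 + 1 + 4 + 4 bytes) and little-endian byte order; `struct trace_entry_fields` and `decode_trace_entry()` are hypothetical names, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field widths inferred from the hex dump; an assumption, not the kernel ABI. */
struct trace_entry_fields {
	uint16_t type;
	uint8_t flags;
	uint8_t preempt_count;
	int32_t pid;
	int32_t tgid;
};

/* Read a 32-bit little-endian value regardless of host byte order. */
static uint32_t le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode the common header found at the start of the raw record. */
static struct trace_entry_fields decode_trace_entry(const uint8_t *raw, size_t len)
{
	struct trace_entry_fields e;

	assert(raw != NULL && len >= 12);
	e.type = (uint16_t)(raw[0] | raw[1] << 8);
	e.flags = raw[2];
	e.preempt_count = raw[3];
	e.pid = (int32_t)le32(raw + 4);
	e.tgid = (int32_t)le32(raw + 8);
	return e;
}
```

Fed the bytes `2b 00 01 02 0a 00 00 00 0a 00 00 00`, this yields type 43, flags 1, preempt_count 2, pid 10 and tgid 10, matching the values listed in the commit message.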
  2. 07 Aug, 2009 2 commits
    •
      perf: Auto-detect libelf · 9424edc2
      Peter Zijlstra committed
      Adds autodetection for libelf as well, and simplifies the
      libbfd code. Furthermore, fail make with an error when libelf
      is not found and warn about the lack of libbfd.
      
      Also provide an option to build a 32bit version even though you
      might be running a 64bit kernel.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      perf symbol: Fix symbol parsing in certain cases: use the build-id as a symlink · 4d1e00a8
      Arnaldo Carvalho de Melo committed
      In some cases distros have binaries and debuginfo in weird places:
      
      [root@doppio tuna]# ls -la /usr/lib64/{xulrunner-1.9.1/xulrunner-stub,firefox-3.5.2/firefox}
      -rwxr-xr-x 1 root root 90024 2009-08-03 19:45 /usr/lib64/firefox-3.5.2/firefox
      -rwxr-xr-x 1 root root 90024 2009-08-03 18:23 /usr/lib64/xulrunner-1.9.1/xulrunner-stub
      [root@doppio tuna]# sha1sum /usr/lib64/{xulrunner-1.9.1/xulrunner-stub,firefox-3.5.2/firefox}
      19a858077d263d5de22c9c5da250d3e4396ae739  /usr/lib64/xulrunner-1.9.1/xulrunner-stub
      19a858077d263d5de22c9c5da250d3e4396ae739  /usr/lib64/firefox-3.5.2/firefox
      [root@doppio tuna]# rpm -qf /usr/lib64/{xulrunner-1.9.1/xulrunner-stub,firefox-3.5.2/firefox}
      xulrunner-1.9.1.2-1.fc11.x86_64
      firefox-3.5.2-2.fc11.x86_64
      [root@doppio tuna]# ls -la /usr/lib/debug/{usr/lib64/xulrunner-1.9.1/xulrunner-stub,usr/lib64/firefox-3.5.2/firefox}.debug
      ls: cannot access /usr/lib/debug/usr/lib64/firefox-3.5.2/firefox.debug: No such file or directory
      -rwxr-xr-x 1 root root 403608 2009-08-03 18:22 /usr/lib/debug/usr/lib64/xulrunner-1.9.1/xulrunner-stub.debug
      
      Seemingly we don't have a .symtab when we actually can find it
      if we use the .note.gnu.build-id ELF section put in place by
      some distros. Use it and find the symbols we need.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 05 Aug, 2009 3 commits
  4. 04 Aug, 2009 1 commit
  5. 02 Aug, 2009 3 commits
  6. 01 Aug, 2009 1 commit
    •
      perf_counter tools: Fix link errors with older toolchains · 2d1b6949
      Ingo Molnar committed
      On older distros (F8 for example) the perf build could fail
      with such missing symbols:
      
          LINK perf
      /usr/lib/gcc/x86_64-redhat-linux/4.3.2/../../../../lib64/libbfd.a(bfd.o): In function `bfd_demangle':
      (.text+0x2b3): undefined reference to `cplus_demangle'
      /usr/lib/gcc/x86_64-redhat-linux/4.3.2/../../../../lib64/libbfd.a(bfd.o): In function `bfd_demangle':
      
      Link in -liberty too.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 23 Jul, 2009 8 commits
    •
      perf_counter tools: Give perf top inherit option · 0fdc7e67
      Mike Galbraith committed
      Currently, perf top -p only tracks the pid provided, which isn't very useful
      for watching forky loads, so give it an inherit option.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1248165036.9795.10.camel@marge.simson.net>
    •
      perf_counter tools: Fix vmlinux symbol generation breakage · d20ff6bd
      Mike Galbraith committed
      vmlinux meets the criteria for symbol adjustment, which breaks vmlinux generated symbols.
      Fix this by exempting vmlinux.  This is a bit fragile in that someone could change the
      kernel dso's name, but currently that name is also hardwired.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1248091298.18702.18.camel@marge.simson.net>
    •
      perf_counter: Detect debugfs location · 5beeded1
      Jason Baron committed
      If "/sys/kernel/debug" is not a debugfs mount point, search for the debugfs
      filesystem in /proc/mounts. Also allow the user to specify
      '--debugfs-dir=blah' or set the environment variable 'PERF_DEBUGFS_DIR'.
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      [ also made it probe "/debug" by default ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090721181629.GA3094@redhat.com>
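The /proc/mounts fallback described above amounts to scanning for the first entry whose filesystem type is debugfs. The following is a sketch of that scan only; `find_debugfs()` is a hypothetical name, and the handling of '--debugfs-dir' and 'PERF_DEBUGFS_DIR' is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan text in /proc/mounts format ("device mountpoint fstype ...") and
 * copy the mount point of the first debugfs entry into out.
 * Returns 1 if an entry was found, 0 otherwise.
 */
static int find_debugfs(FILE *mounts, char *out, size_t out_len)
{
	char line[1024], dir[512], type[64];

	assert(mounts != NULL && out != NULL && out_len > 0);
	while (fgets(line, sizeof(line), mounts)) {
		/* Skip the device field, keep mount point and type. */
		if (sscanf(line, "%*s %511s %63s", dir, type) != 2)
			continue;
		if (strcmp(type, "debugfs") == 0) {
			snprintf(out, out_len, "%s", dir);
			return 1;
		}
	}
	return 0;
}
```

Taking a `FILE *` instead of opening /proc/mounts directly keeps the scan easy to exercise with canned input; a caller on a real system would pass the result of `fopen("/proc/mounts", "r")`.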
    •
      perf_counter: Add tracepoint support to perf list, perf stat · f6bdafef
      Jason Baron committed
      Add support to 'perf list' and 'perf stat' for kernel tracepoints. The
      implementation creates a 'for_each_subsystem' and 'for_each_event' for
      easy iteration over the tracepoints.
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <426129bf9fcc8ee63bb094cf736e7316a7dcd77a.1248190728.git.jbaron@redhat.com>
    •
      perf symbol: C++ demangling · 28ac909b
      Arnaldo Carvalho de Melo committed
      [acme@doppio ~]$ perf report -s comm,dso,symbol -C firefox -d /usr/lib64/xulrunner-1.9.1/libxul.so | grep :: | head
           2.21%  [.] nsDeque::Push(void*)
           1.78%  [.] GraphWalker::DoWalk(nsDeque&)
           1.30%  [.] GCGraphBuilder::AddNode(void*, nsCycleCollectionParticipant*)
           1.27%  [.] XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)
           1.18%  [.] imgContainer::DrawFrameTo(gfxIImageFrame*, gfxIImageFrame*, nsRect&)
           1.13%  [.] nsDeque::PopFront()
           1.11%  [.] nsGlobalWindow::RunTimeout(nsTimeout*)
           0.97%  [.] nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&)
           0.95%  [.] nsJSEventListener::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)
           0.95%  [.] nsCOMPtr_base::~nsCOMPtr_base()
      [acme@doppio ~]$
      
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Suggested-by: Clark Williams <williams@redhat.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090720171412.GB10410@ghostprotocols.net>
    •
      perf: avoid structure size confusion by using a fixed size · dfe5a504
      Arjan van de Ven committed
      For some reason, this structure gets compiled as 36 bytes in some files
      (the ones that allocate it) but 40 bytes in others (the ones that use it).
      The cause is an off_t type that gets a different size in different
      compilation units for some yet-to-be-explained reason.
      
      But the effect is disastrous; the size/offset members of the struct
      are at different offsets, and result in mostly complete garbage.
      The parser in perf is so robust that this all gets hidden, and after
      skipping a certain amount of samples, it recovers.... so this bug
      is not normally noticed.
      
      .... except when you want every sample to be exact.
      
      Fix this by just using an explicitly sized type.
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4A655917.9080504@linux.intel.com>
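The fix described above replaces a type whose width can vary between compilation units with an explicitly sized one. A small sketch of the idea follows; `struct file_section` is a hypothetical structure for illustration, not the one changed by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk descriptor using explicitly sized fields only. */
struct file_section {
	uint64_t offset;
	uint64_t size;
};

/*
 * Compile-time checks (C11): every compilation unit that includes this
 * definition must agree on the layout, or the build fails.
 */
static_assert(sizeof(struct file_section) == 16, "unexpected struct size");
static_assert(offsetof(struct file_section, size) == 8, "unexpected field offset");
```

With `uint64_t` the layout no longer depends on how `off_t` happens to be defined in a given file, and the static assertions turn any future mismatch into a build error instead of garbage at run time.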
    •
      perf_counter: Improve perf stat and perf record option parsing · a0541234
      Anton Blanchard committed
      perf stat and perf record currently look for all options on the command
      line. This can lead to some confusion:
      
      # perf stat ls -l
        Error: unknown switch `l'
      
      While we can work around this by adding '--' before the command, the git
      option parsing code can stop at the first non option:
      
      # perf stat ls -l
       Performance counter stats for 'ls -l':
      ....
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090722130412.GD9029@kryten>
    •
      perf_counter: PERF_SAMPLE_ID and inherited counters · 7f453c24
      Peter Zijlstra committed
      Anton noted that for inherited counters the counter-id as provided by
      PERF_SAMPLE_ID isn't mappable to the id found through PERF_RECORD_ID
      because each inherited counter gets its own id.
      
      His suggestion was to always return the parent counter id, since that
      is the primary counter id as exposed. However, these inherited
      counters have a unique identifier so that events like
      PERF_EVENT_PERIOD and PERF_EVENT_THROTTLE can be specific about which
      counter gets modified, which is important when trying to normalize the
      sample streams.
      
      This patch removes PERF_EVENT_PERIOD in favour of PERF_SAMPLE_PERIOD,
      which is more useful anyway, since changing periods became a lot more
      common than initially thought -- rendering PERF_EVENT_PERIOD the less
      useful solution (also, PERF_SAMPLE_PERIOD reports the more accurate
      value, since it reports the value used to trigger the overflow,
      whereas PERF_EVENT_PERIOD simply reports the requested period changed,
      which might only take effect on the next cycle).
      
      This still leaves us PERF_EVENT_THROTTLE to consider, but since that
      _should_ be a rare occurrence, and linking it to a primary id is the
      most useful bit to diagnose the problem, we introduce a
      PERF_SAMPLE_STREAM_ID, for those few cases where the full
      reconstruction is important.
      
      [Does change the ABI a little, but I see no other way out]
      Suggested-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1248095846.15751.8781.camel@twins>
  8. 18 Jul, 2009 3 commits