1. 20 6月, 2009 1 次提交
    • P
      perf_counter tools: Define and use our own u64, s64 etc. definitions · 9cffa8d5
      Paul Mackerras 提交于
      On 64-bit powerpc, __u64 is defined to be unsigned long rather than
      unsigned long long.  This causes compiler warnings every time we
      print a __u64 value with %Lx.
      
      Rather than changing __u64, we define our own u64 to be unsigned long
      long on all architectures, and similarly s64 as signed long long.
      For consistency we also define u32, s32, u16, s16, u8 and s8.  These
      definitions are put in a new header, types.h, because these definitions
      are needed in util/string.h and util/symbol.h.
      
      The main change here is the mechanical change of __[us]{64,32,16,8}
      to remove the "__".  The other changes are:
      
      * Create types.h
      * Include types.h in perf.h, util/string.h and util/symbol.h
      * Add types.h to the LIB_H definition in Makefile
      * Added (u64) casts in process_overflow_event() and print_sym_table()
        to kill two remaining warnings.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: benh@kernel.crashing.org
      LKML-Reference: <19003.33494.495844.956580@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9cffa8d5
  2. 19 6月, 2009 2 次提交
  3. 18 6月, 2009 8 次提交
    • I
      perf report: Filter to parent set by default · b8e6d829
      Ingo Molnar 提交于
      Make it easier to use parent filtering - default to a filtered
      output. Also add the parent column so that we get collapsing but
      dont display it by default.
      
      add --no-exclude-other to override this.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b8e6d829
    • P
      perf_counter tools: Handle lost events · 9d91a6f7
      Peter Zijlstra 提交于
      Make use of the new ->data_tail mechanism to tell kernel-space
      about user-space draining the data stream. Emit lost events
      (and display them) if they happen.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9d91a6f7
    • P
      perf_counter: tools: Makefile tweaks for 64-bit powerpc · e24a72c4
      Paul Mackerras 提交于
      On 64-bit powerpc, perf needs to be built as a 64-bit executable.
      This arranges to add the -m64 flag to CFLAGS if we are running on
      a 64-bit machine, indicated by the result of uname -m ending in "64".
      This means that we'll use -m64 on x86_64 machines as well.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: linuxppc-dev@ozlabs.org
      Cc: benh@kernel.crashing.org
      LKML-Reference: <19000.55666.866148.559620@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e24a72c4
    • P
      perf_counter tools: Add and use isprint() · a73c7d84
      Peter Zijlstra 提交于
      Introduce isprint() to print out raw event dumps to ASCII, etc.
      
      (This is an extension to upstream Git's ctype.c.)
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      [ removed openssl.h inclusion from util.h - it leaked ctype.h ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a73c7d84
    • I
      perf report: Add validation of call-chain entries · 7522060c
      Ingo Molnar 提交于
      Add boundary checks for call-chain events. In case of corrupted
      entries we could crash otherwise.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7522060c
    • I
      perf report: Tidy up the "--parent <regex>" and "--sort parent" call-chain features · b25bcf2f
      Ingo Molnar 提交于
      Instead of the ambigious 'call' naming use the much more
      specific 'parent' naming:
      
       - rename --call <regex> to --parent <regex>
      
       - rename --sort call to --sort parent
      
       - rename [unmatched] to [other] - to signal that this is not
         an error but the inverse set
      
      Also add pagefaults to the default parent-symbol pattern too,
      as it's a 'syscall overhead category' in a sense.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b25bcf2f
    • P
      perf_counter tools: Replace isprint() with issane() · 5aa75a0f
      Peter Zijlstra 提交于
      The Git utils came with a ctype replacement that doesn't provide
      isprint(). Add a replacement.
      
      Solves a build bug on certain distros.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5aa75a0f
    • P
      perf report: Add --sort <call> --call <$regex> · 6e7d6fdc
      Peter Zijlstra 提交于
      Implement sorting by callchain symbols, --sort <call>.
      
      It will create a new column which will show a match to
      --call $regex or "[unmatched]".
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6e7d6fdc
  4. 15 6月, 2009 4 次提交
    • I
      perf report: Fix 32-bit printf format · e2eae0f5
      Ingo Molnar 提交于
      Yong Wang reported the following compiler warning:
      
       builtin-report.c: In function 'process_overflow_event':
       builtin-report.c:984: error: cast to pointer from integer of different size
      
      Which happens because we try to print ->ips[] out with a limited
      format, losing the high 32 bits. Print it out using %016Lx instead.
      Reported-by: NYong Wang <yong.y.wang@linux.intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e2eae0f5
    • I
      perf report: Add per system call overhead histogram · 3dfabc74
      Ingo Molnar 提交于
      Take advantage of call-graph percounter sampling/recording to
      display a non-trivial histogram: the true, collapsed/summarized
      cost measurement, on a per system call total overhead basis:
      
       aldebaran:~/linux/linux/tools/perf> ./perf record -g -a -f ~/hackbench 10
       aldebaran:~/linux/linux/tools/perf> ./perf report -s symbol --syscalls | head -10
       #
       # (3536 samples)
       #
       # Overhead  Symbol
       # ........  ......
       #
           40.75%  [k] sys_write
           40.21%  [k] sys_read
            4.44%  [k] do_nmi
       ...
      
      This is done by accounting each (reliable) call-chain that chains back
      to a given system call to that system call function.
      
      [ So in the above example we can see that hackbench spends about 40% of
        its total time somewhere in sys_write() and 40% somewhere in
        sys_read(), the rest of the time is spent in user-space. The time
        is not spent in sys_write() _itself_ but in one of its many child
        functions. ]
      
      Or, a recording of a (source files are already in the page-cache) kernel build:
      
       $ perf record -g -m 512 -f -- make -j32 kernel
       $ perf report -s s --syscalls | grep '\[k\]' | grep -v nmi
      
           4.14%  [k] do_page_fault
           1.20%  [k] sys_write
           1.10%  [k] sys_open
           0.63%  [k] sys_exit_group
           0.48%  [k] smp_apic_timer_interrupt
           0.37%  [k] sys_read
           0.37%  [k] sys_execve
           0.20%  [k] sys_mmap
           0.18%  [k] sys_close
           0.14%  [k] sys_munmap
           0.13%  [k] sys_poll
           0.09%  [k] sys_newstat
           0.07%  [k] sys_clone
           0.06%  [k] sys_newfstat
           0.05%  [k] sys_access
           0.05%  [k] schedule
      
      Shows the true total cost of each syscall variant that gets used
      during a kernel build. This profile reveals it that pagefaults are
      the costliest, followed by read()/write().
      
      An interesting detail: timer interrupts cost 0.5% - or 0.5 seconds
      per 100 seconds of kernel build-time. (this was done with HZ=1000)
      
      The summary is done in 'perf report', i.e. in the post-processing
      stage - so once we have a good call-graph recording, this type of
      non-trivial high-level analysis becomes possible.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3dfabc74
    • I
      perf record: Fix fast task-exit race · 613d8602
      Ingo Molnar 提交于
      Recording with -a (or with -p) can race with tasks going away:
      
         couldn't open /proc/8440/maps
      
      Causing an early exit() and no recording done.
      
      Do not abort the recording session - instead just skip that task.
      
      Also, only print the warnings under -v.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      613d8602
    • I
      perf record/report: Add call graph / call chain profiling · 3efa1cc9
      Ingo Molnar 提交于
      Add the first steps of call-graph profiling:
      
       - add the -c (--call-graph) option to perf record
       - parse the call-graph record and printout out under -D (--dump-trace)
      
      The call-graph data is not put into the histogram yet, but it
      can be seen that it's being processed correctly:
      
      0x3ce0 [0x38]: event: 35
      .
      . ... raw event: size 56 bytes
      .  0000:  23 00 00 00 05 00 38 00 d4 df 0e 81 ff ff ff ff  #.....8........
      .  0010:  60 0b 00 00 60 0b 00 00 03 00 00 00 01 00 02 00  `...`..........
      .  0020:  d4 df 0e 81 ff ff ff ff a0 61 ed 41 36 00 00 00  .........a.A6..
      .  0030:  04 92 e6 41 36 00 00 00                          .a.A6..
      .
      0x3ce0 [0x38]: PERF_EVENT (IP, 5): 2912: 0xffffffff810edfd4 period: 1
      ... chain: u:2, k:1, nr:3
      .....  0: 0xffffffff810edfd4
      .....  1: 0x3641ed61a0
      .....  2: 0x3641e69204
       ... thread: perf:2912
       ...... dso: [kernel]
      
      This shows a 3-entry call-graph: with 1 kernel-space and two user-space
      entries
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3efa1cc9
  5. 14 6月, 2009 1 次提交
    • I
      perf report: Print out raw events in hexa · 8465b050
      Ingo Molnar 提交于
      Print out events in hexa dump format, when -D is specified:
      
      0x4868 [0x48]: event: 1
      .
      . ... raw event: size 72 bytes
      .  0000:  01 00 00 00 00 00 48 00 d4 72 00 00 d4 72 00 00  ......H..r...r.
      .  0010:  00 00 40 f2 3e 00 00 00 00 30 01 00 00 00 00 00  ..@.>....0.....
      .  0020:  00 00 00 00 00 00 00 00 2f 75 73 72 2f 6c 69 62  ......../usr/li
      .  0030:  36 34 2f 6c 69 62 65 6c 66 2d 30 2e 31 34 31 2e  64/libelf-0.141
      .  0040:  73 6f 00 00 00 00 00 00                          f-0.141
      .
      0x4868 [0x48]: PERF_EVENT_MMAP 29396: [0x3ef2400000(0x13000) @ (nil)]: /usr/lib64/libelf-0.141.so
      
      This helps the debugging of mis-parsing of data files, and helps
      the addition of new sample/trace formats.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8465b050
  6. 13 6月, 2009 7 次提交
    • F
      perf annotate: Fixes for filename:line displays · c17c2db1
      Frederic Weisbecker 提交于
      - fix addr2line on userspace binary: don't only check kernel image.
      - fix string allocation size for path: missing ending null char room
      - fix overflow in symbol extra info
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1244907563-7820-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c17c2db1
    • I
      perf stat: Enable raw data to be printed · ef281a19
      Ingo Molnar 提交于
      If -vv (very verbose) is specified, print out raw data
      in the following format:
      
      $ perf stat -vv -r 3 ./loop_1b_instructions
      
      [ perf stat: executing run #1 ... ]
      [ perf stat: executing run #2 ... ]
      [ perf stat: executing run #3 ... ]
      
      debug:              runtime[0]: 235871872
      debug:             walltime[0]: 236646752
      debug:       runtime_cycles[0]: 755150182
      debug:            counter/0[0]: 235871872
      debug:            counter/1[0]: 235871872
      debug:            counter/2[0]: 235871872
      debug:               scaled[0]: 0
      debug:            counter/0[1]: 2
      debug:            counter/1[1]: 235870662
      debug:            counter/2[1]: 235870662
      debug:               scaled[1]: 0
      debug:            counter/0[2]: 1
      debug:            counter/1[2]: 235870437
      debug:            counter/2[2]: 235870437
      debug:               scaled[2]: 0
      debug:            counter/0[3]: 140
      debug:            counter/1[3]: 235870298
      debug:            counter/2[3]: 235870298
      debug:               scaled[3]: 0
      debug:            counter/0[4]: 755150182
      debug:            counter/1[4]: 235870145
      debug:            counter/2[4]: 235870145
      debug:               scaled[4]: 0
      debug:            counter/0[5]: 1001411258
      debug:            counter/1[5]: 235868838
      debug:            counter/2[5]: 235868838
      debug:               scaled[5]: 0
      debug:            counter/0[6]: 27897
      debug:            counter/1[6]: 235868560
      debug:            counter/2[6]: 235868560
      debug:               scaled[6]: 0
      debug:            counter/0[7]: 2910
      debug:            counter/1[7]: 235868151
      debug:            counter/2[7]: 235868151
      debug:               scaled[7]: 0
      debug:              runtime[0]: 235980257
      debug:             walltime[0]: 236770942
      debug:       runtime_cycles[0]: 755114546
      debug:            counter/0[0]: 235980257
      debug:            counter/1[0]: 235980257
      debug:            counter/2[0]: 235980257
      debug:               scaled[0]: 0
      debug:            counter/0[1]: 3
      debug:            counter/1[1]: 235980049
      debug:            counter/2[1]: 235980049
      debug:               scaled[1]: 0
      debug:            counter/0[2]: 1
      debug:            counter/1[2]: 235979907
      debug:            counter/2[2]: 235979907
      debug:               scaled[2]: 0
      debug:            counter/0[3]: 135
      debug:            counter/1[3]: 235979780
      debug:            counter/2[3]: 235979780
      debug:               scaled[3]: 0
      debug:            counter/0[4]: 755114546
      debug:            counter/1[4]: 235979652
      debug:            counter/2[4]: 235979652
      debug:               scaled[4]: 0
      debug:            counter/0[5]: 1001439771
      debug:            counter/1[5]: 235979304
      debug:            counter/2[5]: 235979304
      debug:               scaled[5]: 0
      debug:            counter/0[6]: 23723
      debug:            counter/1[6]: 235979050
      debug:            counter/2[6]: 235979050
      debug:               scaled[6]: 0
      debug:            counter/0[7]: 2213
      debug:            counter/1[7]: 235978820
      debug:            counter/2[7]: 235978820
      debug:               scaled[7]: 0
      debug:              runtime[0]: 235888002
      debug:             walltime[0]: 236700533
      debug:       runtime_cycles[0]: 754881504
      debug:            counter/0[0]: 235888002
      debug:            counter/1[0]: 235888002
      debug:            counter/2[0]: 235888002
      debug:               scaled[0]: 0
      debug:            counter/0[1]: 2
      debug:            counter/1[1]: 235887793
      debug:            counter/2[1]: 235887793
      debug:               scaled[1]: 0
      debug:            counter/0[2]: 1
      debug:            counter/1[2]: 235887645
      debug:            counter/2[2]: 235887645
      debug:               scaled[2]: 0
      debug:            counter/0[3]: 135
      debug:            counter/1[3]: 235887499
      debug:            counter/2[3]: 235887499
      debug:               scaled[3]: 0
      debug:            counter/0[4]: 754881504
      debug:            counter/1[4]: 235887368
      debug:            counter/2[4]: 235887368
      debug:               scaled[4]: 0
      debug:            counter/0[5]: 1001401731
      debug:            counter/1[5]: 235887024
      debug:            counter/2[5]: 235887024
      debug:               scaled[5]: 0
      debug:            counter/0[6]: 24212
      debug:            counter/1[6]: 235886786
      debug:            counter/2[6]: 235886786
      debug:               scaled[6]: 0
      debug:            counter/0[7]: 1824
      debug:            counter/1[7]: 235886560
      debug:            counter/2[7]: 235886560
      debug:               scaled[7]: 0
      
       Performance counter stats for '/home/mingo/loop_1b_instructions' (3 runs):
      
           235.913377  task-clock-msecs     #      0.997 CPUs    ( +-   0.011% )
                    2  context-switches     #      0.000 M/sec   ( +-   0.000% )
                    1  CPU-migrations       #      0.000 M/sec   ( +-   0.000% )
                  136  page-faults          #      0.001 M/sec   ( +-   0.730% )
            755048744  cycles               #   3200.534 M/sec   ( +-   0.009% )
           1001417586  instructions         #      1.326 IPC     ( +-   0.001% )
                25277  cache-references     #      0.107 M/sec   ( +-   3.988% )
                 2315  cache-misses         #      0.010 M/sec   ( +-   9.845% )
      
          0.236706075  seconds time elapsed.
      
      This allows the summary stats to be validated.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ef281a19
    • I
      perf stat: Add feature to run and measure a command multiple times · 42202dd5
      Ingo Molnar 提交于
      Add the --repeat <n> feature to perf stat, which repeats a given
      command up to a 100 times, collects the stats and calculates an
      average and a stddev.
      
      For example, the following oneliner 'perf stat' command runs hackbench
      5 times and prints a tabulated result of all metrics, with averages
      and noise levels (in percentage) printed:
      
       aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10
       Time: 0.117
       Time: 0.108
       Time: 0.089
       Time: 0.088
       Time: 0.100
      
       Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
      
          1243.989586  task-clock-msecs     #     10.460 CPUs    ( +-   4.720% )
                47706  context-switches     #      0.038 M/sec   ( +-  19.706% )
                  387  CPU-migrations       #      0.000 M/sec   ( +-   3.608% )
                17793  page-faults          #      0.014 M/sec   ( +-   0.354% )
           3770941606  cycles               #   3031.329 M/sec   ( +-   4.621% )
           1566372416  instructions         #      0.415 IPC     ( +-   2.703% )
             16783421  cache-references     #     13.492 M/sec   ( +-   5.202% )
              7128590  cache-misses         #      5.730 M/sec   ( +-   7.420% )
      
          0.118924455  seconds time elapsed.
      
      The goal of this feature is to allow the reliance on these accurate
      statistics and to know how many times a command has to be repeated
      for the noise to go down to an acceptable level.
      
      (The -v option can be used to see a line printed out as each run progresses.)
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      42202dd5
    • I
      perf stat: Reorganize output · 44175b6f
      Ingo Molnar 提交于
       - use IPC for the instruction normalization output
       - CPUs for the CPU utilization factor value.
       - print out time elapsed like the other rows
       - tidy up the task-clocks/cpu-clocks printout
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      44175b6f
    • F
      perf annotate: Print a sorted summary of annotated overhead lines · 971738f3
      Frederic Weisbecker 提交于
      It's can be very annoying to scroll down perf annotated output
      until we find relevant overhead.
      
      Using the -l option, you can now have a small summary sorted per
      overhead in the beginning of the output.
      
      Example:
      
      ./perf annotate -l -k ../../vmlinux -s __lock_acquire
      
      Sorted summary for file ../../vmlinux
      ----------------------------------------------
      
         12.04 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1653
          4.61 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1740
          3.77 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1775
          3.56 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1653
          2.93 /home/fweisbec/linux/linux-2.6-tip/arch/x86/include/asm/irqflags.h:15
          2.83 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2545
          2.30 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2594
          2.20 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2388
          2.20 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:730
          2.09 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:730
          2.09 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:138
          1.88 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2548
          1.47 /home/fweisbec/linux/linux-2.6-tip/arch/x86/include/asm/irqflags.h:15
          1.36 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2594
          1.36 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:730
          1.26 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1654
          1.26 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1653
          1.15 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2592
          1.15 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1740
          1.15 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1740
      
      [...]
      
      Only overhead over 0.5% are summarized.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1244844682-12928-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      971738f3
    • F
      perf annotate: Print the filename:line for annotated colored lines · 301406b9
      Frederic Weisbecker 提交于
      When we have a colored line in perf annotate, ie a middle/high
      overhead one, it's sometimes useful to get the matching line
      and filename from the source file, especially this path prepares
      to another subsequent one which will print a sorted summary of
      midle/high overhead lines in the beginning of the output.
      
      Filename:Lines have the same color than the concerned ip lines.
      
      It can be slow because it relies on addr2line. We could also
      use objdump with -l but that implies we would have to bufferize
      objdump output and parse it to filter the relevant lines since
      we want to print a sorted summary in the beginning.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1244844682-12928-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      301406b9
    • M
      perf_counter: Start documenting HAVE_PERF_COUNTERS requirements · 018df72d
      Mike Frysinger 提交于
      Help out arch porters who want to support perf counters by listing some
      basic requirements.
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1244827063-24046-1-git-send-email-vapier@gentoo.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      018df72d
  7. 12 6月, 2009 3 次提交
    • P
      perf_counter: Add forward/backward attribute ABI compatibility · 974802ea
      Peter Zijlstra 提交于
      Provide for means of extending the perf_counter_attr in a 'natural' way.
      
      We allow growing the structure by appending fields at the end by specifying
      the full structure size inside it.
      
      When a new kernel sees a smaller (old) structure, it will 0 pad the tail.
      When an old kernel sees a larger (new) structure, it will verify the tail
      consists of 0s, otherwise fail.
      
      If we fail due to a size-mismatch, we return -E2BIG and write the kernel's
      native attribe size back into the provided structure.
      
      Furthermore, add some attribute verification, so that we'll fail counter
      creation when unknown bits are present (PERF_SAMPLE, PERF_FORMAT, or in
      the __reserved fields).
      
      (This ABI detail is introduced while keeping the existing syscall ABI.)
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      974802ea
    • P
      perf record: Explicity program a default counter · bbd36e5e
      Peter Zijlstra 提交于
      Up until now record has worked on the assumption that type=0, config=0
      was a suitable configuration - which it is. Lets make this a little more
      explicit and more readable via the use of proper symbols.
      
      [ Impact: cleanup ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      bbd36e5e
    • Y
      perf_counter tools: Remove one L1-data alias · faafec1e
      Yong Wang 提交于
      Otherwise all L1-instruction aliases will be recognized as
      L1-data by strcasestr() when calling function parse_aliases.
      Signed-off-by: NYong Wang <yong.y.wang@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <20090612031706.GA22126@ywang-moblin2.bj.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      faafec1e
  8. 11 6月, 2009 3 次提交
    • P
      perf_counter: Standardize event names · f4dbfa8f
      Peter Zijlstra 提交于
      Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f4dbfa8f
    • I
      perf_counter tools: Clean up u64 usage · 729ff5e2
      Ingo Molnar 提交于
      A build error slipped in:
      
       builtin-report.c: In function ‘hist_entry__fprintf’:
       builtin-report.c:711: error: format ‘%12d’ expects type ‘int’, but argument 3 has type ‘uint64_t’
      
      Because we got a bit sloppy with those types. uint64_t really sucks,
      because there's no printf format for it. So standardize on __u64
      instead - for all types that go to or come from the ABI (which is __u64),
      or for values that need to be large enough even on 32-bit.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      729ff5e2
    • P
      perf_counter tools: Normalize data using per sample period data · ea1900e5
      Peter Zijlstra 提交于
      When we use variable period sampling, add the period to the sample
      data and use that to normalize the samples.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ea1900e5
  9. 10 6月, 2009 2 次提交
  10. 09 6月, 2009 2 次提交
    • I
      perf_counter tools: Standardize color printing · aefcf37b
      Ingo Molnar 提交于
      The rule is:
      
       - high overhead: red
       -  mid overhead: green
       -  low overhead: normal (white/black)
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      aefcf37b
    • P
      perf report: Add support for profiling JIT generated code · 80d496be
      Pekka Enberg 提交于
      This patch adds support for profiling JIT generated code to 'perf
      report'. A JIT compiler is required to generate a "/tmp/perf-$PID.map"
      symbols map that is parsed when looking and displaying symbols.
      
      Thanks to Peter Zijlstra for his help with this patch!
      
      Example "perf report" output with the Jato JIT:
      
       #
       # (40311 samples)
       #
       # Overhead           Command  Shared Object              Symbol
       # ........  ................  .........................  ......
       #
           97.80%              jato  /tmp/perf-11915.map        [.] Fibonacci.fib(I)I
            0.56%              jato  00000000b7fa023b           0x000000b7fa023b
            0.45%              jato  /tmp/perf-11915.map        [.] Fibonacci.main([Ljava/lang/String;)V
            0.38%              jato  [kernel]                   [k] get_page_from_freelist
            0.06%              jato  [kernel]                   [k] kunmap_atomic
            0.05%              jato  ./jato                     [.] utf8Hash
            0.04%              jato  ./jato                     [.] executeJava
            0.04%              jato  ./jato                     [.] defineClass
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Cc: a.p.zijlstra@chello.nl
      Cc: acme@redhat.com
      LKML-Reference: <Pine.LNX.4.64.0906082111590.12407@melkki.cs.Helsinki.FI>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      80d496be
  11. 08 6月, 2009 1 次提交
    • I
      perf stat: Print out instructins/cycle metric · e779898a
      Ingo Molnar 提交于
      Before:
      
           7549326754  cycles               #    3201.811 M/sec
          10007594937  instructions         #    4244.408 M/sec
      
      After:
      
           7542051194  cycles               #    3201.996 M/sec
          10007743852  instructions         #    4248.811 M/sec # 1.327 per cycle
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e779898a
  12. 07 6月, 2009 6 次提交
    • I
      perf report: Print more expressive message in case of file open error · a14832ff
      Ingo Molnar 提交于
      Before:
      
       $ perf report
       failed to open file: No such file or directory
      
      After:
      
       $ perf report
        failed to open file: perf.data  (try 'perf record' first)
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a14832ff
    • I
      perf_counter tools: Handle kernels with !CONFIG_PERF_COUNTER · 30c806a0
      Ingo Molnar 提交于
      If perf is run on a !CONFIG_PERF_COUNTER kernel right now it
      bails out with no messages or with confusing messages.
      
      Standardize this case some more and explain the situation.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      30c806a0
    • I
      perf record: Fall back to cpu-clock-ticks if no PMU · 3da297a6
      Ingo Molnar 提交于
      On architectures/CPUs without PMU support but with perfcounters
      enabled 'perf record' currently fails because it cannot create a
      cycle based hw-perfcounter.
      
      Fall back to the cpu-clock-tick sw-perfcounter in this case, which
      is hrtimer based and will always work (as long as perfcounters
      are enabled).
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3da297a6
    • I
      perf top: Fall back to cpu-clock-tick hrtimer sampling if no cycle counter available · 716c69fe
      Ingo Molnar 提交于
      On architectures/CPUs without PMU support but with perfcounters
      enabled 'perf top' currently fails because it cannot create a
      cycle based hw-perfcounter.
      
      Fall back to the cpu-clock-tick sw-perfcounter in this case, which
      is hrtimer based and will always work (as long as perfcounters
      is enabled).
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      716c69fe
    • I
      perf stat: Continue even on counter creation error · 743ee1f8
      Ingo Molnar 提交于
      Before:
      
       $ perf stat ~/hackbench 5
      
       error: syscall returned with -1 (No such device)
      
      After:
      
       $ perf stat ~/hackbench 5
       Time: 1.640
      
       Performance counter stats for '/home/mingo/hackbench 5':
      
          6524.570382  task-clock-ticks     #       3.838 CPU utilization factor
                35704  context-switches     #       0.005 M/sec
                  191  CPU-migrations       #       0.000 M/sec
                 8958  page-faults          #       0.001 M/sec
        <not counted>  cycles
        <not counted>  instructions
        <not counted>  cache-references
        <not counted>  cache-misses
      
       Wall-clock time elapsed:  1699.999995 msecs
      
      Also add -v (--verbose) option to allow the printing of failed
      counter opens.
      
      Plus dont print 'inf' if wall-time is zero (due to jiffies granularity),
      instead skip the printing of the CPU utilization factor.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      743ee1f8
    • F
      perf top: Wait for a minimal set of events before reading first snapshot · 2f01190a
      Frederic Weisbecker 提交于
      The first snapshot reading often occur before any events have
      been read in the mapped perfcounter files.
      
      Just wait until we have at least one event before starting the
      snapshot, or the delay before the first set of entries to be
      displayed may be long in case of low refresh rate.
      
      Note: we could also use a semaphore to wait before
      "print_entries" number of eveents is reached, but again this
      value is tunable and we can't ensure we will even reach it.
      Also we could base on a default mimimum set of entries for the
      first refresh, say 15, but again, the minimal sample is
      tunable, and we could end up displaying nothing until we have a
      minimal default set of events, which can take some time in case
      of high samples filters.
      
      Hence this simple solution which partially covers the default
      case.
      
      [ Impact: fix display artifacts in perf top ]
      Signed-off-by: NFrederic Weisbecker <fweisbeec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1244322643-6447-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2f01190a