1. 18 Jun, 2009 7 commits
  2. 15 Jun, 2009 4 commits
    • perf report: Fix 32-bit printf format · e2eae0f5
      Committed by Ingo Molnar
      Yong Wang reported the following compiler warning:
      
       builtin-report.c: In function 'process_overflow_event':
       builtin-report.c:984: error: cast to pointer from integer of different size
      
      Which happens because we try to print ->ips[] out with a limited
      format, losing the high 32 bits. Print it out using %016Lx instead.
      Reported-by: Yong Wang <yong.y.wang@linux.intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
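The format problem described above can be sketched as follows. This is a minimal illustration, not the code from the commit: `format_ip` is a hypothetical helper, and it uses the standard `%016llx` with an explicit cast, which is the portable counterpart of the `%016Lx` form mentioned in the message.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit instruction pointer as 16 hex digits.
 * Casting to unsigned long long keeps all 64 bits even when
 * pointers and longs are only 32 bits wide on the host.
 * (format_ip is a hypothetical name used for illustration.)
 */
static void format_ip(char *buf, size_t len, uint64_t ip)
{
    snprintf(buf, len, "%016llx", (unsigned long long)ip);
}
```

Casting the value to a pointer and printing it with `%p` would truncate it on a 32-bit build, which is the kind of loss the commit message describes.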
    • perf report: Add per system call overhead histogram · 3dfabc74
      Committed by Ingo Molnar
      Take advantage of call-graph perfcounter sampling/recording to
      display a non-trivial histogram: the true, collapsed/summarized
      cost measurement, on a per system call total overhead basis:
      
       aldebaran:~/linux/linux/tools/perf> ./perf record -g -a -f ~/hackbench 10
       aldebaran:~/linux/linux/tools/perf> ./perf report -s symbol --syscalls | head -10
       #
       # (3536 samples)
       #
       # Overhead  Symbol
       # ........  ......
       #
           40.75%  [k] sys_write
           40.21%  [k] sys_read
            4.44%  [k] do_nmi
       ...
      
      This is done by accounting each (reliable) call-chain that chains back
      to a given system call to that system call function.
      
      [ So in the above example we can see that hackbench spends about 40% of
        its total time somewhere in sys_write() and 40% somewhere in
        sys_read(), the rest of the time is spent in user-space. The time
        is not spent in sys_write() _itself_ but in one of its many child
        functions. ]
      
      Or, a recording of a kernel build (with the source files already in the page-cache):
      
       $ perf record -g -m 512 -f -- make -j32 kernel
       $ perf report -s s --syscalls | grep '\[k\]' | grep -v nmi
      
           4.14%  [k] do_page_fault
           1.20%  [k] sys_write
           1.10%  [k] sys_open
           0.63%  [k] sys_exit_group
           0.48%  [k] smp_apic_timer_interrupt
           0.37%  [k] sys_read
           0.37%  [k] sys_execve
           0.20%  [k] sys_mmap
           0.18%  [k] sys_close
           0.14%  [k] sys_munmap
           0.13%  [k] sys_poll
           0.09%  [k] sys_newstat
           0.07%  [k] sys_clone
           0.06%  [k] sys_newfstat
           0.05%  [k] sys_access
           0.05%  [k] schedule
      
      This shows the true total cost of each syscall variant that gets used
      during a kernel build. The profile reveals that page faults are
      the costliest, followed by read()/write().
      
      An interesting detail: timer interrupts cost 0.5% - or 0.5 seconds
      per 100 seconds of kernel build-time. (this was done with HZ=1000)
      
      The summary is done in 'perf report', i.e. in the post-processing
      stage - so once we have a good call-graph recording, this type of
      non-trivial high-level analysis becomes possible.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
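The accounting rule described above (credit each sample to the system call its call-chain leads back to) might look roughly like this. It is a sketch under stated assumptions: the chain is already resolved to symbol names, and a `sys_` prefix is taken to identify a syscall entry point. Neither detail is confirmed by the commit message, and `chain_syscall` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Walk a resolved call-chain (innermost frame first) and return the
 * first symbol that looks like a system call entry point, or NULL if
 * the chain never passes through one.
 * Assumption: syscall entry points are recognizable by a "sys_" prefix.
 */
static const char *chain_syscall(const char *const *chain, size_t nr)
{
    for (size_t i = 0; i < nr; i++) {
        if (strncmp(chain[i], "sys_", 4) == 0)
            return chain[i];
    }
    return NULL;
}
```

A report tool would then add the sample to the histogram bucket of the returned symbol instead of the bucket of the innermost frame.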
    • perf record: Fix fast task-exit race · 613d8602
      Committed by Ingo Molnar
      Recording with -a (or with -p) can race with tasks going away:
      
         couldn't open /proc/8440/maps
      
      Causing an early exit() and no recording done.
      
      Do not abort the recording session - instead just skip that task.
      
      Also, only print the warnings under -v.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
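A minimal sketch of the behavior described above: when a task's maps file cannot be opened, report it only in verbose mode and let the caller continue with the next task. `read_task_maps` and its `verbose` parameter are hypothetical names, not taken from the perf sources.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Try to read /proc/<pid>/maps.
 * Returns 0 on success and -1 if the task has already gone away;
 * the caller is expected to skip that task rather than abort.
 */
static int read_task_maps(pid_t pid, int verbose)
{
    char path[64];
    char line[4096];
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%d/maps", (int)pid);
    fp = fopen(path, "r");
    if (fp == NULL) {
        if (verbose)
            fprintf(stderr, "couldn't open %s\n", path);
        return -1;
    }
    while (fgets(line, sizeof(line), fp) != NULL) {
        /* a real tool would turn each mapping into an mmap event here */
    }
    fclose(fp);
    return 0;
}
```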
    • perf record/report: Add call graph / call chain profiling · 3efa1cc9
      Committed by Ingo Molnar
      Add the first steps of call-graph profiling:
      
       - add the -g (--call-graph) option to perf record
       - parse the call-graph record and print it out under -D (--dump-trace)
      
      The call-graph data is not put into the histogram yet, but it
      can be seen that it's being processed correctly:
      
      0x3ce0 [0x38]: event: 35
      .
      . ... raw event: size 56 bytes
      .  0000:  23 00 00 00 05 00 38 00 d4 df 0e 81 ff ff ff ff  #.....8........
      .  0010:  60 0b 00 00 60 0b 00 00 03 00 00 00 01 00 02 00  `...`..........
      .  0020:  d4 df 0e 81 ff ff ff ff a0 61 ed 41 36 00 00 00  .........a.A6..
      .  0030:  04 92 e6 41 36 00 00 00                          .a.A6..
      .
      0x3ce0 [0x38]: PERF_EVENT (IP, 5): 2912: 0xffffffff810edfd4 period: 1
      ... chain: u:2, k:1, nr:3
      .....  0: 0xffffffff810edfd4
      .....  1: 0x3641ed61a0
      .....  2: 0x3641e69204
       ... thread: perf:2912
       ...... dso: [kernel]
      
      This shows a 3-entry call-graph, with one kernel-space and two user-space
      entries.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
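The chain portion of the raw event above (offset 0x18 onward) can be decoded with a sketch like the following. The layout is inferred from the dump, not from the headers, so treat it as an assumption: a 16-bit entry count, 16 bits that are skipped here, 16-bit kernel and user counts, then 64-bit little-endian addresses. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct chain_header {
    uint16_t nr;      /* total number of entries */
    uint16_t kernel;  /* kernel-space entries */
    uint16_t user;    /* user-space entries */
};

/* Read little-endian values byte by byte, so host endianness does not matter. */
static uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint64_t get_le64(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/*
 * Decode an 8-byte chain header followed by 64-bit addresses.
 * Stores at most 'max' addresses in 'ips' and returns how many were stored.
 */
static size_t decode_chain(const unsigned char *p, size_t len,
                           struct chain_header *hdr,
                           uint64_t *ips, size_t max)
{
    size_t n = 0;

    if (len < 8)
        return 0;
    hdr->nr = get_le16(p);
    /* bytes 2..3 are not interpreted in this sketch */
    hdr->kernel = get_le16(p + 4);
    hdr->user = get_le16(p + 6);
    while (n < hdr->nr && n < max && 8 + (n + 1) * 8 <= len) {
        ips[n] = get_le64(p + 8 + n * 8);
        n++;
    }
    return n;
}
```

Fed the 32 bytes starting at offset 0x18 of the dump, this yields nr:3, k:1, u:2 and the three addresses listed in the trace output.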
  3. 14 Jun, 2009 1 commit
    • perf report: Print out raw events in hexa · 8465b050
      Committed by Ingo Molnar
      Print out events in hex dump format when -D is specified:
      
      0x4868 [0x48]: event: 1
      .
      . ... raw event: size 72 bytes
      .  0000:  01 00 00 00 00 00 48 00 d4 72 00 00 d4 72 00 00  ......H..r...r.
      .  0010:  00 00 40 f2 3e 00 00 00 00 30 01 00 00 00 00 00  ..@.>....0.....
      .  0020:  00 00 00 00 00 00 00 00 2f 75 73 72 2f 6c 69 62  ......../usr/li
      .  0030:  36 34 2f 6c 69 62 65 6c 66 2d 30 2e 31 34 31 2e  64/libelf-0.141
      .  0040:  73 6f 00 00 00 00 00 00                          f-0.141
      .
      0x4868 [0x48]: PERF_EVENT_MMAP 29396: [0x3ef2400000(0x13000) @ (nil)]: /usr/lib64/libelf-0.141.so
      
      This helps the debugging of mis-parsing of data files, and helps
      the addition of new sample/trace formats.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
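A hex dump row of the kind shown above can be produced with a sketch like this one. It is not the perf implementation and does not reproduce its exact column layout; `format_hex_row` is a hypothetical helper that writes an offset, up to 16 hex bytes, and a printable-character column into a caller-supplied buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format one dump row: "oooo: " + 16 three-character byte cells
 * + two spaces + ASCII column ('.' for non-printable bytes).
 * Returns the length written, or 0 if the buffer is too small.
 */
static size_t format_hex_row(char *out, size_t outlen, size_t offset,
                             const unsigned char *data, size_t n)
{
    size_t pos;
    int w;

    if (n > 16)
        n = 16;
    w = snprintf(out, outlen, "%04zx: ", offset);
    if (w < 0 || (size_t)w >= outlen)
        return 0;
    pos = (size_t)w;

    for (size_t i = 0; i < 16; i++) {
        if (pos + 4 > outlen)
            return 0;
        if (i < n)
            snprintf(out + pos, outlen - pos, " %02x", data[i]);
        else
            snprintf(out + pos, outlen - pos, "   "); /* pad short rows */
        pos += 3;
    }

    if (pos + 2 + n + 1 > outlen)
        return 0;
    out[pos++] = ' ';
    out[pos++] = ' ';
    for (size_t i = 0; i < n; i++)
        out[pos++] = isprint(data[i]) ? (char)data[i] : '.';
    out[pos] = '\0';
    return pos;
}
```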
  4. 13 Jun, 2009 7 commits
    • perf annotate: Fixes for filename:line displays · c17c2db1
      Committed by Frederic Weisbecker
      - fix addr2line on userspace binary: don't only check kernel image.
      - fix string allocation size for path: missing ending null char room
      - fix overflow in symbol extra info
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1244907563-7820-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf stat: Enable raw data to be printed · ef281a19
      Committed by Ingo Molnar
      If -vv (very verbose) is specified, print out raw data
      in the following format:
      
      $ perf stat -vv -r 3 ./loop_1b_instructions
      
      [ perf stat: executing run #1 ... ]
      [ perf stat: executing run #2 ... ]
      [ perf stat: executing run #3 ... ]
      
      debug:              runtime[0]: 235871872
      debug:             walltime[0]: 236646752
      debug:       runtime_cycles[0]: 755150182
      debug:            counter/0[0]: 235871872
      debug:            counter/1[0]: 235871872
      debug:            counter/2[0]: 235871872
      debug:               scaled[0]: 0
      debug:            counter/0[1]: 2
      debug:            counter/1[1]: 235870662
      debug:            counter/2[1]: 235870662
      debug:               scaled[1]: 0
      debug:            counter/0[2]: 1
      debug:            counter/1[2]: 235870437
      debug:            counter/2[2]: 235870437
      debug:               scaled[2]: 0
      debug:            counter/0[3]: 140
      debug:            counter/1[3]: 235870298
      debug:            counter/2[3]: 235870298
      debug:               scaled[3]: 0
      debug:            counter/0[4]: 755150182
      debug:            counter/1[4]: 235870145
      debug:            counter/2[4]: 235870145
      debug:               scaled[4]: 0
      debug:            counter/0[5]: 1001411258
      debug:            counter/1[5]: 235868838
      debug:            counter/2[5]: 235868838
      debug:               scaled[5]: 0
      debug:            counter/0[6]: 27897
      debug:            counter/1[6]: 235868560
      debug:            counter/2[6]: 235868560
      debug:               scaled[6]: 0
      debug:            counter/0[7]: 2910
      debug:            counter/1[7]: 235868151
      debug:            counter/2[7]: 235868151
      debug:               scaled[7]: 0
      debug:              runtime[0]: 235980257
      debug:             walltime[0]: 236770942
      debug:       runtime_cycles[0]: 755114546
      debug:            counter/0[0]: 235980257
      debug:            counter/1[0]: 235980257
      debug:            counter/2[0]: 235980257
      debug:               scaled[0]: 0
      debug:            counter/0[1]: 3
      debug:            counter/1[1]: 235980049
      debug:            counter/2[1]: 235980049
      debug:               scaled[1]: 0
      debug:            counter/0[2]: 1
      debug:            counter/1[2]: 235979907
      debug:            counter/2[2]: 235979907
      debug:               scaled[2]: 0
      debug:            counter/0[3]: 135
      debug:            counter/1[3]: 235979780
      debug:            counter/2[3]: 235979780
      debug:               scaled[3]: 0
      debug:            counter/0[4]: 755114546
      debug:            counter/1[4]: 235979652
      debug:            counter/2[4]: 235979652
      debug:               scaled[4]: 0
      debug:            counter/0[5]: 1001439771
      debug:            counter/1[5]: 235979304
      debug:            counter/2[5]: 235979304
      debug:               scaled[5]: 0
      debug:            counter/0[6]: 23723
      debug:            counter/1[6]: 235979050
      debug:            counter/2[6]: 235979050
      debug:               scaled[6]: 0
      debug:            counter/0[7]: 2213
      debug:            counter/1[7]: 235978820
      debug:            counter/2[7]: 235978820
      debug:               scaled[7]: 0
      debug:              runtime[0]: 235888002
      debug:             walltime[0]: 236700533
      debug:       runtime_cycles[0]: 754881504
      debug:            counter/0[0]: 235888002
      debug:            counter/1[0]: 235888002
      debug:            counter/2[0]: 235888002
      debug:               scaled[0]: 0
      debug:            counter/0[1]: 2
      debug:            counter/1[1]: 235887793
      debug:            counter/2[1]: 235887793
      debug:               scaled[1]: 0
      debug:            counter/0[2]: 1
      debug:            counter/1[2]: 235887645
      debug:            counter/2[2]: 235887645
      debug:               scaled[2]: 0
      debug:            counter/0[3]: 135
      debug:            counter/1[3]: 235887499
      debug:            counter/2[3]: 235887499
      debug:               scaled[3]: 0
      debug:            counter/0[4]: 754881504
      debug:            counter/1[4]: 235887368
      debug:            counter/2[4]: 235887368
      debug:               scaled[4]: 0
      debug:            counter/0[5]: 1001401731
      debug:            counter/1[5]: 235887024
      debug:            counter/2[5]: 235887024
      debug:               scaled[5]: 0
      debug:            counter/0[6]: 24212
      debug:            counter/1[6]: 235886786
      debug:            counter/2[6]: 235886786
      debug:               scaled[6]: 0
      debug:            counter/0[7]: 1824
      debug:            counter/1[7]: 235886560
      debug:            counter/2[7]: 235886560
      debug:               scaled[7]: 0
      
       Performance counter stats for '/home/mingo/loop_1b_instructions' (3 runs):
      
           235.913377  task-clock-msecs     #      0.997 CPUs    ( +-   0.011% )
                    2  context-switches     #      0.000 M/sec   ( +-   0.000% )
                    1  CPU-migrations       #      0.000 M/sec   ( +-   0.000% )
                  136  page-faults          #      0.001 M/sec   ( +-   0.730% )
            755048744  cycles               #   3200.534 M/sec   ( +-   0.009% )
           1001417586  instructions         #      1.326 IPC     ( +-   0.001% )
                25277  cache-references     #      0.107 M/sec   ( +-   3.988% )
                 2315  cache-misses         #      0.010 M/sec   ( +-   9.845% )
      
          0.236706075  seconds time elapsed.
      
      This allows the summary stats to be validated.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf stat: Add feature to run and measure a command multiple times · 42202dd5
      Committed by Ingo Molnar
      Add the --repeat <n> feature to perf stat, which repeats a given
      command up to 100 times, collects the stats and calculates an
      average and a stddev.
      
      For example, the following oneliner 'perf stat' command runs hackbench
      5 times and prints a tabulated result of all metrics, with averages
      and noise levels (in percentage) printed:
      
       aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10
       Time: 0.117
       Time: 0.108
       Time: 0.089
       Time: 0.088
       Time: 0.100
      
       Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
      
          1243.989586  task-clock-msecs     #     10.460 CPUs    ( +-   4.720% )
                47706  context-switches     #      0.038 M/sec   ( +-  19.706% )
                  387  CPU-migrations       #      0.000 M/sec   ( +-   3.608% )
                17793  page-faults          #      0.014 M/sec   ( +-   0.354% )
           3770941606  cycles               #   3031.329 M/sec   ( +-   4.621% )
           1566372416  instructions         #      0.415 IPC     ( +-   2.703% )
             16783421  cache-references     #     13.492 M/sec   ( +-   5.202% )
              7128590  cache-misses         #      5.730 M/sec   ( +-   7.420% )
      
          0.118924455  seconds time elapsed.
      
      The goal of this feature is to make these statistics reliable,
      and to show how many times a command has to be repeated
      for the noise to go down to an acceptable level.
      
      (The -v option can be used to see a line printed out as each run progresses.)
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
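The per-metric averaging described above can be sketched as a two-pass mean and variance computation. This is an illustration only: the commit message does not say which estimator perf stat uses, so the population variance here is an assumption, and `summarize_runs` is a hypothetical name. The standard deviation is the square root of the variance (for example via `sqrt()` from `<math.h>`), and the noise percentage would be that value relative to the mean.

```c
#include <stddef.h>

struct run_summary {
    double mean;
    double variance; /* population variance; stddev is its square root */
};

/* Summarize one metric across n repeated runs. */
static struct run_summary summarize_runs(const double *values, size_t n)
{
    struct run_summary s = { 0.0, 0.0 };
    double sum = 0.0, sq = 0.0;
    size_t i;

    if (n == 0)
        return s;

    for (i = 0; i < n; i++)
        sum += values[i];
    s.mean = sum / (double)n;

    for (i = 0; i < n; i++) {
        double d = values[i] - s.mean;
        sq += d * d;
    }
    s.variance = sq / (double)n;
    return s;
}
```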
    • perf stat: Reorganize output · 44175b6f
      Committed by Ingo Molnar
       - use IPC for the instruction normalization output
       - use CPUs for the CPU utilization factor value
       - print out time elapsed like the other rows
       - tidy up the task-clocks/cpu-clocks printout
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf annotate: Print a sorted summary of annotated overhead lines · 971738f3
      Committed by Frederic Weisbecker
      It can be very annoying to scroll down perf annotate output
      until we find the relevant overhead.
      
      Using the -l option, you can now get a small summary, sorted by
      overhead, at the beginning of the output.
      
      Example:
      
      ./perf annotate -l -k ../../vmlinux -s __lock_acquire
      
      Sorted summary for file ../../vmlinux
      ----------------------------------------------
      
         12.04 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1653
          4.61 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1740
          3.77 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1775
          3.56 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1653
          2.93 /home/fweisbec/linux/linux-2.6-tip/arch/x86/include/asm/irqflags.h:15
          2.83 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2545
          2.30 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2594
          2.20 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2388
          2.20 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:730
          2.09 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:730
          2.09 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:138
          1.88 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2548
          1.47 /home/fweisbec/linux/linux-2.6-tip/arch/x86/include/asm/irqflags.h:15
          1.36 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2594
          1.36 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:730
          1.26 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1654
          1.26 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1653
          1.15 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2592
          1.15 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1740
          1.15 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1740
      
      [...]
      
      Only overheads above 0.5% are summarized.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1244844682-12928-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf annotate: Print the filename:line for annotated colored lines · 301406b9
      Committed by Frederic Weisbecker
      When we have a colored line in perf annotate, i.e. a middle/high
      overhead one, it's sometimes useful to get the matching line
      and filename from the source file. This patch also prepares
      for a subsequent one, which will print a sorted summary of
      middle/high overhead lines at the beginning of the output.
      
      Filename:line entries have the same color as the corresponding IP lines.
      
      This can be slow because it relies on addr2line. We could also
      use objdump with -l, but then we would have to buffer the
      objdump output and parse it to filter the relevant lines, since
      we want to print a sorted summary at the beginning.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1244844682-12928-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: Start documenting HAVE_PERF_COUNTERS requirements · 018df72d
      Committed by Mike Frysinger
      Help out arch porters who want to support perf counters by listing some
      basic requirements.
      Signed-off-by: Mike Frysinger <vapier@gentoo.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1244827063-24046-1-git-send-email-vapier@gentoo.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 12 Jun, 2009 3 commits
    • perf_counter: Add forward/backward attribute ABI compatibility · 974802ea
      Committed by Peter Zijlstra
      Provide a means of extending the perf_counter_attr in a 'natural' way.
      
      We allow growing the structure by appending fields at the end by specifying
      the full structure size inside it.
      
      When a new kernel sees a smaller (old) structure, it will 0 pad the tail.
      When an old kernel sees a larger (new) structure, it will verify the tail
      consists of 0s, otherwise fail.
      
      If we fail due to a size-mismatch, we return -E2BIG and write the kernel's
      native attribute size back into the provided structure.
      
      Furthermore, add some attribute verification, so that we'll fail counter
      creation when unknown bits are present (PERF_SAMPLE, PERF_FORMAT, or in
      the __reserved fields).
      
      (This ABI detail is introduced while keeping the existing syscall ABI.)
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
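The size negotiation described above can be sketched in user-space C as follows. This is an illustration of the rule, not the kernel code: `struct demo_attr` is a hypothetical stand-in for perf_counter_attr, `copy_attr` is a hypothetical name, and the caller's buffer is modeled as plain memory instead of a user-space pointer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute structure standing in for perf_counter_attr. */
struct demo_attr {
    uint32_t type;
    uint32_t size;
    uint64_t config;
};

/*
 * Copy a caller-supplied attribute of 'usize' bytes into 'kattr'.
 *  - smaller (older) structure: the missing tail stays zero
 *  - larger (newer) structure: accepted only if the extra tail is all zero
 * On a size mismatch, returns -E2BIG and reports the native size.
 */
static int copy_attr(struct demo_attr *kattr, const void *ubuf,
                     uint32_t usize, uint32_t *native_size)
{
    const unsigned char *p = ubuf;
    size_t ksize = sizeof(*kattr);

    memset(kattr, 0, ksize); /* zero-pad the tail for old callers */

    if (usize > ksize) {
        for (size_t i = ksize; i < usize; i++) {
            if (p[i] != 0) {
                *native_size = (uint32_t)ksize;
                return -E2BIG;
            }
        }
        usize = (uint32_t)ksize;
    }
    memcpy(kattr, p, usize);
    return 0;
}
```

The design choice is that unknown fields are harmless only when they are zero, so an old kernel can safely accept a new structure as long as the caller did not ask for anything the kernel does not understand.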
    • perf record: Explicity program a default counter · bbd36e5e
      Committed by Peter Zijlstra
      Up until now record has worked on the assumption that type=0, config=0
      was a suitable configuration - which it is. Let's make this a little more
      explicit and more readable via the use of proper symbols.
      
      [ Impact: cleanup ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Remove one L1-data alias · faafec1e
      Committed by Yong Wang
      Otherwise all L1-instruction aliases will be recognized as
      L1-data by strcasestr() when calling the parse_aliases() function.
      Signed-off-by: Yong Wang <yong.y.wang@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <20090612031706.GA22126@ywang-moblin2.bj.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
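The ambiguity described above comes from substring matching: an alias that is too short also matches names it was never meant for. The sketch below shows the effect with a hypothetical alias table (the actual alias that was removed is not named in the commit message). `contains_nocase` is a portable stand-in for strcasestr(), which is a GNU extension.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive substring test; stands in for strcasestr(). */
static int contains_nocase(const char *hay, const char *needle)
{
    for (; *hay != '\0'; hay++) {
        size_t i = 0;

        while (needle[i] != '\0' &&
               tolower((unsigned char)hay[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (needle[i] == '\0')
            return 1;
    }
    return *needle == '\0';
}

/* Return the index of the first alias found inside 'name', or -1. */
static int match_alias(const char *name,
                       const char *const *aliases, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (contains_nocase(name, aliases[i]))
            return (int)i;
    }
    return -1;
}
```

With a hypothetical L1-data alias list of `"L1-data"`, `"L1-d"` and `"L1"`, an instruction-cache event name such as `"L1-icache-load-misses"` matches the last entry and is misclassified; dropping the over-broad alias removes the false match.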
  6. 11 Jun, 2009 3 commits
    • perf_counter: Standardize event names · f4dbfa8f
      Committed by Peter Zijlstra
      Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Clean up u64 usage · 729ff5e2
      Committed by Ingo Molnar
      A build error slipped in:
      
       builtin-report.c: In function ‘hist_entry__fprintf’:
       builtin-report.c:711: error: format ‘%12d’ expects type ‘int’, but argument 3 has type ‘uint64_t’
      
      Because we got a bit sloppy with those types. uint64_t really sucks,
      because there's no printf format for it. So standardize on __u64
      instead - for all types that go to or come from the ABI (which is __u64),
      or for values that need to be large enough even on 32-bit.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Normalize data using per sample period data · ea1900e5
      Committed by Peter Zijlstra
      When we use variable period sampling, add the period to the sample
      data and use that to normalize the samples.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
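Normalizing by period means each sample counts for as many events as its period, not for one. A minimal sketch, assuming samples are reduced to an address and a period; `struct sample` and `overhead_pct` are hypothetical names, not the structures used by the tools.

```c
#include <stddef.h>
#include <stdint.h>

struct sample {
    uint64_t ip;     /* sampled instruction pointer */
    uint64_t period; /* events represented by this sample */
};

/*
 * Percentage of all events attributed to 'ip', weighting every
 * sample by its period instead of counting samples equally.
 */
static double overhead_pct(const struct sample *s, size_t n, uint64_t ip)
{
    uint64_t total = 0, hits = 0;

    for (size_t i = 0; i < n; i++) {
        total += s[i].period;
        if (s[i].ip == ip)
            hits += s[i].period;
    }
    if (total == 0)
        return 0.0;
    return 100.0 * (double)hits / (double)total;
}
```

With two samples whose periods are 1 and 3, the first accounts for 25% of the events even though it is half of the samples.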
  7. 10 Jun, 2009 2 commits
  8. 09 Jun, 2009 2 commits
    • perf_counter tools: Standardize color printing · aefcf37b
      Committed by Ingo Molnar
      The rule is:
      
       - high overhead: red
       -  mid overhead: green
       -  low overhead: normal (white/black)
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf report: Add support for profiling JIT generated code · 80d496be
      Committed by Pekka Enberg
      This patch adds support for profiling JIT generated code to 'perf
      report'. A JIT compiler is required to generate a "/tmp/perf-$PID.map"
      symbol map, which is parsed when looking up and displaying symbols.
      
      Thanks to Peter Zijlstra for his help with this patch!
      
      Example "perf report" output with the Jato JIT:
      
       #
       # (40311 samples)
       #
       # Overhead           Command  Shared Object              Symbol
       # ........  ................  .........................  ......
       #
           97.80%              jato  /tmp/perf-11915.map        [.] Fibonacci.fib(I)I
            0.56%              jato  00000000b7fa023b           0x000000b7fa023b
            0.45%              jato  /tmp/perf-11915.map        [.] Fibonacci.main([Ljava/lang/String;)V
            0.38%              jato  [kernel]                   [k] get_page_from_freelist
            0.06%              jato  [kernel]                   [k] kunmap_atomic
            0.05%              jato  ./jato                     [.] utf8Hash
            0.04%              jato  ./jato                     [.] executeJava
            0.04%              jato  ./jato                     [.] defineClass
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: a.p.zijlstra@chello.nl
      Cc: acme@redhat.com
      LKML-Reference: <Pine.LNX.4.64.0906082111590.12407@melkki.cs.Helsinki.FI>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
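On the JIT side, producing the map file could look like the sketch below. The commit message only names the file path; the line format used here (start address, size, symbol name, with both numbers in hex and no 0x prefix) is the one documented for later perf versions and is an assumption for this commit. Both helper names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Build the per-process map file name, e.g. /tmp/perf-11915.map. */
static int jit_map_path(char *buf, size_t len, int pid)
{
    return snprintf(buf, len, "/tmp/perf-%d.map", pid);
}

/*
 * Format one symbol entry as "START SIZE name".
 * Assumption: hex start and size without a 0x prefix, one symbol per line.
 */
static int jit_map_line(char *buf, size_t len, uint64_t start,
                        uint64_t size, const char *symbol)
{
    return snprintf(buf, len, "%llx %llx %s\n",
                    (unsigned long long)start,
                    (unsigned long long)size, symbol);
}
```

A JIT would append one such line each time it emits a method, so that the report tool can resolve sampled addresses inside generated code to names.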
  9. 08 Jun, 2009 1 commit
    • perf stat: Print out instructins/cycle metric · e779898a
      Committed by Ingo Molnar
      Before:
      
           7549326754  cycles               #    3201.811 M/sec
          10007594937  instructions         #    4244.408 M/sec
      
      After:
      
           7542051194  cycles               #    3201.996 M/sec
          10007743852  instructions         #    4248.811 M/sec # 1.327 per cycle
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
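The new column is the ratio of the two counters shown above. A minimal sketch, with a guard for a zero cycle count so that no division by zero occurs; `insns_per_cycle` is a hypothetical name.

```c
#include <stdint.h>

/* Instructions retired per cycle; 0.0 when no cycles were counted. */
static double insns_per_cycle(uint64_t instructions, uint64_t cycles)
{
    if (cycles == 0)
        return 0.0;
    return (double)instructions / (double)cycles;
}
```

For the "After" figures above, 10007743852 instructions over 7542051194 cycles gives roughly 1.327, matching the printed value.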
  10. 07 Jun, 2009 10 commits
    • perf report: Print more expressive message in case of file open error · a14832ff
      Committed by Ingo Molnar
      Before:
      
       $ perf report
       failed to open file: No such file or directory
      
      After:
      
       $ perf report
        failed to open file: perf.data  (try 'perf record' first)
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Handle kernels with !CONFIG_PERF_COUNTER · 30c806a0
      Committed by Ingo Molnar
      If perf is run on a !CONFIG_PERF_COUNTER kernel right now it
      bails out with no messages or with confusing messages.
      
      Standardize this case some more and explain the situation.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf record: Fall back to cpu-clock-ticks if no PMU · 3da297a6
      Committed by Ingo Molnar
      On architectures/CPUs without PMU support but with perfcounters
      enabled 'perf record' currently fails because it cannot create a
      cycle based hw-perfcounter.
      
      Fall back to the cpu-clock-tick sw-perfcounter in this case, which
      is hrtimer based and will always work (as long as perfcounters
      are enabled).
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf top: Fall back to cpu-clock-tick hrtimer sampling if no cycle counter available · 716c69fe
      Committed by Ingo Molnar
      On architectures/CPUs without PMU support but with perfcounters
      enabled 'perf top' currently fails because it cannot create a
      cycle based hw-perfcounter.
      
      Fall back to the cpu-clock-tick sw-perfcounter in this case, which
      is hrtimer based and will always work (as long as perfcounters
      are enabled).
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf stat: Continue even on counter creation error · 743ee1f8
      Committed by Ingo Molnar
      Before:
      
       $ perf stat ~/hackbench 5
      
       error: syscall returned with -1 (No such device)
      
      After:
      
       $ perf stat ~/hackbench 5
       Time: 1.640
      
       Performance counter stats for '/home/mingo/hackbench 5':
      
          6524.570382  task-clock-ticks     #       3.838 CPU utilization factor
                35704  context-switches     #       0.005 M/sec
                  191  CPU-migrations       #       0.000 M/sec
                 8958  page-faults          #       0.001 M/sec
        <not counted>  cycles
        <not counted>  instructions
        <not counted>  cache-references
        <not counted>  cache-misses
      
       Wall-clock time elapsed:  1699.999995 msecs
      
      Also add -v (--verbose) option to allow the printing of failed
      counter opens.
      
      Also, don't print 'inf' if the wall time is zero (due to jiffies
      granularity); instead, skip printing the CPU utilization factor.
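
      A minimal sketch of the two behaviours described above, assuming
      hypothetical helpers format_counter() and cpu_utilization() (these
      names and signatures are illustrative, not taken from the perf
      sources): a counter that could not be created is reported as
      "<not counted>", and the utilization factor is skipped when the
      wall time is zero.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one counter line. fd < 0 means the counter could not be
 * created, so it is reported instead of aborting the whole run.
 */
static int format_counter(char *buf, size_t len, const char *name,
			  int fd, unsigned long long count)
{
	assert(buf != NULL && name != NULL);

	if (fd < 0)
		return snprintf(buf, len, "%14s  %s", "<not counted>", name);
	return snprintf(buf, len, "%14llu  %s", count, name);
}

/*
 * Compute the CPU utilization factor. Returns 1 and stores the factor
 * when it is computable, 0 when the wall time is zero (the caller then
 * skips printing it instead of showing 'inf').
 */
static int cpu_utilization(double task_msecs, double wall_msecs,
			   double *factor)
{
	assert(factor != NULL);

	if (wall_msecs <= 0.0)
		return 0;
	*factor = task_msecs / wall_msecs;
	return 1;
}
```

      With the figures from the sample output, 6524.570382 msecs of task
      clock over 1699.999995 msecs of wall time gives the factor of
      about 3.838 shown there.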
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      743ee1f8
    • F
      perf top: Wait for a minimal set of events before reading first snapshot · 2f01190a
      Frederic Weisbecker committed
      The first snapshot read often occurs before any events have
      been read from the mapped perfcounter files.
      
      Wait until we have at least one event before starting the
      snapshot; otherwise the delay before the first set of entries
      is displayed may be long at a low refresh rate.
      
      Note: we could also use a semaphore to wait until the
      "print_entries" number of events is reached, but this
      value is tunable and we can't ensure we will even reach it.
      We could also rely on a default minimum set of entries for the
      first refresh, say 15, but again, the minimal sample count is
      tunable, and we could end up displaying nothing until we have a
      minimal default set of events, which can take some time in case
      of high sample filters.
      
      Hence this simple solution which partially covers the default
      case.
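
      One way to wait for the first event, sketched under assumptions:
      the events_seen counter and both functions below are hypothetical
      names, and the polling loop is a stand-in chosen for this
      illustration rather than the mechanism used by the commit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical counter bumped by the side that reads the mmap buffers. */
static atomic_ulong events_seen;

static void note_event(void)
{
	atomic_fetch_add(&events_seen, 1);
}

/*
 * Poll until at least one event has been seen or max_polls attempts
 * have elapsed, sleeping 1 ms between attempts.
 * Returns 1 if an event arrived, 0 on timeout.
 */
static int wait_for_first_event(int max_polls)
{
	const struct timespec pause = { .tv_sec = 0, .tv_nsec = 1000000 };
	int i;

	assert(max_polls >= 0);

	for (i = 0; i < max_polls; i++) {
		if (atomic_load(&events_seen) > 0)
			return 1;
		nanosleep(&pause, NULL);
	}
	return atomic_load(&events_seen) > 0;
}
```

      The display side would call wait_for_first_event() once before its
      first refresh; the bounded poll count keeps it from blocking
      forever when no events arrive.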
      
      [ Impact: fix display artifacts in perf top ]
      Signed-off-by: Frederic Weisbecker <fweisbeec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1244322643-6447-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2f01190a
    • I
      perf annotate: Fix command line help text · 23b87116
      Ingo Molnar committed
      Arjan noticed this bug in the perf annotate help output:
      
          -s, --symbol <file>   symbol to annotate
      
      that should be <symbol> instead.
      Reported-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      23b87116
    • A
      perf_counter tools: Initialize a stack variable before use · e9fbc9dc
      Arjan van de Ven committed
      The "perf report" utility crashed in some circumstances
      because the "sym" stack variable was not initialized before use
      (as also confirmed by valgrind).
      
      With this fix the crash goes away and valgrind no longer complains.
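
      The class of bug can be illustrated with a small sketch. The
      symbol table and resolve_name() below are hypothetical and only
      show the pattern: a pointer that is assigned on some paths only
      must be initialized before it is read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct symbol {
	const char *name;
};

/* Hypothetical symbol table for the demo. */
static struct symbol table[] = { { "sys_read" }, { "sys_write" } };

static const char *resolve_name(const char *wanted)
{
	/* Initialized up front: stays NULL when no entry matches. */
	struct symbol *sym = NULL;
	size_t i;

	assert(wanted != NULL);

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (strcmp(table[i].name, wanted) == 0) {
			sym = &table[i];
			break;
		}
	}

	/* Without the initializer, this read would be undefined on a miss. */
	return sym ? sym->name : "[unknown]";
}
```
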
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e9fbc9dc
    • I
      perf annotate: Automatically pick up vmlinux in the local directory · 39273ee9
      Ingo Molnar committed
      Right now kernel debug info does not get resolved by default, because
      we don't know where to look for the vmlinux.
      
      The -k option can be used for that - but if no option is given, pick
      up vmlinux files in the current directory - in case a kernel hacker
      runs profiling from the source directory that the kernel was built in.
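
      The lookup order can be sketched like this, assuming a
      hypothetical helper pick_vmlinux() (not the function used in the
      perf sources): an explicit -k path wins, otherwise a readable
      vmlinux in the current directory is used.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Return the vmlinux path to use, or NULL if none is available.
 * k_option is the value given with -k, or NULL when the option
 * was not passed.
 */
static const char *pick_vmlinux(const char *k_option)
{
	/* An explicit path, when given, must not be empty. */
	assert(k_option == NULL || k_option[0] != '\0');

	if (k_option != NULL)
		return k_option;

	/* No option given: look for ./vmlinux in the current directory. */
	if (access("vmlinux", R_OK) == 0)
		return "vmlinux";

	return NULL;
}
```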
      
      The real solution would be to embed the location (and perhaps the
      date/timestamp) of the vmlinux file in /proc/kallsyms, so that
      tools can pick it up automatically.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      39273ee9
    • I
      perf_counter tools: Fix error condition in parse_aliases() · 8953645f
      Ingo Molnar committed
      gcc warned about this bug:
      
      util/parse-events.c: In function ‘parse_generic_hw_symbols’:
      util/parse-events.c:175: warning: comparison is always false due to limited range of data type
      util/parse-events.c:182: warning: comparison is always false due to limited range of data type
      util/parse-events.c:190: warning: comparison is always false due to limited range of data type
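
      This warning typically appears when a negative error code is
      stored in a narrow unsigned type and then compared against -1.
      The sketch below illustrates that pattern with hypothetical
      functions; it is an assumption about the kind of bug involved,
      not a copy of the code in util/parse-events.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parser: returns an index >= 0, or -1 when nothing matches. */
static int parse_alias(const char *s)
{
	assert(s != NULL);
	return s[0] == 'L' ? 0 : -1;
}

/*
 * What happens when the result is kept in an unsigned char: -1 is
 * truncated to 255, so a later "== -1" check can never be true.
 */
static int truncated_result(const char *s)
{
	unsigned char type = (unsigned char)parse_alias(s);

	return type;
}

/* Keeping the result in an int preserves the -1 error value. */
static int checked_result(const char *s)
{
	int type = parse_alias(s);

	if (type == -1)
		return -1;
	return type;
}
```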
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8953645f