1. 03 Jul, 2009 3 commits
    • perf report: Add support for callchain graph output · 4eb3e478
      Committed by Frederic Weisbecker
      Currently, callchains are printed in a single vertical
      level; this is the "flat" mode:
      
      8.25%  [k] copy_user_generic_string
                   4.19%
                      copy_user_generic_string
                      generic_file_aio_read
                      do_sync_read
                      vfs_read
                      sys_pread64
                      system_call_fastpath
                      pread64
      
      This patch introduces a new "graph" mode which provides a
      hierarchical output of factorized paths recursively sorted:
      
       8.25%  [k] copy_user_generic_string
                      |
                      |--4.31%-- generic_file_aio_read
                      |          do_sync_read
                      |          vfs_read
                      |          |
                      |          |--4.19%-- sys_pread64
                      |          |          system_call_fastpath
                      |          |          pread64
                      |          |
                      |           --0.12%-- sys_read
                      |                     system_call_fastpath
                      |                     __read
                      |
                      |--3.24%-- generic_file_buffered_write
                      |          __generic_file_aio_write_nolock
                      |          generic_file_aio_write
                      |          do_sync_write
                      |          reiserfs_file_write
                      |          vfs_write
                      |          |
                      |          |--3.14%-- sys_pwrite64
                      |          |          system_call_fastpath
                      |          |          __pwrite64
                      |          |
                      |           --0.10%-- sys_write
      [...]
      
      The command line changes accordingly.
      
      Passing the -c option alone prints the callchains in flat
      mode, which is the default.
      
      But you can override it:
      
          perf report -c graph
      
      or
      
          perf report -c flat
      
      You can also pass an abbreviated mode name:
      
          perf report -c g
      
      or
      
          perf report -c gra
      
      will both make use of the graph mode.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246550301-8954-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
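The abbreviated forms (-c g, -c gra) can be accepted by comparing only as many characters as the user typed. The following is a minimal sketch of that idea, not the code from the patch; the enum and the function name parse_chain_mode() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for this sketch; not taken from the perf sources. */
enum chain_mode { MODE_INVALID = -1, MODE_FLAT, MODE_GRAPH };

/*
 * Map the (possibly abbreviated) argument of -c to an output mode.
 * A missing or empty argument selects the default, flat.
 */
static enum chain_mode parse_chain_mode(const char *arg)
{
	size_t len;

	if (arg == NULL || *arg == '\0')
		return MODE_FLAT;

	len = strlen(arg);
	/* Comparing only strlen(arg) bytes lets any prefix match. */
	if (strncmp(arg, "graph", len) == 0)
		return MODE_GRAPH;
	if (strncmp(arg, "flat", len) == 0)
		return MODE_FLAT;

	return MODE_INVALID;
}
```

An argument longer than the mode name, such as "graphs", is rejected because the comparison then reaches the terminating NUL of "graph".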
    • perf_counter tools: Add new OPT_CALLBACK_DEFAULT option · 5a4b1817
      Committed by Frederic Weisbecker
      There is no predefined macro to create an option that takes
      either a custom value or, if none is given, a default one.
      
      This patch provides a new helper, OPT_CALLBACK_DEFAULT(), which
      defines this kind of option.
      
      For example, considering an option -c, we want to get the
      default value in the following cases:
      
          perf command -c -d
          perf command -d -c
      
      And the foo value when it's given:
      
          perf command -c foo -d
          perf command -d -c foo
      
      That's also why PARSE_OPT_LASTARG_DEFAULT is extended here to
      support default values whatever the position of the option, not
      only at the end.
      
      Should it now be renamed to PARSE_OPT_ARG_DEFAULT?
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: git@vger.kernel.org
      LKML-Reference: <1246550301-8954-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
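The four command lines in the message above can be told apart with one rule: use the default when the option is the last argument or when the next argument starts with '-'. The sketch below assumes that rule; optional_value() is a hypothetical helper, not part of the parse-options API.

```c
#include <stddef.h>

/*
 * Return the value of the option found at argv[i]: the following
 * argument if there is one and it is not itself an option, otherwise
 * defval. *consumed receives the number of argv entries used.
 * Hypothetical helper for illustration only.
 */
static const char *optional_value(int argc, const char *const argv[], int i,
				  const char *defval, int *consumed)
{
	if (i + 1 >= argc || argv[i + 1][0] == '-') {
		*consumed = 1;	/* only the option itself */
		return defval;
	}

	*consumed = 2;		/* the option plus its value */
	return argv[i + 1];
}
```

With this rule "perf command -c -d" and "perf command -d -c" both yield the default, while "perf command -c foo -d" yields "foo".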
    • perf_counter tools: Create new chain_for_each_child() iterator · 14f4654c
      Committed by Frederic Weisbecker
      Iterating through the children of a node in the callchain tree
      can look quite confusing at first glance. The head is the
      children field of the parent, and the list nodes are in the
      brothers field of the children.
      
      This is because the children are linked to the parent as a list
      of "brothers", using the "children" list of the parent as the
      head:
      
        ---------------
       | Parent (head) |-------------------------------------
        ---------------                                      |
           |                                                 |
        children                                             |
           |                                                 |
        -----------               -----------                |
       | 1st child |---brother---| 2nd child |---brother-----
        -----------               -----------
      
      This makes the following strange pattern occur often:
      
       list_for_each_entry(child, &parent->children, brothers) {
              // do something with children
       }
      
      Abstract it to chain_for_each_child() to factorize and simplify
      this pattern.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246550301-8954-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
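A self-contained sketch of the iterator follows. It carries its own tiny list implementation so it builds outside the kernel tree; the real helper would simply wrap list_for_each_entry(). The node layout, node_init(), add_child() and sum_child_hits() are assumptions made for illustration, and the macro relies on the GCC/Clang __typeof__ extension.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list.h. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical node layout: only the two list fields named above, plus hits. */
struct callchain_node {
	struct list_head children;	/* head of this node's child list */
	struct list_head brothers;	/* link in the parent's child list */
	int hits;
};

/* Visit every child of parent, hiding the children/brothers asymmetry. */
#define chain_for_each_child(child, parent)				\
	for (child = container_of((parent)->children.next,		\
				  __typeof__(*child), brothers);	\
	     &child->brothers != &(parent)->children;			\
	     child = container_of(child->brothers.next,			\
				  __typeof__(*child), brothers))

static void node_init(struct callchain_node *node, int hits)
{
	list_init(&node->children);
	list_init(&node->brothers);
	node->hits = hits;
}

static void add_child(struct callchain_node *parent,
		      struct callchain_node *child)
{
	/* The child's brothers link goes on the parent's children list. */
	list_add_tail(&child->brothers, &parent->children);
}

static int sum_child_hits(struct callchain_node *parent)
{
	struct callchain_node *child;
	int total = 0;

	chain_for_each_child(child, parent)
		total += child->hits;

	return total;
}
```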
  2. 02 Jul, 2009 7 commits
  3. 01 Jul, 2009 17 commits
    • perf list: Add cache events · 73c24cb8
      Committed by Jaswinder Singh Rajput
      After:
      
      $ ./perf list
      
      List of pre-defined events (to be used in -e):
      
        cpu-cycles OR cycles                     [Hardware event]
        instructions                             [Hardware event]
        cache-references                         [Hardware event]
        cache-misses                             [Hardware event]
        branch-instructions OR branches          [Hardware event]
        branch-misses                            [Hardware event]
        bus-cycles                               [Hardware event]
      
        cpu-clock                                [Software event]
        task-clock                               [Software event]
        page-faults OR faults                    [Software event]
        minor-faults                             [Software event]
        major-faults                             [Software event]
        context-switches OR cs                   [Software event]
        cpu-migrations OR migrations             [Software event]
      
        L1-d$-loads                              [Hardware cache event]
        L1-d$-load-misses                        [Hardware cache event]
        L1-d$-stores                             [Hardware cache event]
        L1-d$-store-misses                       [Hardware cache event]
        L1-d$-prefetches                         [Hardware cache event]
        L1-d$-prefetch-misses                    [Hardware cache event]
        L1-i$-loads                              [Hardware cache event]
        L1-i$-load-misses                        [Hardware cache event]
        L1-i$-prefetches                         [Hardware cache event]
        L1-i$-prefetch-misses                    [Hardware cache event]
        LLC-loads                                [Hardware cache event]
        LLC-load-misses                          [Hardware cache event]
        LLC-stores                               [Hardware cache event]
        LLC-store-misses                         [Hardware cache event]
        LLC-prefetches                           [Hardware cache event]
        LLC-prefetch-misses                      [Hardware cache event]
        dTLB-loads                               [Hardware cache event]
        dTLB-load-misses                         [Hardware cache event]
        dTLB-stores                              [Hardware cache event]
        dTLB-store-misses                        [Hardware cache event]
        dTLB-prefetches                          [Hardware cache event]
        dTLB-prefetch-misses                     [Hardware cache event]
        iTLB-loads                               [Hardware cache event]
        iTLB-load-misses                         [Hardware cache event]
        branch-loads                             [Hardware cache event]
        branch-load-misses                       [Hardware cache event]
      
        rNNN                                     [raw hardware event descriptor]
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1246453578.3072.1.camel@ht.satnam>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf stat: Define MATCH_EVENT for easy attr checking · b9ebdcc0
      Committed by Jaswinder Singh Rajput
      MATCH_EVENT is useful:
      
       1. for checking multiple attrs
       2. to avoid repeating PERF_TYPE_ and PERF_COUNT_, which saves space
       3. to avoid line breaks
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1246440909.3403.5.camel@hpdv5.satnam>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
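One plausible shape for such a macro uses token pasting to supply the repeated prefixes. This is a sketch under assumptions: the enum values, struct event_attr and the attrs table below are local stand-ins, not the perf_counter ABI definitions.

```c
/* Local stand-ins for the ABI constants; the values are illustrative. */
enum { PERF_TYPE_HARDWARE, PERF_TYPE_SOFTWARE };
enum { PERF_COUNT_HW_CPU_CYCLES, PERF_COUNT_HW_INSTRUCTIONS };
enum { PERF_COUNT_SW_CPU_CLOCK, PERF_COUNT_SW_TASK_CLOCK };

/* Hypothetical, reduced attribute record: just the two fields compared. */
struct event_attr {
	unsigned int type;
	unsigned long long config;
};

static struct event_attr attrs[] = {
	{ PERF_TYPE_SOFTWARE, PERF_COUNT_SW_TASK_CLOCK },
	{ PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES },
};

/* ## pastes the short names onto the PERF_TYPE_ and PERF_COUNT_ prefixes. */
#define MATCH_EVENT(t, c, counter)			\
	(attrs[counter].type == PERF_TYPE_##t &&	\
	 attrs[counter].config == PERF_COUNT_##c)
```

A check such as MATCH_EVENT(SOFTWARE, SW_TASK_CLOCK, 0) then fits on one line and tests both fields at once.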
    • perf_counter tools: Add more warnings and fix/annotate them · f37a291c
      Committed by Ingo Molnar
      Enable -Wextra. This found a few real bugs plus a number
      of signed/unsigned type mismatches and other uncleanliness. It
      also required a few annotations.
      
      All things considered it was still worth it, so let's try with
      this enabled for now.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf report: Fix HV bit mismerge · 88a69dfb
      Committed by Ingo Molnar
      Fix:
      
       builtin-report.c: In function ‘hist_entry__add’:
       builtin-report.c:1015: error: case label not within a switch statement
       builtin-report.c:1017: error: break statement not within loop or switch
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Rework event string parsing/syntax · 61c45981
      Committed by Paul Mackerras
      This reworks the parser for event descriptors to make it more
      consistent in what it accepts.  It is now structured as a
      recursive descent parser for the following grammar:
      
      events		::= event ( ("," | space) space* event )*
      event		::= ( raw_event | numeric_event | symbolic_event |
      		      generic_hw_event ) [ event_modifier ]
      raw_event	::= "r" hex_number
      numeric_event	::= number ":" number
      number		::= decimal_number | "0x" hex_number | "0" octal_number
      symbolic_event	::= string_from_event_symbols_array
      generic_hw_event::= cache_type ( "-" ( cache_op | cache_result ) )*
      event_modifier	::= ":" ( "u" | "k" | "h" )+
      
      with the extra restriction that you can have at most one
      cache_op and at most one cache_result.
      
      We pass the current string pointer by reference (i.e. as a
      const char **) to the various parsing functions so that they
      can advance the pointer to indicate how much they consumed.
      They return 0 if they didn't recognize the thing at the pointer
      or 1 if they did (and advance the pointer past it).
      
      This also fixes parse_aliases to take the longest matching
      alias from the table, not the first one.  Otherwise "l1-data"
      would match the "l1-d" alias and the "ata" would not be
      consumed.
      
      This allows event modifiers indicating what processor modes to
      count in to be applied to any event, not just numeric events,
      and adds a ":h" modifier to indicate counting in hypervisor
      mode.  Specifying ":u" now sets both exclude_kernel and
      exclude_hv, and so on.  Multiple modes can be specified, e.g.
      ":uh" will count in user or hypervisor mode (i.e. only
      exclude_kernel will be set).
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <19018.53826.843815.189847@cargo.ozlabs.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
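The calling convention described above (pass a const char ** and return 0 or 1) and the longest-alias rule can be sketched as two small functions. This is an illustration, not the parser from the patch; struct modes, parse_event_modifier() and match_longest_alias() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the three exclusion bits. */
struct modes {
	int exclude_user, exclude_kernel, exclude_hv;
};

/*
 * event_modifier ::= ":" ( "u" | "k" | "h" )+
 * Returns 1 and advances *strp past the modifier, or returns 0 and
 * leaves both *strp and *m untouched.
 */
static int parse_event_modifier(const char **strp, struct modes *m)
{
	const char *str = *strp;
	int eu = 1, ek = 1, eh = 1;	/* exclude every mode not named */

	if (*str++ != ':')
		return 0;
	if (*str != 'u' && *str != 'k' && *str != 'h')
		return 0;		/* at least one mode letter is required */

	for (; *str; str++) {
		if (*str == 'u')
			eu = 0;
		else if (*str == 'k')
			ek = 0;
		else if (*str == 'h')
			eh = 0;
		else
			break;
	}

	m->exclude_user = eu;
	m->exclude_kernel = ek;
	m->exclude_hv = eh;
	*strp = str;
	return 1;
}

/*
 * Return the index of the longest name that prefixes *strp and advance
 * *strp past it, or return -1 and leave *strp alone.
 */
static int match_longest_alias(const char **strp,
			       const char *const names[], int n)
{
	size_t best_len = 0;
	int i, best = -1;

	for (i = 0; i < n; i++) {
		size_t len = strlen(names[i]);

		if (len > best_len && strncmp(*strp, names[i], len) == 0) {
			best = i;
			best_len = len;
		}
	}

	*strp += best_len;
	return best;
}
```

Given the aliases "l1-d" and "l1-data", the input "l1-data-loads" matches the second one and leaves "-loads" to be parsed, so no stray "ata" remains.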
    • perf_counter tools: Various fixes for callchains · deac911c
      Committed by Frederic Weisbecker
      The symbol resolving has of course revealed some bugs in the
      callchain tree handling. This patch fixes some of them,
      including:
      
      - inherit the children from the parents while splitting a node
      - fix list range moving
      - fix indexes setting in callchains
      - create a child on the current node if the path doesn't match any
        of the existing children (was only done on the root)
      - compare using symbols when possible so that we can match a function
        using any ip inside by referring to its start address.
      
      The practical effects are:
      
      - remove double callchains
      - fix upside down or any random order of callchains
      - fix wrong paths
      - fix bad hits and percentage accounts
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246419315-9968-4-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Resolve symbols in callchains · 4424961a
      Committed by Frederic Weisbecker
      This patch resolves the names, when possible, of each ip
      present in the callchains while using the -c option with perf
      report.
      
      Example:
      
      5.40%  [k] __d_lookup
                   5.37%
                      perf_callchain
                      perf_counter_overflow
                      intel_pmu_handle_irq
                      perf_counter_nmi_handler
                      notifier_call_chain
                      atomic_notifier_call_chain
                      notify_die
                      do_nmi
                      nmi
                      do_lookup
                      __link_path_walk
                      path_walk
                      do_path_lookup
                      user_path_at
                      sys_faccessat
                      sys_access
                      system_call_fastpath
                      0x7fb609846f77
      
                   0.01%
                      perf_callchain
                      perf_counter_overflow
                      intel_pmu_handle_irq
                      perf_counter_nmi_handler
                      notifier_call_chain
                      atomic_notifier_call_chain
                      notify_die
                      do_nmi
                      nmi
                      do_lookup
                      __link_path_walk
                      path_walk
                      do_path_lookup
                      user_path_at
                      sys_faccessat
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246419315-9968-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Fix storage size allocation of callchain list · 9198aa77
      Committed by Frederic Weisbecker
      Fix the size passed when allocating a callchain list: the
      wrong structure size was being used.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246419315-9968-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf report: Add hypervisor dso · fb9c8188
      Committed by Anton Blanchard
      Add a dso for hypervisor samples. We don't get any symbol
      information on the ppc64 hypervisor but this at least gives
      us a high level summary of the time spent in there.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: a.p.zijlstra@chello.nl
      Cc: paulus@samba.org
      LKML-Reference: <20090630230141.182536873@samba.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf report: Fix reporting of hypervisor · d8db1b57
      Committed by Anton Blanchard
      PERF_EVENT_MISC_* is not a bitmask, so we have to mask and compare.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: a.p.zijlstra@chello.nl
      Cc: paulus@samba.org
      LKML-Reference: <20090630230141.088394681@samba.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
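The difference between testing a bit and comparing a masked field shows up as soon as the mode values share bits. The constants below are assumptions chosen to illustrate a 2-bit CPU mode field; they are named MISC_* here to make clear they are not copied from the perf_counter header.

```c
/* Assumed encoding: the CPU mode is a 2-bit enumeration, not three flags. */
#define MISC_CPUMODE_MASK	(3 << 0)
#define MISC_KERNEL		(1 << 0)
#define MISC_USER		(2 << 0)
#define MISC_HYPERVISOR		(3 << 0)

/* Correct: isolate the field with the mask, then compare for equality. */
static int is_hypervisor(unsigned int misc)
{
	return (misc & MISC_CPUMODE_MASK) == MISC_HYPERVISOR;
}

/* Buggy bitmask-style test: also true for kernel and user samples. */
static int is_hypervisor_bitmask(unsigned int misc)
{
	return (misc & MISC_HYPERVISOR) != 0;
}
```

Under this encoding the bitmask-style test reports every kernel and user sample as a hypervisor sample, which is the kind of misreporting the fix addresses.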
    • perf top: Add ppc64 specific skip symbols and strip ppc64 . prefix · 3a3393ef
      Committed by Anton Blanchard
      Filter out some ppc64 specific idle loop functions and remove
      leading '.' on ppc64 text symbols.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: a.p.zijlstra@chello.nl
      Cc: paulus@samba.org
      LKML-Reference: <20090630230140.995643441@samba.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
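A table-driven skip list combined with the '.' stripping might look like the sketch below. The two table entries are examples only, and strip_dot() and should_skip() are hypothetical names; the order in which perf top applies the two steps is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Example entries only; adding a symbol means adding one line here. */
static const char *const skip_symbols[] = {
	"default_idle",
	"cpu_idle",
	NULL,
};

/* ppc64 text symbols carry a leading '.', so drop it if present. */
static const char *strip_dot(const char *name)
{
	return name[0] == '.' ? name + 1 : name;
}

/* Return 1 if the symbol, after prefix stripping, is in the skip table. */
static int should_skip(const char *name)
{
	int i;

	name = strip_dot(name);
	for (i = 0; skip_symbols[i]; i++)
		if (strcmp(skip_symbols[i], name) == 0)
			return 1;

	return 0;
}
```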
    • perf top: Move skip symbols to an array · 2ab52083
      Committed by Anton Blanchard
      Move the list of symbols we skip into an array, making it
      easier to add new ones.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: a.p.zijlstra@chello.nl
      Cc: paulus@samba.org
      LKML-Reference: <20090630230140.904782938@samba.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Remove zlib dependency · 6717534d
      Committed by Anton Blanchard
      The zlib devel libraries may not be installed and since we aren't
      using zlib we may as well remove it.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: a.p.zijlstra@chello.nl
      Cc: paulus@samba.org
      LKML-Reference: <20090630230140.802078956@samba.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf report: Fix -z option · 1f208ea6
      Committed by Anton Blanchard
      Fix a copy and paste error, -z was setting the group option.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: a.p.zijlstra@chello.nl
      Cc: paulus@samba.org
      LKML-Reference: <20090630230140.714204656@samba.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf report: Add --symbols parameter · 7bec7a91
      Committed by Arnaldo Carvalho de Melo
      So that we can filter by symbol name.
      
      The 'pfunct' utility in the 'dwarves' package can be used to
      create a file with the functions one wants.
      
      Example:
      
      [acme@doppio pahole]$ pfunct /usr/lib/debug/usr/lib64/libdw-0.141.so.debug | grep dwarf > /tmp/dwarf.symbols
      [acme@doppio pahole]$ wc -l /tmp/dwarf.symbols
      93 /tmp/dwarf.symbols
      [acme@doppio pahole]$ head -3 /tmp/dwarf.symbols
      dwfl_addrdwarf
      dwfl_module_getdwarf
      dwfl_getdwarf
      [acme@doppio pahole]$ perf report --sort comm,dso,symbol --comms pahole --dsos /usr/lib64/libdw-0.141.so --symbols file:///tmp/dwarf.symbols
      
          33.99%            pahole  /usr/lib64/libdw-0.141.so  [.] dwarf_tag
          29.07%            pahole  /usr/lib64/libdw-0.141.so  [.] dwarf_decl_file
          27.71%            pahole  /usr/lib64/libdw-0.141.so  [.] dwarf_getsrclines
           4.54%            pahole  /usr/lib64/libdw-0.141.so  0x00000000007400
           3.93%            pahole  /usr/lib64/libdw-0.141.so  [.] dwarf_decl_line
           0.46%            pahole  /usr/lib64/libdw-0.141.so  [.] dwarf_getlocation
           0.18%            pahole  /usr/lib64/libdw-0.141.so  [.] __libdwarf_next_prime
           0.13%            pahole  /usr/lib64/libdw-0.141.so  [.] dwarf_diecu
      
      [acme@doppio pahole]$
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1246399282-20934-4-git-send-email-acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf report: Add --comms parameter · cc8b88b1
      Committed by Arnaldo Carvalho de Melo
      So that we can filter by comm. Symbols in other comms won't be
      accounted for.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1246399282-20934-3-git-send-email-acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf report: Add --dsos parameter · 25903407
      Committed by Arnaldo Carvalho de Melo
      So that we can filter by dso. Symbols in other dsos won't be
      accounted for.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1246399282-20934-2-git-send-email-acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 30 Jun, 2009 4 commits
    • perf_counter tools: Adjust only prelinked symbol's addresses · f5812a7a
      Committed by Arnaldo Carvalho de Melo
      I.e. we can't handle these two kinds of files in the same way:
      
      1) prelinked system library:
      
      [acme@doppio pahole]$ readelf -s /usr/lib64/libdw-0.141.so | egrep 'FUNC.+GLOBAL.+dwfl_report_elf'
         278: 00000030450105a0   261 FUNC    GLOBAL DEFAULT   12 dwfl_report_elf@@ELFUTILS_0.122
      
      2) not prelinked library with debug information from a -debuginfo package:
      
      [acme@doppio pahole]$ readelf -s /usr/lib/debug/usr/lib64/libdw-0.141.so.debug | egrep 'FUNC.+GLOBAL.+dwfl_report_elf'
         629: 00000000000105a0   261 FUNC    GLOBAL DEFAULT   12 dwfl_report_elf
      [acme@doppio pahole]$
      
      Now the numbers I got for a pahole perf run are in line with
      the numbers I get from oprofile.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20090630144317.GB12663@ghostprotocols.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: Provide a way to enable counters on exec · 57e7986e
      Committed by Paul Mackerras
      This provides a way to mark a counter to be enabled on the next
      exec. This is useful for measuring the total activity of a
      program without including overhead from the process that
      launches it.
      
      This also changes the perf stat command to use this new
      facility.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <19017.43927.838745.689203@cargo.ozlabs.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Reduce perf stat measurement overhead/skew · 051ae7f7
      Committed by Paul Mackerras
      Vince Weaver reported a 'perf stat' measurement overhead in the
      count of retired instructions, which can inflate the reported
      count by roughly 6000 instructions.
      
      At present, perf stat creates its counters on the perf process.  Thus
      the counters count the fork and various other activity in both the
      parent and child, such as the resolver overhead for resolving PLT
      entries for any libc functions that haven't been called before, such
      as execvp.
      
      This reduces the overhead by creating the counters on the child process
      after the fork, using a couple of pipes to synchronize so that the
      child process waits until the parent has created the counters before
      doing the exec.  To eliminate the PLT resolution overhead on calling
      execvp, this does a dummy execvp first which will always fail.
      
      With this, the overhead of executing a program goes down from over
      4800 instructions to about 90 instructions on powerpc (32-bit).
      This was measured with a statically-linked program written in
      assembler which only does the 3 instructions needed to call _exit(0).
      
      Before:
      
      $ perf stat -e 0:1:u ./three
      
       Performance counter stats for './three':
      
                 4858  instructions
      
          0.001274523  seconds time elapsed
      
      After:
      
      $ perf stat -e 0:1:u ./three
      
       Performance counter stats for './three':
      
                   92  instructions
      
          0.000468153  seconds time elapsed
      Reported-by: Vince Weaver <vince@deater.net>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <19016.41425.814043.870352@cargo.ozlabs.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
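The fork, two-pipe handshake and dummy execvp described above can be sketched as follows. This is a simplified illustration rather than the perf stat code: run_measured() and its setup hook (standing in for counter creation on the child) are hypothetical, and error handling is minimal.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork, park the child just before its exec, let the parent run
 * setup(child_pid), then release the child. Returns the child's exit
 * status, or -1 on error. setup may be NULL.
 */
static int run_measured(char *const argv[], void (*setup)(pid_t child))
{
	int ready[2], go[2], status;
	char byte = 0;
	pid_t pid;

	if (pipe(ready) < 0 || pipe(go) < 0)
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		close(ready[0]);
		close(go[1]);
		/* Dummy exec that always fails: resolves execvp's PLT entry now. */
		execvp("", argv);
		/* Closing our write end tells the parent we are parked. */
		close(ready[1]);
		/* Block until the parent closes its end of the go pipe. */
		if (read(go[0], &byte, 1) < 0)
			_exit(127);
		execvp(argv[0], argv);
		_exit(127);
	}

	close(ready[1]);
	close(go[0]);
	/* Returns 0 (EOF) once the child has closed ready[1]. */
	if (read(ready[0], &byte, 1) < 0)
		return -1;
	if (setup)
		setup(pid);
	close(go[1]);		/* EOF on the go pipe releases the child */
	close(ready[0]);

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Both signals are sent by closing a pipe end, so the child executes very little code between being released and calling execvp.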
    • perf stat: Use percentages for scaling output · 210ad39f
      Committed by Ingo Molnar
      Peter expressed a strong preference for percentage based
      display of scaled values - so revert to that from the
      recently introduced multiplication-factor unit.
      Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 28 Jun, 2009 2 commits
    • perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run · c3043569
      Committed by Jaswinder Singh Rajput
      Set attrs and nr_counters only if no event is selected and !null_run.
      
      The attrs to set depend on the number of counters, so we only
      need to memcpy sizeof(default_attrs) bytes.
      
      Also set nr_counters to ARRAY_SIZE(default_attrs) in place of
      a hardcoded value.
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1246126749.32198.16.camel@hpdv5.satnam>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
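The change described above amounts to deriving both the copy size and the counter count from the default table itself. A minimal sketch, assuming a reduced struct event_attr, an arbitrary MAX_COUNTERS and made-up table contents:

```c
#include <string.h>

#define ARRAY_SIZE(x)	(sizeof(x) / sizeof((x)[0]))
#define MAX_COUNTERS	64	/* hypothetical capacity */

/* Hypothetical, reduced attribute record. */
struct event_attr {
	unsigned int type;
	unsigned long long config;
};

/* Made-up defaults; only the table's size matters for this sketch. */
static const struct event_attr default_attrs[] = {
	{ 1, 1 }, { 1, 3 }, { 0, 0 }, { 0, 1 },
};

static struct event_attr attrs[MAX_COUNTERS];
static unsigned int nr_counters;

/* Fall back to the defaults only when nothing was selected and !null_run. */
static void use_defaults_if_needed(int null_run)
{
	if (nr_counters == 0 && !null_run) {
		memcpy(attrs, default_attrs, sizeof(default_attrs));
		nr_counters = ARRAY_SIZE(default_attrs);
	}
}
```

Adding or removing a default event then needs no other change, since neither the size nor the count is hardcoded.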
    • perf stat: Improve output · 6e750a8f
      Committed by Jaswinder Singh Rajput
      Increase the width of the event name column to handle longer
      names like 'L1-d$-prefetch-misses'.
      
      Change scaled counters from a percentage to a multiplicative
      factor because the latter is more expressive.
      
      Also aligned the scaling factor, otherwise sometimes it looks
      like:
      
                  384  iTLB-load-misses           (4.74x scaled)
               452029  branch-loads               (8.00x scaled)
                 5892  branch-load-misses         (20.39x scaled)
               972315  iTLB-loads                 (3.24x scaled)
      
      Before:
               150708  L1-d$-stores          (scaled from 23.57%)
               428804  L1-d$-prefetches      (scaled from 23.47%)
               314446  L1-d$-prefetch-misses  (scaled from 23.42%)
            252626137  L1-i$-loads           (scaled from 23.24%)
              5297550  dTLB-load-misses      (scaled from 23.96%)
            106992392  branch-loads          (scaled from 23.67%)
              5239561  branch-load-misses    (scaled from 23.43%)
      
      After:
              1731713  L1-d$-loads               (  14.25x scaled)
                44241  L1-d$-prefetches          (   3.88x scaled)
                21076  L1-d$-prefetch-misses     (   3.40x scaled)
              5789421  L1-i$-loads               (   3.78x scaled)
                29645  dTLB-load-misses          (   2.95x scaled)
               461474  branch-loads              (   6.52x scaled)
                 7493  branch-load-misses        (  26.57x scaled)
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1246051927.2988.10.camel@hpdv5.satnam>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
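The percentage and the multiplicative factor are two views of the same ratio between the time a counter was enabled and the time it was actually running. The sketch below assumes the usual estimate raw * enabled / running; the function names are hypothetical.

```c
/* Percentage of the enabled time during which the counter was running. */
static double scaled_from_percent(unsigned long long enabled,
				  unsigned long long running)
{
	return enabled ? 100.0 * (double)running / (double)enabled : 0.0;
}

/* Multiplicative factor applied to the raw count: enabled / running. */
static double scale_factor(unsigned long long enabled,
			   unsigned long long running)
{
	return running ? (double)enabled / (double)running : 0.0;
}

/* Estimated full-time count, rounded to the nearest integer. */
static unsigned long long scale_count(unsigned long long raw,
				      unsigned long long enabled,
				      unsigned long long running)
{
	if (running == 0)
		return 0;
	return (unsigned long long)((double)raw * (double)enabled /
				    (double)running + 0.5);
}
```

For instance, a counter that ran for 23.57% of its enabled time corresponds to a factor of about 4.24x, since 100 / 23.57 ≈ 4.24.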
  6. 27 Jun, 2009 3 commits
    • perf stat: Fix multi-run stats · 566747e6
      Committed by Ingo Molnar
      In multi-run (-r/--repeat) printouts, print out the noise of
      the wall-clock average as well.
      
      Also, fix a bug in printing out scaled counters: if it was not
      scaled then we should not update the average with -1.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf stat: Add -n/--null option to run without counters · 0cfb7a13
      Committed by Ingo Molnar
      Allow a no-counters run. This can be useful to measure just
      elapsed wall-clock time - or to assess the raw overhead of perf
      stat itself, without running any counters.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Remove dead code · fde953c1
      Committed by Ingo Molnar
      Vince Weaver reported that there's a handful of #ifdef __MINGW32__
      sections in the code.
      
      Remove them as they are in essence dead code - as unlike upstream
      Git, the perf tool is unlikely to be ported to Windows.
      Reported-by: Vince Weaver <vince@deater.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      fde953c1
  7. 26 Jun, 2009 4 commits
    • F
      perf report: Print sorted callchains per histogram entries · f55c5552
      Frederic Weisbecker committed
      Use the newly created callchain radix tree to gather the chain stats
      from the recorded events and then print the callchains for all of them,
      sorted by hits, using the "-c" parameter with perf report.
      
      Example:
      
       66.15%  [k] atm_clip_exit
                  63.08%
                      0xffffffffffffff80
                      0xffffffff810196a8
                      0xffffffff810c14c8
                      0xffffffff8101a79c
                      0xffffffff810194f3
                      0xffffffff8106ab7f
                      0xffffffff8106abe5
                      0xffffffff8106acde
                      0xffffffff8100d94b
                      0xffffffff8153e7ea
                      [...]
      
                   1.54%
                      0xffffffffffffff80
                      0xffffffff810196a8
                      0xffffffff810c14c8
                      0xffffffff8101a79c
                      [...]
      
      Symbols are not yet resolved.
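
The flat listing in the example above can be approximated with a short formatter. This is a sketch under stated assumptions: the function name `format_flat`, the `(hits, addresses)` input shape, and the exact indentation are invented for illustration and do not reproduce perf report's real output code.

```python
def format_flat(chains, total_samples):
    """Format (hits, [addresses]) pairs as percentage blocks, most hits first."""
    lines = []
    for hits, ips in sorted(chains, key=lambda c: c[0], reverse=True):
        # Share of all samples attributed to this exact chain.
        lines.append(f"{100.0 * hits / total_samples:6.2f}%")
        # Unresolved addresses are printed as raw 64-bit hex values.
        lines.extend(f"    {ip:#018x}" for ip in ips)
        lines.append("")
    return "\n".join(lines)

if __name__ == "__main__":
    sample = [
        (1, [0xffffffff810196a8]),
        (3, [0xffffffff8101a79c, 0xffffffff810194f3]),
    ]
    print(format_flat(sample, total_samples=4))
```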
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1246026481-8314-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f55c5552
    • F
      perf_counter tools: Prepare a small callchain framework · 8cb76d99
      Frederic Weisbecker committed
      We plan to display the callchains depending on some user-configurable
      parameters.
      
      To gather the callchains stats from the recorded stream in a fast way,
      this patch introduces an ad hoc radix tree adapted for callchains and also
      an rbtree to sort these callchains once we have gathered all events
      from the stream.
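
The two-phase idea (accumulate chains in a prefix-sharing tree, then sort once all events are in) can be sketched as below. Note the substitutions: this uses a plain, uncompressed trie rather than the patch's radix tree, and a list sort rather than an rbtree. The names `ChainNode`, `add_chain`, and `sorted_chains` are hypothetical.

```python
class ChainNode:
    """One node per call-chain address; common prefixes share nodes."""

    def __init__(self):
        self.children = {}  # address -> ChainNode
        self.hits = 0       # number of samples whose chain ends at this node

def add_chain(root, ips):
    """Insert one sampled callchain (a sequence of addresses) into the tree."""
    node = root
    for ip in ips:
        node = node.children.setdefault(ip, ChainNode())
    node.hits += 1

def sorted_chains(root):
    """Return (hits, chain) pairs for every recorded chain, most hits first."""
    out = []

    def walk(node, path):
        if node.hits:
            out.append((node.hits, tuple(path)))
        for ip, child in node.children.items():
            walk(child, path + [ip])

    walk(root, [])
    out.sort(key=lambda item: item[0], reverse=True)
    return out
```

Insertion cost is proportional to the chain length, and sorting happens only once after the whole stream has been consumed, which matches the motivation given in the commit message.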
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1246026481-8314-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8cb76d99
    • F
      perf record: Fix unhandled io return value · 3928ddbe
      Frederic Weisbecker committed
      Building the latest perf_counter tools fails with the following error:
      
       builtin-record.c: In function ‘create_counter’:
       builtin-record.c:451: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
       make: *** [builtin-record.o] Erreur 1
      
      Just check if we successfully read the perf file descriptor.
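
The principle behind the fix (never discard the result of a read) can be shown with a Python analogue. The commit itself concerns the C `read()` call in builtin-record.c; the helper `read_exact` below is a hypothetical illustration, not a translation of the patch.

```python
import os

def read_exact(fd, size):
    """Read `size` bytes from fd in one call and fail loudly on a short read."""
    data = os.read(fd, size)
    if len(data) != size:
        raise OSError(f"short read: wanted {size} bytes, got {len(data)}")
    return data

if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"12345678")
    print(read_exact(r, 8))
    os.close(r)
    os.close(w)
```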
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1245961287-5327-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      3928ddbe
    • J
      perf_counter tools: Add alias for 'l1d' and 'l1i' · 4418351f
      Jaswinder Singh Rajput committed
      Add 'l1d' and 'l1i' aliases again as shortcuts - just don't make them
      the primary display alias.
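
One way to keep shortcuts without changing what is displayed is to list the display name first and accept the rest only on input. The table below is a sketch: the display names `L1-d$` and `L1-i$` are borrowed from event names shown in perf stat output elsewhere in this log and are an assumption here, as are the names `ALIASES` and `resolve`.

```python
# Each tuple lists the display name first, followed by accepted shortcuts.
ALIASES = [
    ("L1-d$", "l1d"),
    ("L1-i$", "l1i"),
]

def resolve(name):
    """Map any accepted spelling (case-insensitive) to its display name."""
    lowered = name.lower()
    for names in ALIASES:
        if lowered in (n.lower() for n in names):
            return names[0]
    return None
```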
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1245945462.9157.11.camel@hpdv5.satnam>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4418351f