1. 23 Jul, 2009 5 commits
    • perf_counter: Add tracepoint support to perf list, perf stat · f6bdafef
      Jason Baron committed
      Add support to 'perf list' and 'perf stat' for kernel tracepoints. The
      implementation creates a 'for_each_subsystem' and 'for_each_event' for
      easy iteration over the tracepoints.
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <426129bf9fcc8ee63bb094cf736e7316a7dcd77a.1248190728.git.jbaron@redhat.com>
    • perf symbol: C++ demangling · 28ac909b
      Arnaldo Carvalho de Melo committed
      [acme@doppio ~]$ perf report -s comm,dso,symbol -C firefox -d /usr/lib64/xulrunner-1.9.1/libxul.so | grep :: | head
           2.21%  [.] nsDeque::Push(void*)
           1.78%  [.] GraphWalker::DoWalk(nsDeque&)
           1.30%  [.] GCGraphBuilder::AddNode(void*, nsCycleCollectionParticipant*)
           1.27%  [.] XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)
           1.18%  [.] imgContainer::DrawFrameTo(gfxIImageFrame*, gfxIImageFrame*, nsRect&)
           1.13%  [.] nsDeque::PopFront()
           1.11%  [.] nsGlobalWindow::RunTimeout(nsTimeout*)
           0.97%  [.] nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&)
           0.95%  [.] nsJSEventListener::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)
           0.95%  [.] nsCOMPtr_base::~nsCOMPtr_base()
      [acme@doppio ~]$
      
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Suggested-by: Clark Williams <williams@redhat.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090720171412.GB10410@ghostprotocols.net>
    • perf: avoid structure size confusion by using a fixed size · dfe5a504
      Arjan van de Ven committed
      For some reason, this structure gets compiled as 36 bytes in some files
      (the ones that allocate it) but 40 bytes in others (the ones that use it).
      The cause is an off_t type that gets a different size in different
      compilation units for some yet-to-be-explained reason.
      
      But the effect is disastrous; the size/offset members of the struct
      are at different offsets, and result in mostly complete garbage.
      The parser in perf is so robust that this all gets hidden, and after
      skipping a certain number of samples, it recovers.... so this bug
      is not normally noticed.
      
      .... except when you want every sample to be exact.
      
      Fix this by just using an explicitly sized type.
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4A655917.9080504@linux.intel.com>
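The idea behind the fix above can be illustrated with a minimal sketch. The struct and function names here (`section_fixed`, `section_end`) are hypothetical and are not perf's actual definitions; the point is only that explicitly sized members give every compilation unit the same layout, and that a compile-time check can catch any disagreement early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical file-section descriptor; field names are illustrative. */
struct section_fixed {
	uint64_t offset;  /* always 8 bytes, whatever the compiler flags */
	uint64_t size;    /* always 8 bytes */
};

/* Fail the build if the layout is not what every user of the struct expects. */
_Static_assert(sizeof(struct section_fixed) == 16, "unexpected struct size");
_Static_assert(offsetof(struct section_fixed, size) == 8, "unexpected member offset");

/* Return the file position just past the end of the section. */
static uint64_t section_end(const struct section_fixed *s)
{
	assert(s != NULL);
	return s->offset + s->size;
}
```

With a type such as off_t, whose width can depend on build settings, neither compile-time check would be guaranteed to hold in every file.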
    • perf_counter: Improve perf stat and perf record option parsing · a0541234
      Anton Blanchard committed
      perf stat and perf record currently look for all options on the command
      line. This can lead to some confusion:
      
      # perf stat ls -l
        Error: unknown switch `l'
      
      While we can work around this by adding '--' before the command, the git
      option parsing code can stop at the first non-option:
      
      # perf stat ls -l
       Performance counter stats for 'ls -l':
      ....
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090722130412.GD9029@kryten>
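The "stop at the first non-option" behaviour described above can be sketched as follows. This is a hand-rolled illustration, not the git option parsing code that perf uses; the function name `first_command_index` is hypothetical, and the sketch assumes that no option takes a separate argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the first argv element that belongs to the
 * command to run. Scanning stops at the first non-option argument,
 * or just after an explicit "--". Returns argc if no command is given.
 */
static int first_command_index(int argc, char **argv)
{
	int i;

	assert(argv != NULL);
	for (i = 1; i < argc; i++) {
		if (strcmp(argv[i], "--") == 0)
			return i + 1;   /* explicit end of options */
		if (argv[i][0] != '-' || argv[i][1] == '\0')
			return i;       /* first non-option: stop parsing here */
	}
	return argc;
}
```

For `perf stat ls -l` this returns the index of `ls`, so `-l` is left for the command instead of being rejected as an unknown switch.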
    • perf_counter: PERF_SAMPLE_ID and inherited counters · 7f453c24
      Peter Zijlstra committed
      Anton noted that for inherited counters the counter-id as provided by
      PERF_SAMPLE_ID isn't mappable to the id found through PERF_RECORD_ID
      because each inherited counter gets its own id.
      
      His suggestion was to always return the parent counter id, since that
      is the primary counter id as exposed. However, these inherited
      counters have a unique identifier so that events like
      PERF_EVENT_PERIOD and PERF_EVENT_THROTTLE can be specific about which
      counter gets modified, which is important when trying to normalize the
      sample streams.
      
      This patch removes PERF_EVENT_PERIOD in favour of PERF_SAMPLE_PERIOD,
      which is more useful anyway, since changing periods became a lot more
      common than initially thought -- rendering PERF_EVENT_PERIOD the less
      useful solution (also, PERF_SAMPLE_PERIOD reports the more accurate
      value, since it reports the value used to trigger the overflow,
      whereas PERF_EVENT_PERIOD simply reports the requested period changed,
      which might only take effect on the next cycle).
      
      This still leaves us PERF_EVENT_THROTTLE to consider, but since that
      _should_ be a rare occurrence, and linking it to a primary id is the
      most useful bit to diagnose the problem, we introduce a
      PERF_SAMPLE_STREAM_ID, for those few cases where the full
      reconstruction is important.
      
      [Does change the ABI a little, but I see no other way out]
      Suggested-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1248095846.15751.8781.camel@twins>
  2. 18 Jul, 2009 3 commits
  3. 13 Jul, 2009 1 commit
  4. 12 Jul, 2009 5 commits
    • perf report: Introduce -n/--show-nr-samples · e3d7e183
      Arnaldo Carvalho de Melo committed
      [acme@doppio pahole]$ perf report -ns comm,dso,symbol -d /lib64/libc-2.10.1.so -C pahole | head -17
          21.94%      32101  [.] _int_malloc
          20.10%      29402  [.] __GI_strcmp
          16.77%      24533  [.] __tsearch
          12.61%      18450  [.] malloc_consolidate
           6.42%       9394  [.] _int_free
           6.28%       9191  [.] __tfind
           4.56%       6678  [.] __GI___libc_free
           4.46%       6520  [.] _IO_vfprintf_internal
           2.59%       3786  [.] __malloc
           1.17%       1716  [.] __GI_memcpy
      [acme@doppio pahole]$
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1247325517-12272-5-git-send-email-acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: PLT info is stripped in -debuginfo packages · a25e46c4
      Arnaldo Carvalho de Melo committed
      So we need to get the richer .symtab from the debuginfo
      packages but the PLT info from the original DSO where we have
      just the leaner .dynsym symtab.
      
      Example:
      
      | [acme@doppio pahole]$ perf report --sort comm,dso,symbol > before
      | [acme@doppio pahole]$ perf report --sort comm,dso,symbol > after
      | [acme@doppio pahole]$ diff -U1 before after
      | --- before	2009-07-11 11:04:22.688595741 -0300
      | +++ after	2009-07-11 11:04:33.380595676 -0300
      | @@ -80,3 +80,2 @@
      |       0.07%  pahole ./build/pahole              [.] pahole_stealer
      | -     0.06%  pahole /usr/lib64/libdw-0.141.so   [.] 0x00000000007140
      |       0.06%  pahole /usr/lib64/libdw-0.141.so   [.] __libdw_getabbrev
      | @@ -91,2 +90,3 @@
      |       0.06%  pahole [kernel]                    [k] free_hot_cold_page
      | +     0.06%  pahole /usr/lib64/libdw-0.141.so   [.] tfind@plt
      |       0.05%  pahole ./build/libdwarves.so.1.0.0 [.] ftype__add_parameter
      | @@ -242,2 +242,3 @@
      |       0.01%  pahole [kernel]                    [k] account_group_user_time
      | +     0.01%  pahole /usr/lib64/libdw-0.141.so   [.] strlen@plt
      |       0.01%  pahole ./build/pahole              [.] strcmp@plt
      | [acme@doppio pahole]$
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1247325517-12272-4-git-send-email-acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf report: Make the output more compact · 021191b3
      Arnaldo Carvalho de Melo committed
      When we filter by column content we may end up with a column
      that has the same value for all the lines. So remove that
      column and tell its unique value on the top, as a comment.
      
      Example:
      
        [acme@doppio pahole]$  perf report --sort comm,dso,symbol -d ./build/libdwarves.so.1.0.0 -C pahole | head -15
        # dso: ./build/libdwarves.so.1.0.0
        # comm: pahole
        # Samples: 58409
        #
        # Overhead  Symbol
        # ........  ......
        #
            20.93%  [.] tag__recode_dwarf_type
            14.94%  [.] namespace__recode_dwarf_types
            10.38%  [.] cu__table_add_tag
             6.69%  [.] __die__process_tag
             5.05%  [.] die__process_function
             4.70%  [.] list__for_all_tags
             3.68%  [.] tag__init
             3.48%  [.] die__create_new_parameter
        [acme@doppio pahole]$
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1247325517-12272-3-git-send-email-acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • strlist: Introduce strlist__entry and strlist__nr_entries methods · 27d0fd41
      Arnaldo Carvalho de Melo committed
      The strlist__entry method allows accessing strlists like an
      array; it will be used in 'perf report' to access the first
      entry.
      
      We now keep nr_entries so that we can check whether we have just
      one entry; this will be used in 'perf report' to improve the output
      by showing a value only once, at the top, when we have just, say, one DSO.
      
      While at it use nr_entries to optimize strlist__is_empty by not
      using the far more costly rb_first based implementation.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1247325517-12272-2-git-send-email-acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf report: Tidy up reporting of symbols not found · 60c1baf1
      Arnaldo Carvalho de Melo committed
      Always print the level info (whether the sample is in the kernel,
      hypervisor or userspace), as that is in the hist_entry.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1247325517-12272-1-git-send-email-acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 11 Jul, 2009 1 commit
    • perf report: Adjust column width to the values sampled · 52d422de
      Arnaldo Carvalho de Melo committed
      Auto-adjust column width of perf report output to the
      longest occurring string length.
      
      Example:
      
      [acme@doppio pahole]$  perf report --sort comm,dso,symbol | head -13
      
          12.79%   pahole  /usr/lib64/libdw-0.141.so    [.] __libdw_find_attr
           8.90%   pahole  /lib64/libc-2.10.1.so        [.] _int_malloc
           8.68%   pahole  /usr/lib64/libdw-0.141.so    [.] __libdw_form_val_len
           8.15%   pahole  /lib64/libc-2.10.1.so        [.] __GI_strcmp
           6.80%   pahole  /lib64/libc-2.10.1.so        [.] __tsearch
           5.54%   pahole  ./build/libdwarves.so.1.0.0  [.] tag__recode_dwarf_type
      [acme@doppio pahole]$
      
      [acme@doppio pahole]$  perf report --sort comm,dso,symbol -d /lib64/libc-2.10.1.so | head -10
      
          21.92%   pahole  /lib64/libc-2.10.1.so  [.] _int_malloc
          20.08%   pahole  /lib64/libc-2.10.1.so  [.] __GI_strcmp
          16.75%   pahole  /lib64/libc-2.10.1.so  [.] __tsearch
      [acme@doppio pahole]$
      
      Also add these extra options to control the new behaviour:
      
        -w, --field-width
      
      Force each column width to the provided list, for large terminal
      readability.
      
        -t, --field-separator:
      
      Use a special separator character and don't pad with spaces, replacing
      all occurrences of this separator in symbol names (and other output) with
      a '.' character, which is therefore the only invalid separator.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <20090711014728.GH3452@ghostprotocols.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
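The separator replacement described for -t/--field-separator can be sketched roughly as below. This is an illustration under assumptions, not the code from the commit; `sanitize_field` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Replace every occurrence of the separator `sep` inside `field` with '.',
 * so the separator only ever appears between columns.
 * Returns the number of characters replaced.
 */
static size_t sanitize_field(char *field, char sep)
{
	size_t replaced = 0;

	assert(field != NULL);
	for (; *field != '\0'; field++) {
		if (*field == sep) {
			*field = '.';
			replaced++;
		}
	}
	return replaced;
}
```

With ',' as the separator, a symbol name such as `f(int, char)` would be printed as `f(int. char)`, which keeps the output parseable by splitting on ','.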
  6. 10 Jul, 2009 2 commits
    • perf_counter: Add P6 PMU support · 11d1578f
      Vince Weaver committed
      Add basic P6 PMU support. The P6 uses the EVNTSEL0 EN bit to
      enable/disable both its counters. We use this for the
      global enable/disable, and clear all config bits (except EN)
      to disable individual counters.
      
      Actual ia32 hardware doesn't support lfence, so use a locked
      op without side-effect to implement a full barrier.
      
      perf stat and perf record seem to function correctly.
      
      [a.p.zijlstra@chello.nl: cleanups and complete the enable/disable code]
      Signed-off-by: Vince Weaver <vince@deater.net>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <Pine.LNX.4.64.0907081718450.2715@pianoman.cluster.toy>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Rename cache events to remove $ · 9590b7ba
      Anton Blanchard committed
      The cache events contain '$' which will hit shell variable
      expansion. To avoid confusion change this to 'cache', ie
      L1-d$-loads becomes L1-dcache-loads.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: Roland Dreier <rdreier@cisco.com>
      Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <20090706120131.GB4391@kryten>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 05 Jul, 2009 5 commits
    • perf report: Add "Fractal" mode output - support callchains with relative overhead rate · 805d127d
      Frederic Weisbecker committed
      The current callchain displays the overhead rates as absolute:
      relative to the total overhead.
      
      This patch provides relative overhead percentages, in which each
      branch of the callchain tree is an independent instrumented object.
      
      This provides a 'fractal' view of the call-chain profile: each
      sub-graph looks like a profile in itself - relative to its parent.
      
      You can produce such output by using the "fractal" mode
      that you can abbreviate via f, fr, fra, frac, etc...
      
      ./perf report -s sym -c fractal
      
      Example:
      
           8.46%  [k] copy_user_generic_string
                      |
                      |--52.01%-- generic_file_aio_read
                      |          do_sync_read
                      |          vfs_read
                      |          |
                      |          |--97.20%-- sys_pread64
                      |          |          system_call_fastpath
                      |          |          pread64
                      |          |
                      |           --2.81%-- sys_read
                      |                     system_call_fastpath
                      |                     __read
                      |
                      |--39.85%-- generic_file_buffered_write
                      |          __generic_file_aio_write_nolock
                      |          generic_file_aio_write
                      |          do_sync_write
                      |          reiserfs_file_write
                      |          vfs_write
                      |          |
                      |          |--97.05%-- sys_pwrite64
                      |          |          system_call_fastpath
                      |          |          __pwrite64
                      |          |
                      |           --2.95%-- sys_write
                      |                     system_call_fastpath
                      |                     __write_nocancel
      [...]
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246772361-9960-5-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
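The difference between the absolute and the "fractal" rates described above comes down to which denominator is used. A minimal sketch, with hypothetical function names and hit counts that are not taken from the commit:

```c
#include <assert.h>

/* Percentage of `hits` relative to `total`; 0 when total is 0. */
static double percent_of(unsigned long hits, unsigned long total)
{
	assert(total == 0 || hits <= total);  /* a part cannot exceed its whole */
	return total ? 100.0 * (double)hits / (double)total : 0.0;
}

/* Absolute mode: a branch is rated against all samples in the profile. */
static double absolute_rate(unsigned long branch_hits, unsigned long total_hits)
{
	return percent_of(branch_hits, total_hits);
}

/* Fractal mode: a branch is rated against its parent's cumulative hits. */
static double fractal_rate(unsigned long branch_hits, unsigned long parent_cumul_hits)
{
	return percent_of(branch_hits, parent_cumul_hits);
}
```

For instance, a branch with 50 hits under a parent with 200 cumulative hits, in a profile of 1000 samples, would be shown as 5% in absolute terms and 25% in fractal terms.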
    • perf_counter tools: callchains: Manage the cumul hits on the fly · e05b876c
      Frederic Weisbecker committed
      The cumul hits are the number of hits of all children of a node
      plus the hits of the current node, required for computing the
      percentage of a branch.
      
      These numbers are calculated during the sorting of the branches of
      the callchain tree using a depth-first postfix traversal, so that
      cumulative hits are propagated in the right order.
      
      But if we plan to implement percentages relative to the parent and not
      absolute percentages (relative to the whole overhead), we need to know
      the cumulative hits of the parent before computing the children
      because the relative minimum acceptable number of entries (ie: minimum
      rate against the cumulative hits from the parent) is the basis to
      filter the children against a given rate.
      
      Then we need to handle the cumul hits on the fly to prepare the
      implementation of relative overhead rates.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246772361-9960-4-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
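Maintaining cumulative hits "on the fly" means updating them when a sample is recorded rather than in a later traversal. The sketch below shows one possible way to do that by walking parent links; the struct and function names are hypothetical, and the commit itself may propagate the counts differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified callchain node; only the parent link is needed here. */
struct chain_node {
	struct chain_node *parent;
	unsigned long hit;        /* samples that end exactly at this node */
	unsigned long cumul_hit;  /* hit plus the hits of all descendants */
};

/* Record one sample ending at `node`, updating cumulative hits immediately. */
static void chain_node_add_hit(struct chain_node *node)
{
	assert(node != NULL);
	node->hit++;
	for (; node != NULL; node = node->parent)
		node->cumul_hit++;  /* every ancestor sees the new sample at once */
}
```

Because a parent's `cumul_hit` is always current, a child can be filtered against a rate relative to its parent without first traversing the whole subtree.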
    • perf report: Change default callchain parameters · 94a8eb02
      Frederic Weisbecker committed
      The default callchain parameters are set to use the flat mode and never
      filter backtraces by an overhead threshold.
      
      But flat mode is boring compared to graph mode.
      Also the number of callchains may be very high if none is
      filtered.
      
      Let's change this to set the graph view and a minimum overhead of 0.5%
      as default parameters.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246772361-9960-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf report: Use a modifiable string for default callchain options · be903885
      Frederic Weisbecker committed
      If the user doesn't provide options to tune his callchain output
      (ie: if he uses -c without arguments) then the default value passed
      in the OPT_CALLBACK_DEFAULT() macro is used.
      
      But it's parsed later by strtok(), which replaces comma separators
      with zero bytes. This may segfault as we are using a read-only string.
      
      Use a modifiable one instead, and also fix the "100%" default
      minimum threshold value by turning it into a 0 (output every callchain),
      as originally intended.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246772361-9960-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
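The strtok() problem described above is that the function writes into the string it tokenizes, so a string literal must not be passed to it directly. A minimal sketch of the safe pattern, assuming a "mode[,min_percent]" format; the struct and function names are hypothetical and this is not the code from the commit:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct callchain_opts {
	char mode[16];
	double min_percent;
};

/* Parse "mode[,min_percent]" without modifying the caller's string. */
static int parse_callchain_opts(const char *arg, struct callchain_opts *out)
{
	char buf[64];
	char *tok;

	assert(arg != NULL && out != NULL);
	if (strlen(arg) >= sizeof(buf))
		return -1;
	strcpy(buf, arg);  /* strtok() overwrites separators in buf, not in arg */

	tok = strtok(buf, ",");
	if (tok == NULL || strlen(tok) >= sizeof(out->mode))
		return -1;
	strcpy(out->mode, tok);

	tok = strtok(NULL, ",");
	out->min_percent = tok ? strtod(tok, NULL) : 0.0;  /* 0 means no filtering */
	return 0;
}
```

Calling `parse_callchain_opts("graph,0.5", &opts)` is safe even though the argument is a string literal, because only the local copy is tokenized.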
    • perf report: Warn on callchain output request from non-callchain file · 91b4eaea
      Frederic Weisbecker committed
      perf report segfaults while trying to handle callchains from a non
      callchain data file.
      
      Instead of a segfault, print a useful message to the user.
      Reported-by: Jens Axboe <jens.axboe@oracle.com>
      Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246772361-9960-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  8. 03 Jul, 2009 8 commits
    • perf report: Annotate variable initialization · 029e5b16
      Ingo Molnar committed
      Certain versions of GCC don't see the initialization that is done here:
      
        builtin-report.c: In function ‘__cmd_report’:
        builtin-report.c:1038: warning: ‘syms’ may be used uninitialized in this function
      
      So annotate it with a NULL initialization.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Adjust symbols in ET_EXEC files too · 30d7a77d
      Arnaldo Carvalho de Melo committed
      Ingo Molnar wrote:
      
      > i just bisected a 'perf report' bug that would cause us to not
      > resolve all user-space symbols in a 'git gc' run to:
      >
      > f5812a7a is first bad commit
      > commit f5812a7a
      > Author: Arnaldo Carvalho de Melo <acme@redhat.com>
      > Date:   Tue Jun 30 11:43:17 2009 -0300
      >
      >     perf_counter tools: Adjust only prelinked symbol's addresses
      
      Rename ->prelinked to ->adjust_symbols and apply what was done
      only for prelinked libraries to ET_EXEC binaries as well, such as
      /usr/bin/git:
      
      [acme@doppio pahole]$ readelf -h /usr/bin/git | grep Type
        Type:                              EXEC (Executable file)
      [acme@doppio pahole]$
      
      And after installing the 'git-debuginfo' package, I get correct results:
      
      [acme@doppio linux-2.6-tip]$ perf report --sort comm,dso,symbol -d /usr/bin/git | head -20
      
       #
       # (1139614 samples)
       #
       # Overhead           Command  Shared Object              Symbol
       # ........  ................  .........................  ......
       #
          34.98%               git  /usr/bin/git               [.] send_sideband
          33.39%               git  /usr/bin/git               [.] enter_repo
           6.81%               git  /usr/bin/git               [.] diff_opt_parse
           4.95%               git  /usr/bin/git               [.] is_repository_shallow
           3.24%               git  /usr/bin/git               [.] odb_mkstemp
           1.39%               git  /usr/bin/git               [.] output
           1.34%               git  /usr/bin/git               [.] xmmap
           1.25%               git  /usr/bin/git               [.] receive_pack_config
           1.16%               git  /usr/bin/git               [.] git_pathdup
           0.90%               git  /usr/bin/git               [.] read_object_with_reference
           0.86%               git  /usr/bin/git               [.] show_patch_diff
           0.85%               git  /usr/bin/git               0x00000000095e2e
           0.69%               git  /usr/bin/git               [.] display
      [acme@doppio linux-2.6-tip]$
      
      I'll check the remaining cases where we can't resolve symbols, like
      this 0x00000000095e2e, later.
      
      And I guess this will fix the problems Mike was seeing too:
      
       [acme@doppio linux-2.6-tip]$ readelf -h ../build/perf/vmlinux | grep Type
         Type:                              EXEC (Executable file)
       [acme@doppio linux-2.6-tip]$
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Display percents of hits in callchain with overhead colors · 24b57c69
      Frederic Weisbecker committed
      This adds the use of colors to signal at a glance the important
      overhead thresholds in callchains hit rates.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246558475-10624-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Provide helper to print percents color · 1e11fd82
      Frederic Weisbecker committed
      Among perf annotate, perf report and perf top, we can find the
      common colored printing of percents according to the following
      rules:
      
          High overhead =  > 5%, colored in red
          Mid overhead =  > 0.5%, colored in green
          Low overhead =  < 0.5%, default color
      
      Factorize these multiple checks in a single function named
      percent_color_fprintf() and also provide a get_percent_color()
      for sites which print percentages and other things at the same
      time.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246558475-10624-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
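The threshold rules listed above can be sketched as a pair of small helpers. This is a simplified illustration, not the code from the commit: the color macros, the `print_percent` name and its fixed output format are assumptions, and treating a value exactly equal to a threshold as belonging to the higher class is also an assumption, since the message does not specify it.

```c
#include <assert.h>
#include <stdio.h>

/* ANSI escape sequences; the macro names are illustrative. */
#define COLOR_RED     "\033[31m"
#define COLOR_GREEN   "\033[32m"
#define COLOR_NORMAL  ""
#define COLOR_RESET   "\033[m"

#define MIN_RED    5.0  /* high overhead threshold, in percent */
#define MIN_GREEN  0.5  /* mid overhead threshold, in percent */

/* Pick the color for a percentage according to the overhead thresholds. */
static const char *get_percent_color(double percent)
{
	if (percent >= MIN_RED)
		return COLOR_RED;
	if (percent >= MIN_GREEN)
		return COLOR_GREEN;
	return COLOR_NORMAL;
}

/* Print a percentage in its color; returns the number of visible characters. */
static int print_percent(FILE *fp, double percent)
{
	const char *color = get_percent_color(percent);
	int n;

	assert(fp != NULL);
	if (*color == '\0')
		return fprintf(fp, "%6.2f%%", percent);
	fputs(color, fp);
	n = fprintf(fp, "%6.2f%%", percent);
	fputs(COLOR_RESET, fp);
	return n;
}
```

Keeping the color lookup separate from the printing helper mirrors the split described above: call sites that print a percentage together with other fields can fetch the color once and format the whole line themselves.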
    • perf_counter tools: Set the minimum percent for callchains to be displayed · c20ab37e
      Frederic Weisbecker committed
      Callchain output may become a burden on a trace because even
      rarely hit sites are exposed. This can be too much information.
      
      Let the user set a threshold as a minimum percent of hits using
      the new pattern for the -c option:
      
          -c mode,min_percent
      
      Example:
      
      $ perf report -s sym -c flat,4
      
           8.25%  [k] copy_user_generic_string
                   4.19%
                      copy_user_generic_string
                      generic_file_aio_read
                      do_sync_read
                      vfs_read
                      sys_pread64
                      system_call_fastpath
                      pread64
      
           5.39%  [k] search_by_key
           4.63%  0x00000000009e0a
           2.36%  [k] memcpy_c
      [...]
      
      $ perf report -s sym -c graph,2
      
           8.25%  [k] copy_user_generic_string
                      |
                      |--4.31%-- generic_file_aio_read
                      |          do_sync_read
                      |          vfs_read
                      |          |
                      |           --4.19%-- sys_pread64
                      |                     system_call_fastpath
                      |                     pread64
                      |
                       --3.24%-- generic_file_buffered_write
                                 __generic_file_aio_write_nolock
                                 generic_file_aio_write
                                 do_sync_write
                                 reiserfs_file_write
                                 vfs_write
                                 |
                                  --3.14%-- sys_pwrite64
                                            system_call_fastpath
                                            __pwrite64
      
           5.39%  [k] search_by_key
                      |
                       --2.23%-- reiserfs_update_sd_size
      
           4.63%  0x00000000009e0a
      
           2.36%  [k] memcpy_c
      [...]
      
      You can also omit it and it will default to 0.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246558475-10624-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf report: Add support for callchain graph output · 4eb3e478
      Frederic Weisbecker committed
      Currently, the printing of callchains is done in a single
      vertical level, this is the "flat" mode:
      
      8.25%  [k] copy_user_generic_string
                   4.19%
                      copy_user_generic_string
                      generic_file_aio_read
                      do_sync_read
                      vfs_read
                      sys_pread64
                      system_call_fastpath
                      pread64
      
      This patch introduces a new "graph" mode which provides a
      hierarchical output of factorized paths recursively sorted:
      
       8.25%  [k] copy_user_generic_string
                      |
                      |--4.31%-- generic_file_aio_read
                      |          do_sync_read
                      |          vfs_read
                      |          |
                      |          |--4.19%-- sys_pread64
                      |          |          system_call_fastpath
                      |          |          pread64
                      |          |
                      |           --0.12%-- sys_read
                      |                     system_call_fastpath
                      |                     __read
                      |
                      |--3.24%-- generic_file_buffered_write
                      |          __generic_file_aio_write_nolock
                      |          generic_file_aio_write
                      |          do_sync_write
                      |          reiserfs_file_write
                      |          vfs_write
                      |          |
                      |          |--3.14%-- sys_pwrite64
                      |          |          system_call_fastpath
                      |          |          __pwrite64
                      |          |
                      |           --0.10%-- sys_write
      [...]
      
      The command line has changed accordingly.
      
      By providing the -c option, the callchain will output in the
      flat mode by default.
      
      But you can override it:
      
          perf report -c graph
      
      or
      
          perf report -c flat
      
      You can also pass the abbreviated mode:
      
          perf report -c g
      
      or
      
          perf report -c gra
      
      both select the graph mode.
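
      The abbreviation rule can be sketched as a prefix match. This is
      a minimal illustration, not the code from this patch: the enum,
      the function name and the handling of "flat" prefixes and of
      invalid input are assumptions.

```c
#include <string.h>

/* Hypothetical names, for illustration only. */
enum chain_mode { CHAIN_FLAT, CHAIN_GRAPH, CHAIN_INVALID };

/*
 * Map the optional argument of -c to a mode. A NULL argument means the
 * option was given without a value, so the flat default applies.
 */
static enum chain_mode parse_chain_mode(const char *arg)
{
	size_t len;

	if (!arg)
		return CHAIN_FLAT;

	len = strlen(arg);
	if (len == 0)
		return CHAIN_INVALID;	/* "" would prefix-match every mode */

	/* Accept any leading prefix of the mode name: "g", "gra", "graph". */
	if (!strncmp(arg, "graph", len))
		return CHAIN_GRAPH;
	if (!strncmp(arg, "flat", len))
		return CHAIN_FLAT;

	return CHAIN_INVALID;
}
```

      Comparing only strlen(arg) characters is what lets "g" and "gra"
      both match "graph", while a longer string such as "graphx" is
      rejected because it differs from the mode name within that length.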
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246550301-8954-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4eb3e478
    • F
      perf_counter tools: Add new OPT_CALLBACK_DEFAULT option · 5a4b1817
      Frederic Weisbecker committed
      There is no predefined macro to create an option that can have
      a custom value or a default one if none is given.
      
      This patch provides a new helper, OPT_CALLBACK_DEFAULT(), which
      defines this kind of option.
      
      For example, considering an option -c, we want to get the
      default value in the following cases:
      
          perf command -c -d
          perf command -d -c
      
      And the foo value when it's given:
      
          perf command -c foo -d
          perf command -d -c foo
      
      That's also why PARSE_OPT_LASTARG_DEFAULT is extended here to
      support default values regardless of the option's position, not
      only at the end of the command line.
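
      The intended behaviour for the four command lines above can be
      sketched as follows. This is not the parse-options code itself:
      the helper name, its signature and the rule "a following word
      that starts with '-' is another option" are assumptions made for
      the illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper: decide which value the option at argv[idx] gets.
 * If a next word exists and does not look like another option, it is
 * taken as the value; otherwise the default is used.
 */
static const char *optional_arg_value(int argc, const char **argv, int idx,
				      const char *defval)
{
	if (idx + 1 < argc && argv[idx + 1][0] != '-')
		return argv[idx + 1];

	return defval;
}
```

      With this rule, "-c -d" and "-d -c" both fall back to the default,
      whether -c comes last or not, while "-c foo -d" and "-d -c foo"
      yield foo.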
      
      Should it now be renamed to PARSE_OPT_ARG_DEFAULT ?
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: git@vger.kernel.org
      LKML-Reference: <1246550301-8954-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5a4b1817
    • F
      perf_counter tools: Create new chain_for_each_child() iterator · 14f4654c
      Frederic Weisbecker committed
      Iterating through the children of a node in the callchain tree
      can be quite confusing at first glance. The head is the children
      field of the parent, and the list nodes are in the brothers field
      of the children.

      This is because the children are linked to the parent as a list
      of "brothers", using the "children" list of the parent as the
      head:
      
        ---------------
       | Parent (head) |-------------------------------------
        ---------------                                      |
           |                                                 |
        children                                             |
           |                                                 |
        -----------               -----------                |
       | 1st child |---brother---| 2nd child |---brother-----
        -----------               -----------
      
      This makes the following strange pattern occur often:
      
       list_for_each_entry(child, &parent->children, brothers) {
              // do something with children
       }
      
      Abstract it into chain_for_each_child() to factor out and
      simplify this pattern.
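
      A self-contained sketch of such an iterator follows. The macro
      simply wraps the pattern quoted above; the list helpers are
      simplified stand-ins for the kernel-style ones, and the node
      structure is reduced to the two list fields plus a hypothetical
      id, so none of this is the actual tools code.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel-style list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Uses the GNU C __typeof__ extension, like the kernel helpers. */
#define list_for_each_entry(pos, head, member)				\
	for (pos = container_of((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head);					\
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

/* Reduced, hypothetical node: only the two list fields matter here. */
struct callchain_node {
	struct list_head brothers;	/* links this node to its siblings */
	struct list_head children;	/* head of this node's child list */
	int id;
};

/* Hide the "children as head, brothers as member" detail. */
#define chain_for_each_child(child, parent) \
	list_for_each_entry(child, &(parent)->children, brothers)

static void node_init(struct callchain_node *node, int id)
{
	init_list_head(&node->brothers);
	init_list_head(&node->children);
	node->id = id;
}

static void node_add_child(struct callchain_node *parent,
			   struct callchain_node *child)
{
	list_add_tail(&child->brothers, &parent->children);
}

/* Example user of the iterator: add up the ids of all direct children. */
static int sum_child_ids(struct callchain_node *parent)
{
	struct callchain_node *child;
	int sum = 0;

	chain_for_each_child(child, parent)
		sum += child->id;

	return sum;
}
```

      Callers such as sum_child_ids() no longer need to know which field
      is the list head and which is the link between siblings.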
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246550301-8954-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      14f4654c
  9. 02 Jul, 2009 7 commits
  10. 01 Jul, 2009 3 commits
    • J
      perf list: Add cache events · 73c24cb8
      Jaswinder Singh Rajput committed
      After:
      
      $ ./perf list
      
      List of pre-defined events (to be used in -e):
      
        cpu-cycles OR cycles                     [Hardware event]
        instructions                             [Hardware event]
        cache-references                         [Hardware event]
        cache-misses                             [Hardware event]
        branch-instructions OR branches          [Hardware event]
        branch-misses                            [Hardware event]
        bus-cycles                               [Hardware event]
      
        cpu-clock                                [Software event]
        task-clock                               [Software event]
        page-faults OR faults                    [Software event]
        minor-faults                             [Software event]
        major-faults                             [Software event]
        context-switches OR cs                   [Software event]
        cpu-migrations OR migrations             [Software event]
      
        L1-d$-loads                              [Hardware cache event]
        L1-d$-load-misses                        [Hardware cache event]
        L1-d$-stores                             [Hardware cache event]
        L1-d$-store-misses                       [Hardware cache event]
        L1-d$-prefetches                         [Hardware cache event]
        L1-d$-prefetch-misses                    [Hardware cache event]
        L1-i$-loads                              [Hardware cache event]
        L1-i$-load-misses                        [Hardware cache event]
        L1-i$-prefetches                         [Hardware cache event]
        L1-i$-prefetch-misses                    [Hardware cache event]
        LLC-loads                                [Hardware cache event]
        LLC-load-misses                          [Hardware cache event]
        LLC-stores                               [Hardware cache event]
        LLC-store-misses                         [Hardware cache event]
        LLC-prefetches                           [Hardware cache event]
        LLC-prefetch-misses                      [Hardware cache event]
        dTLB-loads                               [Hardware cache event]
        dTLB-load-misses                         [Hardware cache event]
        dTLB-stores                              [Hardware cache event]
        dTLB-store-misses                        [Hardware cache event]
        dTLB-prefetches                          [Hardware cache event]
        dTLB-prefetch-misses                     [Hardware cache event]
        iTLB-loads                               [Hardware cache event]
        iTLB-load-misses                         [Hardware cache event]
        branch-loads                             [Hardware cache event]
        branch-load-misses                       [Hardware cache event]
      
        rNNN                                     [raw hardware event descriptor]
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1246453578.3072.1.camel@ht.satnam>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      73c24cb8
    • J
      perf stat: Define MATCH_EVENT for easy attr checking · b9ebdcc0
      Jaswinder Singh Rajput committed
      MATCH_EVENT is useful because it:

       1. makes checking multiple attrs easy
       2. avoids repeating PERF_TYPE_ and PERF_COUNT_, which saves space
       3. avoids line breaks
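
      A sketch of how such a macro could look. The macro body is an
      assumption inferred from the points above (it pastes the
      PERF_TYPE_ and PERF_COUNT_ prefixes onto its arguments); the
      constants, the structure and the attrs table are placeholders,
      not the real perf_counter definitions.

```c
#include <stdint.h>

/* Placeholder constants; the real ones come from the perf_counter header. */
enum { PERF_TYPE_HARDWARE, PERF_TYPE_SOFTWARE };
enum { PERF_COUNT_HW_CPU_CYCLES, PERF_COUNT_HW_INSTRUCTIONS };
enum { PERF_COUNT_SW_CPU_CLOCK, PERF_COUNT_SW_TASK_CLOCK };

/* Placeholder attribute record with just the two fields being matched. */
struct event_attr {
	uint32_t type;
	uint64_t config;
};

static struct event_attr attrs[] = {
	{ PERF_TYPE_SOFTWARE, PERF_COUNT_SW_TASK_CLOCK },
	{ PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES },
};

/* Token pasting supplies the PERF_TYPE_ and PERF_COUNT_ prefixes. */
#define MATCH_EVENT(t, c, counter)			\
	(attrs[counter].type == PERF_TYPE_##t &&	\
	 attrs[counter].config == PERF_COUNT_##c)

/* One short line instead of two full comparisons against attrs[counter]. */
static int is_task_clock(int counter)
{
	return MATCH_EVENT(SOFTWARE, SW_TASK_CLOCK, counter);
}
```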
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1246440909.3403.5.camel@hpdv5.satnam>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b9ebdcc0
    • I
      perf_counter tools: Add more warnings and fix/annotate them · f37a291c
      Ingo Molnar committed
      Enable -Wextra. This found a few real bugs plus a number
      of signed/unsigned type mismatches and other unclean code.
      It also required a few annotations.

      All things considered it was still worth it, so let's try with
      this enabled for now.
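
      As an illustration of the signed/unsigned class of problem (a
      generic example, not code taken from this patch; the function
      names are hypothetical): in C, -Wextra turns on -Wsign-compare,
      which flags comparisons like "value < limit" when one side is int
      and the other is unsigned.

```c
/*
 * What "value < limit" means in C when value is an int and limit is
 * unsigned: value is converted to unsigned first. The cast is written
 * out here so that this sketch itself compiles without the warning.
 */
static int below_limit_naive(int value, unsigned int limit)
{
	return (unsigned int)value < limit;
}

/* Handle negative values before comparing in the unsigned domain. */
static int below_limit_fixed(int value, unsigned int limit)
{
	return value < 0 || (unsigned int)value < limit;
}
```

      For a negative value the naive version reports "not below the
      limit" because the conversion wraps to a very large number; the
      fixed version treats negative values as below any limit.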
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f37a291c