1. 07 1月, 2012 1 次提交
  2. 16 11月, 2011 1 次提交
    • A
      perf python: Fix undefined symbol problem · 0e2a5f10
      Arnaldo Carvalho de Melo 提交于
      Recently we made perf_evsel__init call hists__init, which broke the perf
      python binding:
      
      [root@emilia linux]# ./tools/perf/python/twatch.py
      Traceback (most recent call last):
        File "./tools/perf/python/twatch.py", line 16, in <module>
          import perf
      ImportError: /home/acme/git/build/perf/python/perf.so: undefined symbol: hists__init
      
      Fix it by moving the hists__init function to its only caller, evsel.c.
      
      This way we avoid dragging in other parts of tools/perf/util/ to the
      perf python binding.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-5nffmdt5mu6ozxgj54oi4qon@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0e2a5f10
  3. 27 10月, 2011 1 次提交
  4. 20 10月, 2011 2 次提交
  5. 19 10月, 2011 3 次提交
  6. 17 10月, 2011 1 次提交
  7. 13 10月, 2011 2 次提交
  8. 08 10月, 2011 2 次提交
    • S
      perf tools: Fix broken number of samples for perf report -n · e39622ce
      Stephane Eranian 提交于
      The perf report -n option was broken because it was not reporting the
      correct number of samples depending on the sorting mode. By default,
      samples are sorted by comm,dso,sym. That means that samples for the same
      command (binary) get collapsed.
      
      The hists__collapse_insert_entry() had a bug whereby it was aggregating
      the number of events observed (periods) but not the number of samples.
      Consequently, the number of samples reported could be below reality. The
      percentage remained correct because based on the periods.
      
      This patch fixes the problem by also aggregating the number of samples.
      Here is an example:
      
      $ perf report -n --stdio
          12.38%        842     pong  [kernel.kallsyms]     [k] __lock_acquire
      
      Here pong (a ctxsw stress test), is the only program running
      and thus it is the only one responsible for the lock_acquire samples.
      
      If we change the sorting mode:
      
      $ perf report -n --stdio --sort=sym
          12.38%       1732  [k] __lock_acquire
      
      The actual number of samples is shown.
      
      With the fix:
      
      $ perf report -n --stdio
          12.38%       1732     pong  [kernel.kallsyms]     [k] __lock_acquire
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20111003093815.GA6393@quadSigned-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e39622ce
    • A
      perf top: Reuse the 'report' hist_entry/hists classes · ab81f3fd
      Arnaldo Carvalho de Melo 提交于
      This actually fixes several problems we had in the old 'perf top':
      
      1. Unresolved symbols not show, limitation that came from the old
         "KernelTop" codebase, to solve it we would need to do changes
         that would make sym_entry have most of the hist_entry fields.
      2. It was using the number of samples, not the sum of sample->period.
      
      And brings the --sort code that allows us to have all the views in
      'perf report', for instance:
      
      [root@emilia ~]# perf top --sort dso
      PerfTop: 5903 irqs/sec kernel:77.5% exact: 0.0% [1000Hz cycles], (all, 8 CPUs)
      ------------------------------------------------------------------------------
      
          31.59%  libcrypto.so.1.0.0
          21.55%  [kernel]
          18.57%  libpython2.6.so.1.0
           7.04%  libc-2.12.so
           6.99%  _backend_agg.so
           4.72%  sshd
           1.48%  multiarray.so
           1.39%  libfreetype.so.6.3.22
           1.37%  perf
           0.71%  libgobject-2.0.so.0.2200.5
           0.53%  [tg3]
           0.48%  libglib-2.0.so.0.2200.5
           0.44%  libstdc++.so.6.0.13
           0.40%  libcairo.so.2.10800.8
           0.38%  libm-2.12.so
           0.34%  umath.so
           0.30%  libgdk-x11-2.0.so.0.1800.9
           0.22%  libpthread-2.12.so
           0.20%  libgtk-x11-2.0.so.0.1800.9
           0.20%  librt-2.12.so
           0.15%  _path.so
           0.13%  libpango-1.0.so.0.2800.1
           0.11%  libatlas.so.3.0
           0.09%  ft2font.so
           0.09%  libpangoft2-1.0.so.0.2800.1
           0.08%  libX11.so.6.3.0
           0.07%  [vdso]
           0.06%  cyclictest
      ^C
      
      All the filter lists can be used as well: --dsos, --comms, --symbols,
      etc.
      
      The 'perf report' TUI is also reused, being possible to apply all the
      zoom operations, do annotation, etc.
      
      This change will allow multiple simplifications in the symbol system as
      well, that will be detailed in upcoming changesets.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-xzaaldxq7zhqrrxdxjifk1mh@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ab81f3fd
  9. 07 10月, 2011 4 次提交
  10. 30 6月, 2011 2 次提交
    • F
      perf tools: Don't display ignored entries on stdio ui · e84d2122
      Frederic Weisbecker 提交于
      As for newt ui, don't display entries that have been marked
      as ignored.
      
      The practical current effect of this is to make parent
      filtering really working. Before, entries that were ignored
      were given a null parent but were still displayed. This
      resulted in some weird effects:
      
       # Overhead      Command      Shared Object        Symbol
       # ........  ...........  .................  ............
       #
      ^A
                         |
                         --- __lock_acquire
                            |
                            |--95.97%-- lock_acquire
                            |          |
                            |          |--30.75%-- _raw_spin_lock
      
      Discard these from the stdio display.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Sam Liao <phyomh@gmail.com>
      e84d2122
    • S
      perf tools: Add inverted call graph report support. · d797fdc5
      Sam Liao 提交于
      Add "caller/callee" option to support inverted butterfly report,
      in the inverted report (with caller option), the call graph start
      from the callee's ancestor. Users can use such view to catch system's
      performance bottleneck from a sysprof like view. Using this option
      with specified sort order like pid gives us high level view of call
      graph statistics.
      
      Also add "-G" alias for inverted call graph.
      Signed-off-by: NSam Liao <phyomh@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      d797fdc5
  11. 07 3月, 2011 1 次提交
    • A
      perf tools: Improve support for sessions with multiple events · e248de33
      Arnaldo Carvalho de Melo 提交于
      By creating an perf_evlist out of the attributes in the perf.data file
      header, so that we can use evlists and evsels when reading recorded
      sessions in addition to when we record sessions.
      
      More work is needed to allow tools to allow the user to select which
      events are wanted when browsing sessions, be it just one or a subset of
      them, aggregated or showed at the same time but with different
      indications on the UI to allow seeing workloads thru different views at
      the same time.
      
      But the overall goal/trend is to more uniformly use evsels and evlists.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e248de33
  12. 06 3月, 2011 1 次提交
    • A
      perf hists: Remove needless global col lenght calcs · d7603d51
      Arnaldo Carvalho de Melo 提交于
      To support multiple events we need to do these calcs per 'struct hists'
      instance, and it turns out we already do that at:
      
      	__hists__add_entry
      		hists__inc_nr_entries
      			hists__calc_col_len
      
      for all the unfiltered hist_entry instances we stash in the rb tree, so
      trow away the dead code.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d7603d51
  13. 25 2月, 2011 1 次提交
    • A
      perf hists: Print number of samples, not the period sum · 69cf0218
      Arnaldo Carvalho de Melo 提交于
      So that we match the header where we state the number of events with the
      "Samples" column when using 'perf report -n/--show-nr-samples':
      
       [root@emilia ~]# perf record -a sleep 1
       [ perf record: Woken up 1 times to write data ]
       [ perf record: Captured and wrote 0.111 MB perf.data (~4860 samples) ]
       [root@emilia ~]# perf report --stdio --show-nr-samples
       # Events: 11  cycles
       #
       # Overhead  Samples        Command       Shared Object                        Symbol
       # ........ ..........  ...........  ..................  ............................
       #
           16.65%          1        sleep  [kernel.kallsyms]   [k] unmap_vmas
           16.10%          1         perf  libpthread-2.12.so  [.] __pthread_cleanup_push_defer
           15.79%          2         perf  [kernel.kallsyms]   [k] format_decode
           12.88%          1  kworker/1:2  [kernel.kallsyms]   [k] cache_reap
           10.69%          1      swapper  [kernel.kallsyms]   [k] _raw_spin_lock
            7.55%          1        sleep  [kernel.kallsyms]   [k] prepare_exec_creds
            6.00%          1         perf  [jbd2]              [k] start_this_handle
            5.29%          1         perf  [kernel.kallsyms]   [k] seq_read
            4.75%          1         perf  [kernel.kallsyms]   [k] get_pid_task
            4.30%          1         perf  [kernel.kallsyms]   [k] _raw_spin_unlock_irqrestore
      
       #
       # (For a higher level overview, try: perf report --sort comm,dso)
       #
       [root@emilia ~]#
      Reported-by: NStephane Eranian <eranian@google.com>
      Reported-by: NCliff Wickman <cpw@sgi.com>
      Acked-by: NStephane Eranian <eranian@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: <stable@kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      [ cherry-picked it from perf/core, as it has been reported by others as well. ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      69cf0218
  14. 17 2月, 2011 1 次提交
    • A
      perf hists: Print number of samples, not the period sum · fec9cbd1
      Arnaldo Carvalho de Melo 提交于
      So that we match the header where we state the number of events with the
      "Samples" column when using 'perf report -n/--show-nr-samples':
      
       [root@emilia ~]# perf record -a sleep 1
       [ perf record: Woken up 1 times to write data ]
       [ perf record: Captured and wrote 0.111 MB perf.data (~4860 samples) ]
       [root@emilia ~]# perf report --stdio --show-nr-samples
       # Events: 11  cycles
       #
       # Overhead  Samples        Command       Shared Object                        Symbol
       # ........ ..........  ...........  ..................  ............................
       #
           16.65%          1        sleep  [kernel.kallsyms]   [k] unmap_vmas
           16.10%          1         perf  libpthread-2.12.so  [.] __pthread_cleanup_push_defer
           15.79%          2         perf  [kernel.kallsyms]   [k] format_decode
           12.88%          1  kworker/1:2  [kernel.kallsyms]   [k] cache_reap
           10.69%          1      swapper  [kernel.kallsyms]   [k] _raw_spin_lock
            7.55%          1        sleep  [kernel.kallsyms]   [k] prepare_exec_creds
            6.00%          1         perf  [jbd2]              [k] start_this_handle
            5.29%          1         perf  [kernel.kallsyms]   [k] seq_read
            4.75%          1         perf  [kernel.kallsyms]   [k] get_pid_task
            4.30%          1         perf  [kernel.kallsyms]   [k] _raw_spin_unlock_irqrestore
      
       #
       # (For a higher level overview, try: perf report --sort comm,dso)
       #
       [root@emilia ~]#
      Reported-by: NStephane Eranian <eranian@google.com>
      Acked-by: NStephane Eranian <eranian@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fec9cbd1
  15. 09 2月, 2011 1 次提交
    • A
      perf annotate: Move locking to struct annotation · ce6f4fab
      Arnaldo Carvalho de Melo 提交于
      Since we'll need it when implementing the live annotate TUI browser.
      
      This also simplifies things a bit by having the list head for the source
      code to be in the dynamicly allocated part of struct annotation, that
      way we don't have to pass it around, it can be found from the struct
      symbol that is passed everywhere.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ce6f4fab
  16. 05 2月, 2011 2 次提交
    • A
      perf annotate: Support multiple histograms in annotation · 2f525d01
      Arnaldo Carvalho de Melo 提交于
      The perf annotate tool continues aggregating everything on just one
      histograms, but to support the top model add support for one histogram
      perf evsel in the evlist.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2f525d01
    • A
      perf annotate: Move annotate functions to util/ · 78f7defe
      Arnaldo Carvalho de Melo 提交于
      They will be used by perf top, so that we have just one set of routines
      to do annotation.
      
      Rename "struct sym_priv" to "struct annotation", etc, to clarify this
      code a bit.
      
      Rename "struct sym_ext" to "struct source_line", to give it a meaningful
      name, that clarifies that it is a the result of an addr2line call, that
      is sorted by percentage one particular source code line appeared in the
      annotation.
      
      And since we're moving things around also rename 'sym_hist->ip' to
      'sym_hist->addr' as we want to do data structure annotation at some
      point.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      78f7defe
  17. 30 1月, 2011 1 次提交
  18. 23 1月, 2011 3 次提交
    • F
      perf callchain: Rename cumul_hits into callchain_cumul_hits · f08c3154
      Frederic Weisbecker 提交于
      That makes the callchain API naming more consistent and
      reduce potential naming clashes.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1294977121-5700-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f08c3154
    • F
      perf callchain: Feed callchains into a cursor · 1b3a0e95
      Frederic Weisbecker 提交于
      The callchains are fed with an array of a fixed size.
      As a result we iterate over each callchains three times:
      
      - 1st to resolve symbols
      - 2nd to filter out context boundaries
      - 3rd for the insertion into the tree
      
      This also involves some pairs of memory allocation/deallocation
      everytime we insert a callchain, for the filtered out array of
      addresses and for the array of symbols that comes along.
      
      Instead, feed the callchains through a linked list with persistent
      allocations. It brings several pros like:
      
      - Merge the 1st and 2nd iterations in one. That was possible before
      but in a way that would involve allocating an array slightly taller
      than necessary because we don't know in advance the number of context
      boundaries to filter out.
      
      - Much lesser allocations/deallocations. The linked list keeps
      persistent empty entries for the next usages and is extendable at
      will.
      
      - Makes it easier for multiple sources of callchains to feed a
      stacktrace together. This is deemed to pave the way for cfi based
      callchains wherein traditional frame pointer based kernel
      stacktraces will precede cfi based user ones, producing an overall
      callchain which size is hardly predictable. This requirement
      makes the static array obsolete and makes a linked list based
      iterator a much more flexible fit.
      
      Basic testing on a big perf file containing callchains (~ 176 MB)
      has shown a throughput gain of about 11% with perf report.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1294977121-5700-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1b3a0e95
    • A
      perf tools: Fix 64 bit integer format strings · 9486aa38
      Arnaldo Carvalho de Melo 提交于
      Using %L[uxd] has issues in some architectures, like on ppc64.  Fix it
      by making our 64 bit integers typedefs of stdint.h types and using
      PRI[ux]64 like, for instance, git does.
      
      Reported by Denis Kirjanov that provided a patch for one case, I went
      and changed all cases.
      Reported-by: NDenis Kirjanov <dkirjanov@kernel.org>
      Tested-by: NDenis Kirjanov <dkirjanov@kernel.org>
      LKML-Reference: <20110120093246.GA8031@hera.kernel.org>
      Cc: Denis Kirjanov <dkirjanov@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pingtian Han <phan@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9486aa38
  19. 03 1月, 2011 1 次提交
    • F
      perf: Fix callchain hit bad cast on ascii display · d425de54
      Frederic Weisbecker 提交于
      ipchain__fprintf_graph() casts the number of hits in a branch as an
      int, which means we lose its highests bits.
      
      This results in meaningless number of callchain hits in perf.data
      that have a high number of hits recorded, typically those that have
      callchain branches hits appearing more than INT_MAX. This happens
      easily as those are pondered by the event period.
      Reported-by: NNick Piggin <npiggin@kernel.dk>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      d425de54
  20. 22 12月, 2010 1 次提交
    • D
      perf symbols: Add symfs option for off-box analysis using specified tree · ec5761ea
      David Ahern 提交于
      The symfs argument allows analysis of perf.data file using a locally accessible
      filesystem tree with debug symbols - e.g., tree created during image builds,
      sshfs mount, loop mounted KVM disk images, USB keys, initrds, etc. Anything
      with an OS tree can be analyzed from anywhere without the need to populate a
      local data store with build-ids.
      
      Commiter notes:
      
      o Fixed up symfs="/" variants handling.
      
      o prefixed DSO__ORIG_GUEST_KMODULE case with symfs too, avoiding use of files
        outside the symfs directory.
      
      LKML-Reference: <1291926427-28846-1-git-send-email-daahern@cisco.com>
      Signed-off-by: NDavid Ahern <daahern@cisco.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ec5761ea
  21. 09 12月, 2010 1 次提交
  22. 23 8月, 2010 2 次提交
    • F
      perf: Support for callchains merge · 612d4fd7
      Frederic Weisbecker 提交于
      If we sort the histograms by comm, which is the default,
      we need to merge some of them, typically different thread
      histograms of a same process, or just same comm. But during
      this merge, we forgot to merge callchains.
      
      So imagine we have three threads (tids: 1000, 1001, 1002) that
      belong to comm "foo".
      
      tid 1000 got 100 events
      tid 1001 got 10 events
      tid 1002 got 3 events
      
      Once we merge these histograms to get a per comm result, we'll
      finally get:
      
      "foo" got 113 events
      
      The problem is if we merge 1000 and 1001 histograms into 1002, then
      the end merge result, wrt callchains, will be only callchains that
      belong to 1002.
      This is because we haven't handled callchains in the merge. Only those
      from one of the threads inside a common comm survive.
      
      It means during this merge, we can lose a lot of callchains.
      
      Fix this by implementing callchains merge and apply it on histograms
      that collapse.
      Reported-by: NChristoph Hellwig <hch@infradead.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      612d4fd7
    • F
      perf: Keep track of the max depth of a callchain · d2009c51
      Frederic Weisbecker 提交于
      In order to implement callchains collapsing, we need to keep
      track of the maximum depth in a histogram tree of callchains.
      This way we'll avoid allocating an arbitrary temporary buffer
      size on callchain merge time.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      d2009c51
  23. 11 8月, 2010 1 次提交
    • A
      perf annotate: Sort by hottest lines in the TUI · 92221162
      Arnaldo Carvalho de Melo 提交于
      Right now it will just sort and position at the hottest line, i.e.
      the one where more samples were taken.
      
      It will be at the center of the screen and later TAB/shift-TAB will
      cycle thru the hottest lines.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      92221162
  24. 06 8月, 2010 1 次提交
  25. 03 8月, 2010 1 次提交
    • A
      perf tools: Don't keep unreferenced maps when unmaps are detected · 0a1eae39
      Arnaldo Carvalho de Melo 提交于
      For a file with:
      
      [root@emilia linux-2.6-tip]# perf report -D -fi allmodconfig-j32.perf.data | grep events:
           TOTAL events:      36933
            MMAP events:       9056
            LOST events:          0
            COMM events:       1702
            EXIT events:       1887
        THROTTLE events:          8
      UNTHROTTLE events:          8
            FORK events:       1894
            READ events:          0
          SAMPLE events:      22378
            ATTR events:          0
      EVENT_TYPE events:          0
      TRACING_DATA events:          0
        BUILD_ID events:          0
      [root@emilia linux-2.6-tip]#
      
      Testing with valgrind and making perf_session__delete() a nop, so that
      we can notice how many maps were actually deleted due to not having any
      samples on it:
      
      ==== HEAP SUMMARY:
      
      Before:
      
      ==10339==     in use at exit: 8,909,997 bytes in 68,690 blocks
      ==10339==   total heap usage: 78,696 allocs, 10,007 frees, 11,925,853 bytes allocated
      
      After:
      
      ==10506==     in use at exit: 8,902,605 bytes in 68,606 blocks
      ==10506==   total heap usage: 78,696 allocs, 10,091 frees, 11,925,853 bytes allocated
      
      I.e. just 84 detected unmaps with no hits out of 9056 for this workload,
      not much, but in some other long running workload this may save more
      bytes.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0a1eae39
  26. 27 7月, 2010 2 次提交
    • A
      perf ui: New hists tree widget · 0f0cbf7a
      Arnaldo Carvalho de Melo 提交于
      The stock newt checkbox tree widget we were using was not really
      suitable for hist entry + callchain browsing.
      
      The problems with it were manifold:
      
      - We needed to traverse the whole hist_entry rb_tree to add each entry +
        callchains beforehand.
      
      - No control over the colors used for each row
      
      So a new tree widget, based mostly on slang, was written.
      
      It extends the ui_browser class already used for annotate to allow the
      user to fold/unfold branches in the callchains tree, using extra fields
      in the symbol_map class that is embedded in hist_entry and
      callchain_node instances to store the folding state and when changing
      this state calculates the number of rows that are produced when showing
      a particular hist_entry instance.
      
      This greatly speeds up browsing as we don't have to upfront touch all
      the entries and only calculate callchain related operations when some
      callchain branch is actually unfolded.
      
      The memory footprint is also reduced as the data structure is not
      duplicated, just some extra fields for controling callchain state and to
      simplify the process of seeking thru entries (nr_rows, row_offset) were
      added.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0f0cbf7a
    • A
      perf hist: Introduce routine to measure lenght of formatted entry · 06daaaba
      Arnaldo Carvalho de Melo 提交于
      Will be used to figure out the window width needed in the new tree
      widget.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      06daaaba