1. 11 5月, 2010 1 次提交
    • A
      perf hist: Introduce hists class and move lots of methods to it · 1c02c4d2
      Arnaldo Carvalho de Melo 提交于
      In cbbc79a5 we introduced support for multiple events by introducing a
      new "event_stat_id" struct and then made several perf_session methods
      receive a point to it instead of a pointer to perf_session, and kept the
      event_stats and hists rb_tree in perf_session.
      
      While working on the new newt based browser, I realised that it would be
      better to introduce a new class, "hists" (short for "histograms"),
      renaming the "event_stat_id" struct and the perf_session methods that
      were really "hists" methods, as they manipulate only struct hists
      members, not touching anything in the other perf_session members.
      
      Other optimizations, such as calculating the maximum lenght of a symbol
      name present in an hists instance will be possible as we add them,
      avoiding a re-traversal just for finding that information.
      
      The rationale for the name "hists" to replace "event_stat_id" is that we
      may have multiple sets of hists for the same event_stat id, as, for
      instance, the 'perf diff' tool has, so event stat id is not what
      characterizes what this struct and the functions that manipulate it do.
      
      Cc: Eric B Munson <ebmunson@us.ibm.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1c02c4d2
  2. 10 5月, 2010 2 次提交
  3. 19 4月, 2010 1 次提交
  4. 15 4月, 2010 1 次提交
    • F
      perf tools: Fix accidentally preprocessed snprintf callback · fcd14984
      Frederic Weisbecker 提交于
      struct sort_entry has a callback named snprintf that turns an
      entry into a string result.
      But there are glibc versions that implement snprintf through a
      macro. The following expression is then going to get the snprintf
      call preprocessed:
      
              ent->snprintf(...)
      
      to finally end up in a build error:
      
              util/hist.c: Dans la fonction «hist_entry__snprintf» :
              util/hist.c:539: erreur: «struct sort_entry» has no member named «__builtin___snprintf_chk»
      
      To fix this, prepend struct sort_entry callbacks with an "se_"
      prefix.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fcd14984
  5. 03 4月, 2010 4 次提交
    • A
      perf hist: Only allocate callchain_node if processing callchains · b9fb9304
      Arnaldo Carvalho de Melo 提交于
      The struct callchain_node size is 120 bytes, that are never used when
      there are no callchains or '-g none' is specified, so conditionally
      allocate it, reducing sizeof(struct hist_entry) from 210 bytes to only
      96, greatly speeding the non-callchain processing.
      
      LKML-Reference: <new-submission>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b9fb9304
    • A
      perf hist: Replace ->print() routines by ->snprintf() equivalents · a4e3b956
      Arnaldo Carvalho de Melo 提交于
      Then hist_entry__fprintf will just us the newly introduced
      hist_entry__snprintf, add the newline and fprintf it to the supplied
      FILE descriptor.
      
      This allows us to remove the use_browser checking in the color_printf
      routines, that now got color_snprintf variants too.
      
      The newt TUI browser (and other GUIs that may come in the future) don't
      have to worry about stdio specific stuff in the strings they get from
      the se->snprintf routines and instead use whatever means to do the
      equivalent.
      
      Also the newt TUI browser don't have to use the fmemopen() hack, instead
      it can use the se->snprintf routines directly. For now tho use the
      hist_entry__snprintf routine to reduce the patch size.
      
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a4e3b956
    • A
      perf report: Add progress bars · 5f4d3f88
      Arnaldo Carvalho de Melo 提交于
      For when we are processing the events and inserting the entries in the
      browser.
      
      Experimentation here: naming "ui_something" we may be treading into
      creating a TUI/GUI set of routines that can then be implemented in terms
      of multiple backends.
      
      Also the time it takes for adding things to the "browser" takes, visually
      (I guess I should do some profiling here ;-) ), more time than for
      processing the events...
      
      That means we probably need to create a custom hist_entry browser, so
      that we reuse the structures we have in place instead of duplicating
      them in newt.
      
      But progress was made and at least we can see something while long files
      are being loaded, that must be one of UI 101 bullet points :-)
      
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5f4d3f88
    • A
      perf symbols: Move more map_groups methods to map.c · c6e718ff
      Arnaldo Carvalho de Melo 提交于
      While writing a standalone test app that uses the symbol system to
      find kernel space symbols I noticed these also need to be moved.
      
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c6e718ff
  6. 26 3月, 2010 2 次提交
  7. 23 3月, 2010 1 次提交
    • F
      perf: Fix orphan callchain branches · 301fde27
      Frederic Weisbecker 提交于
      Callchains have markers inside their capture to tell we
      enter a context (kernel, user, ...).
      
      Those are not displayed in the callchains but they are
      incidentally an active part of the radix tree where
      callchains are stored, just like any other address.
      
      If we have the two following callchains:
      
      addr1 -> addr2 -> user context -> addr3
      addr1 -> addr2 -> user context -> addr4
      addr1 -> addr2 -> addr 5
      
      This is pretty common if addr1 and addr2 are part of an
      interrupt path, addr3 and addr4 are user addresses and
      addr5 is a kernel non interrupt path.
      
      This will be stored as follows in the tree:
      
                         addr1
                         addr2
                         /   \
                        /     addr5
                  user context
                     /    \
                   addr3  addr4
      
      But we ignore the context markers in the report, hence
      the addr3 and addr4 will appear as orphan branches:
      
          |--28.30%-- hrtimer_interrupt
          |          smp_apic_timer_interrupt
          |          apic_timer_interrupt
          |          |           <------------- here, no parent!
          |          |          |
          |          |          |--11.11%-- 0x7fae7bccb875
          |          |          |
          |          |          |--11.11%-- 0xffffffffff60013b
          |          |          |
          |          |          |--11.11%-- __pthread_mutex_lock_internal
          |          |          |
          |          |          |--11.11%-- __errno_location
      
      Fix this by removing the context markers when we process the
      callchains to the tree.
      Reported-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1269274173-20328-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      301fde27
  8. 13 3月, 2010 1 次提交
    • A
      perf hist: Don't fprintf the callgraph unconditionally · 3997d377
      Arnaldo Carvalho de Melo 提交于
      [root@doppio ~]# perf report -i newt.data | head -10
        # Samples: 11999679868
        #
        # Overhead  Command                  Shared Object  Symbol
        # ........  .......  .............................  ......
        #
            63.61%     perf  libslang.so.2.1.4              [.] SLsmg_write_chars
             6.30%     perf  perf                           [.] symbols__find
             2.19%     perf  libnewt.so.0.52.10             [.] newtListboxAppendEntry
             2.08%     perf  libslang.so.2.1.4              [.] SLsmg_write_chars@plt
             1.99%     perf  libc-2.10.2.so                 [.] _IO_vfprintf_internal
        [root@doppio ~]#
      
      Not good, the newt form for report works, but slang has to eat
      the cost of the additional callgraph lines everytime it prints a
      line, and the callgraph doesn't appear on the screen, so move
      the callgraph printing to a separate function and don't use it
      in newt.c.
      
      Newt tree widgets are being investigated to properly support
      callgraphs, but till that gets merged, lets remove this huge
      overhead and show at least the symbol overheads for a callgraph
      rich perf.data with good performance.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268408808-13595-2-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3997d377
  9. 12 3月, 2010 1 次提交
  10. 10 3月, 2010 3 次提交
  11. 09 2月, 2010 1 次提交
  12. 17 12月, 2009 1 次提交
    • A
      perf diff: Percent calcs should use double values · 9b33827d
      Arnaldo Carvalho de Melo 提交于
      Otherwise we do integer math and the delta values round up to
      multiples of 1.0%.
      
      Also, calculate absolute values. Things look precise now:
      
      $ perf report -i perf.data.old --sort dso,symbol | head -13
           9.02%  libc-2.10.1.so               [.] _IO_vfprintf_internal
           4.88%  find                         [.] 0x00000000014af0
           2.91%  [kernel]                     [k] __kmalloc
           2.85%  [kernel]                     [k] ext4_htree_store_dirent
           2.50%  libc-2.10.1.so               [.] __GI_memmove
           2.44%  [kernel]                     [k] half_md4_transform
           2.43%  [kernel]                     [k] _spin_lock
           2.33%  [kernel]                     [k] system_call
      $ perf report -i perf.data --sort dso,symbol | head -13
           8.55%  libc-2.10.1.so               [.] _IO_vfprintf_internal
           3.11%  [kernel]                     [k] __kmalloc
           3.07%  [kernel]                     [k] ext4_htree_store_dirent
           2.66%  find                         [.] 0x00000000016bcf
           2.61%  [kernel]                     [k] _atomic_dec_and_lock
           2.46%  [kernel]                     [k] half_md4_transform
           2.41%  libc-2.10.1.so               [.] __GI_memmove
           2.30%  find                         [.] 0x00000000009219
      $ perf diff | head -13
           9.02%     -0.47%  libc-2.10.1.so               [.] _IO_vfprintf_internal
           2.91%     +0.20%  [kernel]                     [k] __kmalloc
           2.85%     +0.23%  [kernel]                     [k] ext4_htree_store_dirent
           1.99%     +0.62%  [kernel]                     [k] _atomic_dec_and_lock
           2.44%     +0.02%  [kernel]                     [k] half_md4_transform
           2.50%     -0.09%  libc-2.10.1.so               [.] __GI_memmove
           1.88%     +0.01%  [kernel]                     [k] __d_lookup
           2.43%     -0.75%  [kernel]                     [k] _spin_lock
           0.97%     +0.62%  [kernel]                     [k] path_get
           1.99%     -0.42%  libc-2.10.1.so               [.] _int_malloc
      $
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260981109-2621-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9b33827d
  13. 16 12月, 2009 4 次提交
    • A
      perf diff: Use perf_session__fprintf_hists just like 'perf record' · c351c281
      Arnaldo Carvalho de Melo 提交于
      That means that almost everything you can do with 'perf report'
      can be done with 'perf diff', for instance:
      
      $ perf record -f find / > /dev/null
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.062 MB perf.data (~2699
      samples) ] $ perf record -f find / > /dev/null
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.062 MB perf.data (~2687
      samples) ] perf diff | head -8
           9.02%     +1.00%     find  libc-2.10.1.so               [.] _IO_vfprintf_internal
           2.91%     -1.00%     find  [kernel]                     [k] __kmalloc
           2.85%     -1.00%     find  [kernel]                     [k] ext4_htree_store_dirent
           1.99%     -1.00%     find  [kernel]                     [k] _atomic_dec_and_lock
           2.44%                find  [kernel]                     [k] half_md4_transform
      $
      
      So if you want to zoom into libc:
      
      $ perf diff --dsos libc-2.10.1.so | head -8
          37.34%                find  [.] _IO_vfprintf_internal
          10.34%                find  [.] __GI_memmove
           8.25%     +2.00%     find  [.] _int_malloc
           5.07%     -1.00%     find  [.] __GI_mempcpy
           7.62%     +2.00%     find  [.] _int_free
      $
      
      And if there were multiple commands using libc, it is also
      possible to aggregate them all by using --sort symbol:
      
      $ perf diff --dsos libc-2.10.1.so --sort symbol | head -8
          37.34%             [.] _IO_vfprintf_internal
          10.34%             [.] __GI_memmove
           8.25%     +2.00%  [.] _int_malloc
           5.07%     -1.00%  [.] __GI_mempcpy
           7.62%     +2.00%  [.] _int_free
      $
      
      The displacement column now is off by default, to use it:
      
      perf diff -m --dsos libc-2.10.1.so --sort symbol | head -8
          37.34%                   [.] _IO_vfprintf_internal
          10.34%                   [.] __GI_memmove
           8.25%     +2.00%        [.] _int_malloc
           5.07%     -1.00%    +2  [.] __GI_mempcpy
           7.62%     +2.00%    -1  [.] _int_free
      $
      
      Using -t/--field-separator can be used for scripting:
      
      $ perf diff -t, -m --dsos libc-2.10.1.so --sort symbol | head -8
      37.34, , ,[.] _IO_vfprintf_internal
      10.34, , ,[.] __GI_memmove
      8.25,+2.00%, ,[.] _int_malloc
      5.07,-1.00%,  +2,[.] __GI_mempcpy
      7.62,+2.00%,  -1,[.] _int_free
      6.99,+1.00%,  -1,[.] _IO_new_file_xsputn
      1.89,-2.00%,  +4,[.] __readdir64
      $
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260978567-550-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c351c281
    • A
      perf session: Move perf report specific hits out of perf_session__fprintf_hists · 3e6055ab
      Arnaldo Carvalho de Melo 提交于
      Those don't make sense for tools such as 'perf diff'.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260973631-28035-2-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3e6055ab
    • A
      perf tools: Move hist entries printing routines from perf report · 4ecf84d0
      Arnaldo Carvalho de Melo 提交于
      Will be used in other tools such as 'perf diff'.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260973631-28035-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4ecf84d0
    • A
      perf report: Generalize perf_session__fprintf_hists() · d599db3f
      Arnaldo Carvalho de Melo 提交于
      Pull it out of builtin-report - further changes will be made and it
      will then be reusable in 'perf diff' as well.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260914682-29652-4-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d599db3f
  14. 14 12月, 2009 2 次提交
  15. 28 11月, 2009 2 次提交
    • A
      perf tools: Consolidate symbol resolving across all tools · 1ed091c4
      Arnaldo Carvalho de Melo 提交于
      Now we have a very high level routine for simple tools to
      process IP sample events:
      
      	int event__preprocess_sample(const event_t *self,
      				     struct addr_location *al,
      				     symbol_filter_t filter)
      
      It receives the event itself and will insert new threads in the
      global threads list and resolve the map and symbol, filling all
      this info into the new addr_location struct, so that tools like
      annotate and report can further process the event by creating
      hist_entries in their specific way (with or without callgraphs,
      etc).
      
      It in turn uses the new next layer function:
      
      	void thread__find_addr_location(struct thread *self, u8 cpumode,
      					enum map_type type, u64 addr,
      					struct addr_location *al,
      					symbol_filter_t filter)
      
      This one will, given a thread (userspace or the kernel kthread
      one), will find the given type (MAP__FUNCTION now, MAP__VARIABLE
      too in the near future) at the given cpumode, taking vdsos into
      account (userspace hit, but kernel symbol) and will fill all
      these details in the addr_location given.
      
      Tools that need a more compact API for plain function
      resolution, like 'kmem', can use this other one:
      
      	struct symbol *thread__find_function(struct thread *self, u64 addr,
      					     symbol_filter_t filter)
      
      So, to resolve a kernel symbol, that is all the 'kmem' tool
      needs, its just a matter of calling:
      
      	sym = thread__find_function(kthread, addr, NULL);
      
      The 'filter' parameter is needed because we do lazy
      parsing/loading of ELF symtabs or /proc/kallsyms.
      
      With this we remove more code duplication all around, which is
      always good, huh? :-)
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1259346563-12568-12-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1ed091c4
    • A
      perf tools: Reorganize event processing routines, lotsa dups killed · 62daacb5
      Arnaldo Carvalho de Melo 提交于
      While implementing event__preprocess_sample, that will do all of
      the symbol lookup in one convenient function, I noticed that
      util/process_event.[ch] were not being used at all, then started
      looking if there were other functions that could be shared
      and...
      
      All those functions really don't need to receive offset + head,
      the only thing they did was common to all of them, so do it at
      one place instead.
      
      Stats about number of each type of event processed now is done
      in a central place.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1259346563-12568-11-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      62daacb5
  16. 03 10月, 2009 1 次提交
  17. 30 9月, 2009 1 次提交