1. 19 Apr, 2010 1 commit
  2. 15 Apr, 2010 1 commit
    • perf tools: Fix accidentally preprocessed snprintf callback · fcd14984
      Frederic Weisbecker committed
      struct sort_entry has a callback named snprintf that formats an
      entry into a string.
      But some glibc versions implement snprintf as a macro, so the
      preprocessor rewrites the snprintf call in the following
      expression:
      
              ent->snprintf(...)
      
      which finally ends in a build error:
      
              util/hist.c: In function 'hist_entry__snprintf':
              util/hist.c:539: error: 'struct sort_entry' has no member named '__builtin___snprintf_chk'
      
      To fix this, prepend struct sort_entry callbacks with an "se_"
      prefix.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
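The effect of the "se_" prefix can be sketched in a few lines of C. This is a minimal illustration, not the perf code: `struct sort_entry_demo`, `demo_comm_snprintf` and `demo_entry_snprintf` are hypothetical names, and the callback signature is an assumption chosen to keep the example short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-in for perf's struct sort_entry; fields are assumptions. */
struct sort_entry_demo {
	const char *header;
	/*
	 * Named se_snprintf rather than snprintf: if libc defines snprintf
	 * as a function-like macro, "ent->snprintf(...)" is rewritten by the
	 * preprocessor, while "ent->se_snprintf(...)" is left alone.
	 */
	int (*se_snprintf)(const char *value, char *bf, size_t size);
};

/* Hypothetical callback: right-aligns one value in a 16 column field. */
static int demo_comm_snprintf(const char *value, char *bf, size_t size)
{
	/* Calling the real snprintf here is fine even when it is a macro. */
	return snprintf(bf, size, "%16s", value);
}

/* Dispatches through the prefixed member, so no macro can capture it. */
static int demo_entry_snprintf(const struct sort_entry_demo *ent,
			       const char *value, char *bf, size_t size)
{
	assert(ent && ent->se_snprintf);
	return ent->se_snprintf(value, bf, size);
}
```

A caller would fill in a `struct sort_entry_demo` with `demo_comm_snprintf` and format through `demo_entry_snprintf`; only the member name changes, the libc call inside the callback stays as it was.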
  3. 04 Apr, 2010 2 commits
    • perf TUI: Add a "Zoom into COMM(PID) thread" and reverse operations · a5e29aca
      Arnaldo Carvalho de Melo committed
      Now one can press the right arrow key and, in addition to filtering
      by DSO, filter by thread too, or use a combination of both
      filters.
      
      With this one can start collecting events for the whole system, then
      focus on a subset of the collected data quickly.
      
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf newt: Add a "Zoom into foo.so DSO" and reverse operations · 83753190
      Arnaldo Carvalho de Melo committed
      Pressing -> brings up a popup menu whose options include "Zoom into
      CURRENT DSO", where CURRENT is replaced by the name of the DSO in
      the current line.
      
      Choosing this option will filter out all samples that didn't take place
      in a symbol in this DSO.
      
      After that the option changes to "Zoom out of CURRENT DSO", to allow
      going back to the more comprehensive view, not filtered by DSO.
      
      Future similar operations will include zooming into a particular thread,
      COMM, CPU, "last minute", "last N usecs", etc.
      
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
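The two zoom operations above amount to a pair of optional filters that can be combined. A minimal sketch, assuming hypothetical `demo_sample` and `demo_zoom` structures (the real browser filters hist entries, not raw samples, and its data layout is not shown in these commits):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reduced sample: only the fields the two filters look at. */
struct demo_sample {
	const char *dso;
	int pid;
};

/* Active zoom state; NULL and -1 mean "zoomed out" for that dimension. */
struct demo_zoom {
	const char *dso;
	int pid;
};

/* A sample stays visible only if it passes every active filter. */
static bool demo_sample_visible(const struct demo_sample *s,
				const struct demo_zoom *z)
{
	if (z->dso && strcmp(s->dso, z->dso) != 0)
		return false;
	if (z->pid != -1 && s->pid != z->pid)
		return false;
	return true;
}

/* Counts the samples that survive the current zoom state. */
static size_t demo_count_visible(const struct demo_sample *s, size_t n,
				 const struct demo_zoom *z)
{
	size_t visible = 0;

	assert(s && z);
	for (size_t i = 0; i < n; i++)
		visible += demo_sample_visible(&s[i], z);
	return visible;
}
```

Zooming out of one dimension is just resetting its field to NULL or -1, which leaves the other filter in place.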
  4. 03 Apr, 2010 2 commits
    • perf hist: Only allocate callchain_node if processing callchains · b9fb9304
      Arnaldo Carvalho de Melo committed
      struct callchain_node is 120 bytes that are never used when there
      are no callchains or '-g none' is specified, so allocate it
      conditionally. This reduces sizeof(struct hist_entry) from 210 bytes
      to only 96, greatly speeding up non-callchain processing.
      
      LKML-Reference: <new-submission>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
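One common way to make a trailing member optional is a flexible array member whose storage is only requested when needed. The sketch below illustrates that technique under stated assumptions: the `demo_*` structures and their fields are invented for the example and are much smaller than the perf ones, so the sizes differ from the figures quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative stand-ins; the real perf structures have different fields. */
struct demo_callchain_node {
	unsigned long hit;
	void *children[4];
};

struct demo_hist_entry {
	unsigned long count;
	bool has_callchain;
	/* Flexible array member: takes no space unless it is allocated for. */
	struct demo_callchain_node callchain[];
};

/* Bytes to allocate for one entry, with or without the callchain node. */
static size_t demo_hist_entry_size(bool use_callchain)
{
	return sizeof(struct demo_hist_entry) +
	       (use_callchain ? sizeof(struct demo_callchain_node) : 0);
}

/* Returns a zeroed entry, or NULL if the allocation fails. */
static struct demo_hist_entry *demo_hist_entry_new(bool use_callchain)
{
	struct demo_hist_entry *he = calloc(1, demo_hist_entry_size(use_callchain));

	if (he)
		he->has_callchain = use_callchain;
	return he;
}
```

Code that touches `callchain[0]` has to check `has_callchain` first, since the element does not exist in entries allocated without it.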
    • perf hist: Replace ->print() routines by ->snprintf() equivalents · a4e3b956
      Arnaldo Carvalho de Melo committed
      Then hist_entry__fprintf will just use the newly introduced
      hist_entry__snprintf, add the newline and fprintf it to the supplied
      FILE stream.
      
      This allows us to remove the use_browser checking in the color_printf
      routines, which now have color_snprintf variants too.
      
      The newt TUI browser (and other GUIs that may come in the future) don't
      have to worry about stdio specific stuff in the strings they get from
      the se->snprintf routines and instead use whatever means to do the
      equivalent.
      
      Also the newt TUI browser doesn't have to use the fmemopen() hack;
      instead it can use the se->snprintf routines directly. For now, though,
      use the hist_entry__snprintf routine to reduce the patch size.
      
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
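The layering described above, a string formatter with a thin stdio wrapper on top, can be sketched as follows. The names `demo_entry`, `demo_entry_snprintf` and `demo_entry_fprintf` are hypothetical and the output format is an assumption; only the split between the two routines mirrors the commit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, much reduced histogram entry. */
struct demo_entry {
	double percent;
	const char *comm;
	const char *symbol;
};

/* Formats the entry into bf without touching stdio; returns snprintf's result. */
static int demo_entry_snprintf(const struct demo_entry *e, char *bf, size_t size)
{
	assert(e && bf);
	return snprintf(bf, size, "%6.2f%%  %s  %s",
			e->percent, e->comm, e->symbol);
}

/* The stdio front end only adds the newline and writes the string out. */
static int demo_entry_fprintf(const struct demo_entry *e, FILE *fp)
{
	char bf[512];

	demo_entry_snprintf(e, bf, sizeof(bf));
	return fprintf(fp, "%s\n", bf);
}
```

A TUI would call `demo_entry_snprintf` and hand the buffer to its own drawing routine, so nothing stdio-specific ends up in the string.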
  5. 26 Mar, 2010 1 commit
    • perf tools: Introduce struct map_symbol · 59fd5306
      Arnaldo Carvalho de Melo committed
      That will be in both struct hist_entry and struct
      callchain_list, so that the TUI can store a pointer to the pair
      (map, symbol) in the trees where hist_entries and
      callchain_lists are present, to allow precise annotation instead
      of looking for the first symbol with the selected name.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1269459619-982-4-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
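Why a stored (map, symbol) pair is more precise than a name lookup can be shown with two symbols that share a name. This is a sketch with hypothetical `demo_*` types; only the idea of keeping the pair together comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reduced versions of a map and a symbol. */
struct demo_map {
	const char *dso_name;
};

struct demo_symbol {
	const char *name;
	unsigned long start;
};

/* The pair kept in each entry, so the exact symbol can be found again. */
struct demo_map_symbol {
	struct demo_map *map;
	struct demo_symbol *sym;
};

/* Name-only lookup: returns the first match, which may be in the wrong map. */
static const struct demo_map_symbol *
demo_find_by_name(const struct demo_map_symbol *tab, size_t n, const char *name)
{
	assert(tab && name);
	for (size_t i = 0; i < n; i++)
		if (strcmp(tab[i].sym->name, name) == 0)
			return &tab[i];
	return NULL;
}
```

If two DSOs both define a symbol called "init", the name lookup always lands on the first one, while an entry that carries its own pair points at the instance that was actually sampled.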
  6. 16 Dec, 2009 1 commit
    • perf diff: Use perf_session__fprintf_hists just like 'perf record' · c351c281
      Arnaldo Carvalho de Melo committed
      That means that almost everything you can do with 'perf report'
      can be done with 'perf diff', for instance:
      
      $ perf record -f find / > /dev/null
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.062 MB perf.data (~2699 samples) ]
      $ perf record -f find / > /dev/null
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.062 MB perf.data (~2687 samples) ]
      $ perf diff | head -8
           9.02%     +1.00%     find  libc-2.10.1.so               [.] _IO_vfprintf_internal
           2.91%     -1.00%     find  [kernel]                     [k] __kmalloc
           2.85%     -1.00%     find  [kernel]                     [k] ext4_htree_store_dirent
           1.99%     -1.00%     find  [kernel]                     [k] _atomic_dec_and_lock
           2.44%                find  [kernel]                     [k] half_md4_transform
      $
      
      So if you want to zoom into libc:
      
      $ perf diff --dsos libc-2.10.1.so | head -8
          37.34%                find  [.] _IO_vfprintf_internal
          10.34%                find  [.] __GI_memmove
           8.25%     +2.00%     find  [.] _int_malloc
           5.07%     -1.00%     find  [.] __GI_mempcpy
           7.62%     +2.00%     find  [.] _int_free
      $
      
      And if there were multiple commands using libc, it is also
      possible to aggregate them all by using --sort symbol:
      
      $ perf diff --dsos libc-2.10.1.so --sort symbol | head -8
          37.34%             [.] _IO_vfprintf_internal
          10.34%             [.] __GI_memmove
           8.25%     +2.00%  [.] _int_malloc
           5.07%     -1.00%  [.] __GI_mempcpy
           7.62%     +2.00%  [.] _int_free
      $
      
      The displacement column is now off by default; to enable it, pass -m:
      
      $ perf diff -m --dsos libc-2.10.1.so --sort symbol | head -8
          37.34%                   [.] _IO_vfprintf_internal
          10.34%                   [.] __GI_memmove
           8.25%     +2.00%        [.] _int_malloc
           5.07%     -1.00%    +2  [.] __GI_mempcpy
           7.62%     +2.00%    -1  [.] _int_free
      $
      
      The -t/--field-separator option can be used for scripting:
      
      $ perf diff -t, -m --dsos libc-2.10.1.so --sort symbol | head -8
      37.34, , ,[.] _IO_vfprintf_internal
      10.34, , ,[.] __GI_memmove
      8.25,+2.00%, ,[.] _int_malloc
      5.07,-1.00%,  +2,[.] __GI_mempcpy
      7.62,+2.00%,  -1,[.] _int_free
      6.99,+1.00%,  -1,[.] _IO_new_file_xsputn
      1.89,-2.00%,  +4,[.] __readdir64
      $
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260978567-550-1-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
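Output produced with -t, is easy to consume from a program. The sketch below parses one line of the four-column layout shown in the sample above (baseline, delta, displacement, symbol); that layout is inferred from the sample only, and `demo_diff_row` and `demo_parse_diff_line` are hypothetical names, not part of perf.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One parsed row; blank columns are reported as 0. */
struct demo_diff_row {
	double baseline;	/* percent in the baseline file */
	double delta;		/* change in percentage points */
	long displacement;	/* change in rank */
	char symbol[64];
};

/*
 * Parses "baseline,delta,displacement,symbol". Returns 0 on success and
 * -1 if the line does not have four comma separated fields. Blank columns
 * are expected to hold at least one space, as in the sample output.
 */
static int demo_parse_diff_line(const char *line, struct demo_diff_row *row)
{
	char delta[16], disp[16];

	assert(line && row);
	if (sscanf(line, "%lf,%15[^,],%15[^,],%63[^\n]",
		   &row->baseline, delta, disp, row->symbol) != 4)
		return -1;
	/* strtod and strtol return 0 when the field holds only spaces. */
	row->delta = strtod(delta, NULL);
	row->displacement = strtol(disp, NULL, 10);
	return 0;
}
```

Feeding it the line `5.07,-1.00%,  +2,[.] __GI_mempcpy` from the sample yields a delta of -1.0 and a displacement of 2.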
  7. 15 Dec, 2009 2 commits
    • perf diff: Introduce tool to show performance difference · 86a9eee0
      Arnaldo Carvalho de Melo committed
      I guess it is enough to show some examples:
      
      [root@doppio linux-2.6-tip]# rm -f perf.data*
      [root@doppio linux-2.6-tip]# ls -la perf.data*
      ls: cannot access perf.data*: No such file or directory
      [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.062 MB perf.data (~2699 samples) ]
      [root@doppio linux-2.6-tip]# ls -la perf.data*
      -rw------- 1 root root 74440 2009-12-14 20:03 perf.data
      [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.062 MB perf.data (~2692 samples) ]
      [root@doppio linux-2.6-tip]# ls -la perf.data*
      -rw------- 1 root root 74280 2009-12-14 20:03 perf.data
      -rw------- 1 root root 74440 2009-12-14 20:03 perf.data.old
      [root@doppio linux-2.6-tip]# perf diff | head -5
         1        -34994580     /lib64/libc-2.10.1.so   _IO_vfprintf_internal
         2        -15307806         [kernel.kallsyms]   __kmalloc
         3    +1   +3665941     /lib64/libc-2.10.1.so   __GI_memmove
         4    +4  +23508995     /lib64/libc-2.10.1.so   _int_malloc
         5    +7  +38538813         [kernel.kallsyms]   __d_lookup
      [root@doppio linux-2.6-tip]# perf diff -p | head -5
         1        +1.00%     /lib64/libc-2.10.1.so   _IO_vfprintf_internal
         2                       [kernel.kallsyms]   __kmalloc
         3    +1             /lib64/libc-2.10.1.so   __GI_memmove
         4    +4             /lib64/libc-2.10.1.so   _int_malloc
         5    +7  -1.00%         [kernel.kallsyms]   __d_lookup
      [root@doppio linux-2.6-tip]# perf diff -v | head -5
         1        361449551 326454971 -34994580     /lib64/libc-2.10.1.so   _IO_vfprintf_internal
         2        151009241 135701435 -15307806         [kernel.kallsyms]   __kmalloc
         3    +1  101805328 105471269  +3665941     /lib64/libc-2.10.1.so   __GI_memmove
         4    +4   78041440 101550435 +23508995     /lib64/libc-2.10.1.so   _int_malloc
         5    +7   59536172  98074985 +38538813         [kernel.kallsyms]   __d_lookup
      [root@doppio linux-2.6-tip]# perf diff -vp | head -5
         1        9.00% 8.00% +1.00%     /lib64/libc-2.10.1.so   _IO_vfprintf_internal
         2        3.00% 3.00%                [kernel.kallsyms]   __kmalloc
         3    +1  2.00% 2.00%            /lib64/libc-2.10.1.so   __GI_memmove
         4    +4  2.00% 2.00%            /lib64/libc-2.10.1.so   _int_malloc
         5    +7  1.00% 2.00% -1.00%         [kernel.kallsyms]   __d_lookup
      [root@doppio linux-2.6-tip]#
      
      This should be enough for diffs where the system is non-volatile,
      i.e. when one doesn't update binaries.
      
      For volatile environments, stay tuned for the next perf tool
      feature: a buildid cache populated by 'perf record', managed by
      'perf buildid-cache' a-la ccache, and used by all the report
      tools.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      LKML-Reference: <1260828571-3613-3-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
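The numbers in the -v output above are consistent with a delta computed as the new period minus the baseline period (326454971 - 361449551 = -34994580). A minimal sketch of that arithmetic plus a rank displacement follows; the sign convention for the displacement (baseline rank minus new rank) is an assumption, and all `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_sym {
	const char *name;
	unsigned long long period;
};

struct demo_delta {
	long long period;	/* new period minus baseline period */
	int displacement;	/* baseline rank minus new rank (assumed sign) */
};

/* Index of name in tab, or -1; tab is assumed sorted by descending period. */
static int demo_rank(const struct demo_sym *tab, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0)
			return (int)i;
	return -1;
}

/* Returns 0 and fills out, or -1 if the symbol is missing from either file. */
static int demo_diff_symbol(const struct demo_sym *base, size_t n_base,
			    const struct demo_sym *cur, size_t n_cur,
			    const char *name, struct demo_delta *out)
{
	int b = demo_rank(base, n_base, name);
	int c = demo_rank(cur, n_cur, name);

	assert(out);
	if (b < 0 || c < 0)
		return -1;
	out->period = (long long)cur[c].period - (long long)base[b].period;
	out->displacement = b - c;
	return 0;
}
```

With this convention a symbol that climbs from third to second place gets a displacement of +1.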
    • perf util: Remove setup_sorting dups · c8829c7a
      Arnaldo Carvalho de Melo committed
      And it is also needed by 'perf diff'.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260828571-3613-1-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  8. 23 Oct, 2009 2 commits
    • perf tools: Bind callchains to the first sort dimension column · a4fb581b
      Frederic Weisbecker committed
      Currently, callchains are displayed with a constant left
      margin. Depending on the sort dimension configuration, they
      usually appear well attached to the first sort dimension
      column, except when the first sort dimension is comm, because
      comm fields are right aligned.
      
      This patch binds the callchain to the first letter in the first
      column, whatever type of column it is (dso, comm, symbol).
      Before:
      
           0.80%             perf  [k] __lock_acquire
                   __lock_acquire
                   lock_acquire
                   |
                   |--58.33%-- _spin_lock
                   |          |
                   |          |--28.57%-- inotify_should_send_event
                   |          |          fsnotify
                   |          |          __fsnotify_parent
      
      After:
      
           0.80%             perf  [k] __lock_acquire
                             __lock_acquire
                             lock_acquire
                             |
                             |--58.33%-- _spin_lock
                             |          |
                             |          |--28.57%-- inotify_should_send_event
                             |          |          fsnotify
                             |          |          __fsnotify_parent
      
      Also, for clarity, we no longer print the callchain as is. Instead:
      
      - If the callchain has a single top level ancestor, start it
        with an initial ASCII hook.
      
        Before:
      
           0.80%             perf  [kernel]                        [k] __lock_acquire
                             __lock_acquire
                               lock_acquire
                             |
                             |--58.33%-- _spin_lock
                             |          |
                             |          |--28.57%-- inotify_should_send_event
                             |          |          fsnotify
                            [..]       [..]
      
         After:
      
           0.80%             perf  [kernel]                         [k] __lock_acquire
                             |
                             --- __lock_acquire
                                 lock_acquire
                                |
                                |--58.33%-- _spin_lock
                                |          |
                                |          |--28.57%-- inotify_should_send_event
                                |          |          fsnotify
                               [..]       [..]
      
      - Otherwise, if we have several top level ancestors, then
        display these like we did before:
      
             1.69%           Xorg
                             |
                             |--21.21%-- vread_hpet
                             |          0x7fffd85b46fc
                             |          0x7fffd85b494d
                             |          0x7f4fafb4e54d
                             |
                             |--15.15%-- exaOffscreenAlloc
                             |
                             |--9.09%-- I830WaitLpRing
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      LKML-Reference: <1256246604-17156-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf tools: Fix missing top level callchain · af0a6fa4
      Frederic Weisbecker committed
      While recursively printing the branches of each callchain, we
      forget to display the root. It is never printed.
      
      Say we have:
      
          symbol
          f1
          f2
           |
           -------- f3
           |        f4
           |
           ---------f5
                    f6
      
      We never actually see that; instead it displays:
      
          symbol
          |
          --------- f3
          |         f4
          |
          --------- f5
                    f6
      
      However f1 is always the same as "symbol", and if we are
      sorting by symbols first then "symbol", f1 and f2 will be well
      aligned like in the above example, so displaying f1 looks
      redundant here.
      
      But if we are sorting by something else first (dso, comm,
      etc...), displaying f1 doesn't look redundant but rather
      necessary because the symbol is not well aligned anymore with
      its callchain:
      
           comm     dso        symbol
           f1
           f2
           |
           --------- [...]
      
      And we want the callchain to be obvious.
      So we fix the bug by printing the root branch, but we also
      filter its first entry if we are sorting by symbols first.
      Reported-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1256246604-17156-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
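The rule from this fix, print the root branch but drop its first entry when sorting by symbol first, can be sketched as below. `demo_root_snprintf` is a hypothetical helper that works on a flat array of names; the real code walks callchain nodes and also draws the ASCII branches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes the root branch of a callchain into bf, one entry per line.
 * When sorting by symbol first, root[0] repeats the symbol column and
 * is skipped. Returns the number of bytes written for complete lines.
 */
static int demo_root_snprintf(const char *const *root, size_t n,
			      bool sort_by_symbol_first,
			      char *bf, size_t size)
{
	size_t i = sort_by_symbol_first ? 1 : 0;
	size_t len = 0;

	assert(bf && size > 0);
	bf[0] = '\0';
	for (; i < n; i++) {
		int ret = snprintf(bf + len, size - len, "%s\n", root[i]);

		if (ret < 0 || (size_t)ret >= size - len)
			break;	/* stop on error or when the buffer is full */
		len += (size_t)ret;
	}
	return (int)len;
}
```

For a root of f1, f2 this gives "f1\nf2\n" when the first sort key is comm or dso, and only "f2\n" when it is the symbol.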
  9. 02 Oct, 2009 1 commit
    • perf tools: Rewrite and improve support for kernel modules · 439d473b
      Arnaldo Carvalho de Melo committed
      Represent modules as struct map entries, backed by a DSO, etc.,
      using /proc/modules to find where each module is loaded.
      
      DSOs now can have a short and long name, so that in verbose mode we
      can show exactly which .ko or vmlinux image was used.
      
      As kernel modules now are a DSO separate from the kernel, we can
      ask for just the hits for a particular set of kernel modules, just
      like we can do with shared libraries:
      
      [root@doppio linux-2.6-tip]# perf report -n --vmlinux
      /home/acme/git/build/tip-recvmmsg/vmlinux --modules --dsos \[drm\] | head -15
          84.58%      13266             Xorg  [k] drm_clflush_pages
           4.02%        630             Xorg  [k] trace_kmalloc.clone.0
           3.95%        619             Xorg  [k] drm_ioctl
           2.07%        324             Xorg  [k] drm_addbufs
           1.68%        263             Xorg  [k] drm_gem_close_ioctl
           0.77%        120             Xorg  [k] drm_setmaster_ioctl
           0.70%        110             Xorg  [k] drm_lastclose
           0.68%        106             Xorg  [k] drm_open
           0.54%         85             Xorg  [k] drm_mm_search_free
      [root@doppio linux-2.6-tip]#
      
      Specifying --dsos /lib/modules/2.6.31-tip/kernel/drivers/gpu/drm/drm.ko
      would have the same effect. Allowing just 'drm.ko' to be specified is
      left for another patch.
      
      Processing kallsyms so that a struct map is instantiated per
      kernel module was also left for another patch. That will allow
      removing the module name from each of its symbols.
      
      struct symbol was reduced by removing the ->module backpointer and
      moving it (well now the map) to struct symbol_entry in perf top,
      that is its only user right now.
      
      The total linecount went down by ~500 lines.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Avi Kivity <avi@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
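Finding where a module is loaded from /proc/modules comes down to parsing one line per module. A minimal sketch, assuming the usual six-field layout (name, size, reference count, dependencies, state, load address); `demo_module` and `demo_parse_module_line` are hypothetical names, and the sample line in the usage note is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The fields of one /proc/modules line that matter for building a map. */
struct demo_module {
	char name[64];
	unsigned long size;		/* module size in bytes */
	unsigned long long start;	/* load address */
};

/*
 * Parses one /proc/modules line. The reference count, dependency list and
 * state are skipped. Returns 0 on success, -1 if the line is malformed.
 */
static int demo_parse_module_line(const char *line, struct demo_module *mod)
{
	assert(line && mod);
	if (sscanf(line, "%63s %lu %*s %*s %*s %llx",
		   mod->name, &mod->size, &mod->start) != 3)
		return -1;
	return 0;
}
```

For a line such as `drm 183680 4 i915, Live 0xffffffffa0000000` this yields the name "drm", the size 183680 and the start address 0xffffffffa0000000. On systems that restrict kernel pointers the address may read as zero for unprivileged users.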
  10. 25 Sep, 2009 1 commit