1. 16 12月, 2009 1 次提交
    • A
      perf diff: Use perf_session__fprintf_hists just like 'perf record' · c351c281
      Arnaldo Carvalho de Melo 提交于
      That means that almost everything you can do with 'perf report'
      can be done with 'perf diff', for instance:
      
      $ perf record -f find / > /dev/null
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.062 MB perf.data (~2699
      samples) ] $ perf record -f find / > /dev/null
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.062 MB perf.data (~2687
      samples) ] perf diff | head -8
           9.02%     +1.00%     find  libc-2.10.1.so               [.] _IO_vfprintf_internal
           2.91%     -1.00%     find  [kernel]                     [k] __kmalloc
           2.85%     -1.00%     find  [kernel]                     [k] ext4_htree_store_dirent
           1.99%     -1.00%     find  [kernel]                     [k] _atomic_dec_and_lock
           2.44%                find  [kernel]                     [k] half_md4_transform
      $
      
      So if you want to zoom into libc:
      
      $ perf diff --dsos libc-2.10.1.so | head -8
          37.34%                find  [.] _IO_vfprintf_internal
          10.34%                find  [.] __GI_memmove
           8.25%     +2.00%     find  [.] _int_malloc
           5.07%     -1.00%     find  [.] __GI_mempcpy
           7.62%     +2.00%     find  [.] _int_free
      $
      
      And if there were multiple commands using libc, it is also
      possible to aggregate them all by using --sort symbol:
      
      $ perf diff --dsos libc-2.10.1.so --sort symbol | head -8
          37.34%             [.] _IO_vfprintf_internal
          10.34%             [.] __GI_memmove
           8.25%     +2.00%  [.] _int_malloc
           5.07%     -1.00%  [.] __GI_mempcpy
           7.62%     +2.00%  [.] _int_free
      $
      
      The displacement column now is off by default, to use it:
      
      perf diff -m --dsos libc-2.10.1.so --sort symbol | head -8
          37.34%                   [.] _IO_vfprintf_internal
          10.34%                   [.] __GI_memmove
           8.25%     +2.00%        [.] _int_malloc
           5.07%     -1.00%    +2  [.] __GI_mempcpy
           7.62%     +2.00%    -1  [.] _int_free
      $
      
      Using -t/--field-separator can be used for scripting:
      
      $ perf diff -t, -m --dsos libc-2.10.1.so --sort symbol | head -8
      37.34, , ,[.] _IO_vfprintf_internal
      10.34, , ,[.] __GI_memmove
      8.25,+2.00%, ,[.] _int_malloc
      5.07,-1.00%,  +2,[.] __GI_mempcpy
      7.62,+2.00%,  -1,[.] _int_free
      6.99,+1.00%,  -1,[.] _IO_new_file_xsputn
      1.89,-2.00%,  +4,[.] __readdir64
      $
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260978567-550-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c351c281
  2. 15 12月, 2009 2 次提交
    • A
      perf diff: Introduce tool to show performance difference · 86a9eee0
      Arnaldo Carvalho de Melo 提交于
      I guess it is enough to show some examples:
      
      [root@doppio linux-2.6-tip]# rm -f perf.data*
      [root@doppio linux-2.6-tip]# ls -la perf.data*
      ls: cannot access perf.data*: No such file or directory
      [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.062 MB perf.data (~2699 samples) ]
      [root@doppio linux-2.6-tip]# ls -la perf.data*
      -rw------- 1 root root 74440 2009-12-14 20:03 perf.data
      [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.062 MB perf.data (~2692 samples) ]
      [root@doppio linux-2.6-tip]# ls -la perf.data*
      -rw------- 1 root root 74280 2009-12-14 20:03 perf.data
      -rw------- 1 root root 74440 2009-12-14 20:03 perf.data.old
      [root@doppio linux-2.6-tip]# perf diff | head -5
         1        -34994580     /lib64/libc-2.10.1.so   _IO_vfprintf_internal
         2        -15307806         [kernel.kallsyms]   __kmalloc
         3    +1   +3665941     /lib64/libc-2.10.1.so   __GI_memmove
         4    +4  +23508995     /lib64/libc-2.10.1.so   _int_malloc
         5    +7  +38538813         [kernel.kallsyms]   __d_lookup
      [root@doppio linux-2.6-tip]# perf diff -p | head -5
         1        +1.00%     /lib64/libc-2.10.1.so   _IO_vfprintf_internal
         2                       [kernel.kallsyms]   __kmalloc
         3    +1             /lib64/libc-2.10.1.so   __GI_memmove
         4    +4             /lib64/libc-2.10.1.so   _int_malloc
         5    +7  -1.00%         [kernel.kallsyms]   __d_lookup
      [root@doppio linux-2.6-tip]# perf diff -v | head -5
         1        361449551 326454971 -34994580     /lib64/libc-2.10.1.so   _IO_vfprintf_internal
         2        151009241 135701435 -15307806         [kernel.kallsyms]   __kmalloc
         3    +1  101805328 105471269  +3665941     /lib64/libc-2.10.1.so   __GI_memmove
         4    +4   78041440 101550435 +23508995     /lib64/libc-2.10.1.so   _int_malloc
         5    +7   59536172  98074985 +38538813         [kernel.kallsyms]   __d_lookup
      [root@doppio linux-2.6-tip]# perf diff -vp | head -5
         1        9.00% 8.00% +1.00%     /lib64/libc-2.10.1.so   _IO_vfprintf_internal
         2        3.00% 3.00%                [kernel.kallsyms]   __kmalloc
         3    +1  2.00% 2.00%            /lib64/libc-2.10.1.so   __GI_memmove
         4    +4  2.00% 2.00%            /lib64/libc-2.10.1.so   _int_malloc
         5    +7  1.00% 2.00% -1.00%         [kernel.kallsyms]   __d_lookup
      [root@doppio linux-2.6-tip]#
      
      This should be enough for diffs where the system is non
      volatile, i.e. when one doesn't updates binaries.
      
      For volatile environments, stay tuned for the next perf tool
      feature: a buildid cache populated by 'perf record', managed by
      'perf buildid-cache' a-la ccache, and used by all the report
      tools.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      LKML-Reference: <1260828571-3613-3-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      86a9eee0
    • A
      perf util: Remove setup_sorting dups · c8829c7a
      Arnaldo Carvalho de Melo 提交于
      And it is also needed by 'perf diff'.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260828571-3613-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c8829c7a
  3. 23 10月, 2009 2 次提交
    • F
      perf tools: Bind callchains to the first sort dimension column · a4fb581b
      Frederic Weisbecker 提交于
      Currently, the callchains are displayed using a constant left
      margin. So depending on the current sort dimension
      configuration, callchains may appear to be well attached to the
      first sort dimension column field which is mostly the case,
      except when the first dimension of sorting is done by comm,
      because these are right aligned.
      
      This patch binds the callchain to the first letter in the first
      column, whatever type of column it is (dso, comm, symbol).
      Before:
      
           0.80%             perf  [k] __lock_acquire
                   __lock_acquire
                   lock_acquire
                   |
                   |--58.33%-- _spin_lock
                   |          |
                   |          |--28.57%-- inotify_should_send_event
                   |          |          fsnotify
                   |          |          __fsnotify_parent
      
      After:
      
           0.80%             perf  [k] __lock_acquire
                             __lock_acquire
                             lock_acquire
                             |
                             |--58.33%-- _spin_lock
                             |          |
                             |          |--28.57%-- inotify_should_send_event
                             |          |          fsnotify
                             |          |          __fsnotify_parent
      
      Also, for clarity, we don't put anymore the callchain as is but:
      
      - If we have a top level ancestor in the callchain, start it
        with a first ascii hook.
      
        Before:
      
           0.80%             perf  [kernel]                        [k] __lock_acquire
                             __lock_acquire
                               lock_acquire
                             |
                             |--58.33%-- _spin_lock
                             |          |
                             |          |--28.57%-- inotify_should_send_event
                             |          |          fsnotify
                            [..]       [..]
      
         After:
      
           0.80%             perf  [kernel]                         [k] __lock_acquire
                             |
                             --- __lock_acquire
                                 lock_acquire
                                |
                                |--58.33%-- _spin_lock
                                |          |
                                |          |--28.57%-- inotify_should_send_event
                                |          |          fsnotify
                               [..]       [..]
      
      - Otherwise, if we have several top level ancestors, then
        display these like we did before:
      
             1.69%           Xorg
                             |
                             |--21.21%-- vread_hpet
                             |          0x7fffd85b46fc
                             |          0x7fffd85b494d
                             |          0x7f4fafb4e54d
                             |
                             |--15.15%-- exaOffscreenAlloc
                             |
                             |--9.09%-- I830WaitLpRing
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      LKML-Reference: <1256246604-17156-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a4fb581b
    • F
      perf tools: Fix missing top level callchain · af0a6fa4
      Frederic Weisbecker 提交于
      While recursively printing the branches of each callchains, we
      forget to display the root. It is never printed.
      
      Say we have:
      
          symbol
          f1
          f2
           |
           -------- f3
           |        f4
           |
           ---------f5
                    f6
      
      Actually we never see that, instead it displays:
      
          symbol
          |
          --------- f3
          |         f4
          |
          --------- f5
                    f6
      
      However f1 is always the same than "symbol" and if we are
      sorting by symbols first then "symbol", f1 and f2 will be well
      aligned like in the above example, so displaying f1 looks
      redundant here.
      
      But if we are sorting by something else first (dso, comm,
      etc...), displaying f1 doesn't look redundant but rather
      necessary because the symbol is not well aligned anymore with
      its callchain:
      
           comm     dso        symbol
           f1
           f2
           |
           --------- [...]
      
      And we want the callchain to be obvious.
      So we fix the bug by printing the root branch, but we also
      filter its first entry if we are sorting by symbols first.
      Reported-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1256246604-17156-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      af0a6fa4
  4. 02 10月, 2009 1 次提交
    • A
      perf tools: Rewrite and improve support for kernel modules · 439d473b
      Arnaldo Carvalho de Melo 提交于
      Representing modules as struct map entries, backed by a DSO, etc,
      using /proc/modules to find where the module is loaded.
      
      DSOs now can have a short and long name, so that in verbose mode we
      can show exactly which .ko or vmlinux image was used.
      
      As kernel modules now are a DSO separate from the kernel, we can
      ask for just the hits for a particular set of kernel modules, just
      like we can do with shared libraries:
      
      [root@doppio linux-2.6-tip]# perf report -n --vmlinux
      /home/acme/git/build/tip-recvmmsg/vmlinux --modules --dsos \[drm\] | head -15
          84.58%      13266             Xorg  [k] drm_clflush_pages
           4.02%        630             Xorg  [k] trace_kmalloc.clone.0
           3.95%        619             Xorg  [k] drm_ioctl
           2.07%        324             Xorg  [k] drm_addbufs
           1.68%        263             Xorg  [k] drm_gem_close_ioctl
           0.77%        120             Xorg  [k] drm_setmaster_ioctl
           0.70%        110             Xorg  [k] drm_lastclose
           0.68%        106             Xorg  [k] drm_open
           0.54%         85             Xorg  [k] drm_mm_search_free
      [root@doppio linux-2.6-tip]#
      
      Specifying --dsos /lib/modules/2.6.31-tip/kernel/drivers/gpu/drm/drm.ko
      would have the same effect. Allowing specifying just 'drm.ko' is left
      for another patch.
      
      Processing kallsyms so that per kernel module struct map are
      instantiated was also left for another patch. That will allow
      removing the module name from each of its symbols.
      
      struct symbol was reduced by removing the ->module backpointer and
      moving it (well now the map) to struct symbol_entry in perf top,
      that is its only user right now.
      
      The total linecount went down by ~500 lines.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Avi Kivity <avi@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      439d473b
  5. 25 9月, 2009 1 次提交