1. 28 5月, 2013 1 次提交
  2. 01 4月, 2013 2 次提交
    • S
      perf tools: Add mem access sampling core support · 98a3b32c
      Stephane Eranian 提交于
      This patch adds the sorting and histogram support
      functions to enable profiling of memory accesses.
      
      The following sorting orders are added:
       - symbol_daddr: data address symbol (or raw address)
       - dso_daddr: data address shared object
       - locked: access uses locked transaction
       - tlb : TLB access
       - mem : memory level of the access (L1, L2, L3, RAM, ...)
       - snoop: access snoop mode
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1359040242-8269-12-git-send-email-eranian@google.com
      [ committer note: changed to cope with fc5871ed, the move of methods to
        machine.[ch], and the rename of dsrc to data_src, to match the change
        made in the PERF_SAMPLE_DSRC in a previous patch. ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      98a3b32c
    • A
      perf tools: Add support for weight v7 (modified) · 05484298
      Andi Kleen 提交于
      perf record has a new option -W that enables weightened sampling.
      
      Add sorting support in top/report for the average weight per sample and the
      total weight sum. This allows to both compare relative cost per event
      and the total cost over the measurement period.
      
      Add the necessary glue to perf report, record and the library.
      
      v2: Merge with new hist refactoring.
      v3: Fix manpage. Remove value check.
      Rename global_weight to weight and weight to local_weight.
      v4: Readd sort keys to manpage
      v5: Move weight to end
      v6: Move weight to template
      v7: Rename weight key.
      
      Original patch from Andi modified by Stephane Eranian <eranian@google.com>
      to include ONLY the weight supporting code and apply to pristine 3.8.0-rc4.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1359040242-8269-6-git-send-email-eranian@google.com
      [ committer note: changed to cope with fc5871ed and the hists_link perf test entry ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      05484298
  3. 07 2月, 2013 1 次提交
  4. 25 1月, 2013 1 次提交
  5. 09 12月, 2012 2 次提交
  6. 09 11月, 2012 1 次提交
  7. 06 10月, 2012 2 次提交
    • J
      perf diff: Add weighted diff computation way to compare hist entries · 81d5f958
      Jiri Olsa 提交于
      Adding 'wdiff' as new computation way to compare hist entries.
      
      If specified the 'Weighted diff' column is displayed with value 'd'
      computed as:
      
         d = B->period * WEIGHT-A - A->period * WEIGHT-B
      
        - A/B being matching hist entry from first/second file specified
          (or perf.data/perf.data.old) respectively.
        - period being the hist entry period value
        - WEIGHT-A/WEIGHT-B being user suplied weights in the the '-c' option
          behind ':' separator like '-c wdiff:1,2'.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1349448287-18919-5-git-send-email-jolsa@redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      81d5f958
    • J
      perf diff: Add option to sort entries based on diff computation · 96c47f19
      Jiri Olsa 提交于
      Adding support to sort hist entries based on the outcome of selected
      computation. It's now possible to specify '+' as a first character of
      '-c' option value to make such sort.
      
      Example:
      
        $ perf diff -c ratio -b
        # Event 'cache-misses'
        #
        #   Baseline           Ratio      Shared Object                            Symbol
        #   ........  ..............  .................  ................................
        #
              19.64%            0.69  [kernel.kallsyms]  [k] clear_page
               0.30%            0.17  [kernel.kallsyms]  [k] mm_alloc
               0.04%            0.20  [kernel.kallsyms]  [k] kmem_cache_alloc
      
        $ perf diff -c +ratio -b
        # Event 'cache-misses'
        #
        #   Baseline           Ratio      Shared Object                            Symbol
        #   ........  ..............  .................  ................................
        #
              19.64%            0.69  [kernel.kallsyms]  [k] clear_page
               0.04%            0.20  [kernel.kallsyms]  [k] kmem_cache_alloc
               0.30%            0.17  [kernel.kallsyms]  [k] mm_alloc
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1349448287-18919-4-git-send-email-jolsa@redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      96c47f19
  8. 05 10月, 2012 3 次提交
  9. 18 9月, 2012 1 次提交
  10. 08 9月, 2012 1 次提交
  11. 20 6月, 2012 1 次提交
    • A
      perf tools: Add sort by src line/number · 409a8be6
      Arnaldo Carvalho de Melo 提交于
      Using addr2line for now, requires debuginfo, needs more work to support
      detached debuginfo, aka foo-debuginfo packages.
      
      Example:
      
      	[root@sandy ~]# perf record -a sleep 3
      	[ perf record: Woken up 1 times to write data ]
      	[ perf record: Captured and wrote 0.555 MB perf.data (~24236 samples) ]
      	[root@sandy ~]# perf report -s dso,srcline 2>&1 | grep -v ^# | head -5
      	    22.41%  [kernel.kallsyms]  /home/git/linux/drivers/idle/intel_idle.c:280
      	     4.79%  [kernel.kallsyms]  /home/git/linux/drivers/cpuidle/cpuidle.c:148
      	     4.78%  [kernel.kallsyms]  /home/git/linux/arch/x86/include/asm/atomic64_64.h:121
      	     4.49%  [kernel.kallsyms]  /home/git/linux/kernel/sched/core.c:1690
      	     4.30%  [kernel.kallsyms]  /home/git/linux/include/linux/seqlock.h:90
      	[root@sandy ~]#
      
      [root@sandy ~]# perf top -U -s dso,symbol,srcline
      Samples: 1K of event 'cycles', Event count (approx.): 589617389
       18.66%  [kernel]  [k] copy_user_generic_unrolled   /home/git/linux/arch/x86/lib/copy_user_64.S:143
        7.83%  [kernel]  [k] clear_page                   /home/git/linux/arch/x86/lib/clear_page_64.S:39
        6.59%  [kernel]  [k] clear_page                   /home/git/linux/arch/x86/lib/clear_page_64.S:38
        3.66%  [kernel]  [k] page_fault                   /home/git/linux/arch/x86/kernel/entry_64.S:1379
        3.25%  [kernel]  [k] clear_page                   /home/git/linux/arch/x86/lib/clear_page_64.S:40
        3.12%  [kernel]  [k] clear_page                   /home/git/linux/arch/x86/lib/clear_page_64.S:37
        2.74%  [kernel]  [k] clear_page                   /home/git/linux/arch/x86/lib/clear_page_64.S:36
        2.39%  [kernel]  [k] clear_page                   /home/git/linux/arch/x86/lib/clear_page_64.S:43
        2.12%  [kernel]  [k] ioread32                     /home/git/linux/lib/iomap.c:90
        1.51%  [kernel]  [k] copy_user_generic_unrolled   /home/git/linux/arch/x86/lib/copy_user_64.S:144
        1.19%  [kernel]  [k] copy_user_generic_unrolled   /home/git/linux/arch/x86/lib/copy_user_64.S:154
      Suggested-by: NAndi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-pdmqbng9twz06jzkbgtuwbp8@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      409a8be6
  12. 09 3月, 2012 3 次提交
  13. 13 10月, 2011 1 次提交
  14. 07 10月, 2011 1 次提交
    • A
      perf hists: Threaded addition and sorting of entries · 1980c2eb
      Arnaldo Carvalho de Melo 提交于
      By using a mutex just for inserting and rotating two hist_entry rb
      trees, so that when sorting we can get the last batch of entries created
      from the ring buffer, merge it with whatever we have processed so far
      and show the output while new entries are being added.
      
      The 'report' tool continues, for now, to do it without threading, but
      will use this in the future to allow visualization of results in long
      perf.data sessions while the entries are being processed.
      
      The new 'top' tool will be the first user.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-9b05atsn0q6m7fqgrug8fk2i@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1980c2eb
  15. 30 6月, 2011 2 次提交
  16. 23 8月, 2010 1 次提交
    • F
      perf: Keep track of the max depth of a callchain · d2009c51
      Frederic Weisbecker 提交于
      In order to implement callchains collapsing, we need to keep
      track of the maximum depth in a histogram tree of callchains.
      This way we'll avoid allocating an arbitrary temporary buffer
      size on callchain merge time.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      d2009c51
  17. 27 7月, 2010 1 次提交
    • A
      perf ui: New hists tree widget · 0f0cbf7a
      Arnaldo Carvalho de Melo 提交于
      The stock newt checkbox tree widget we were using was not really
      suitable for hist entry + callchain browsing.
      
      The problems with it were manifold:
      
      - We needed to traverse the whole hist_entry rb_tree to add each entry +
        callchains beforehand.
      
      - No control over the colors used for each row
      
      So a new tree widget, based mostly on slang, was written.
      
      It extends the ui_browser class already used for annotate to allow the
      user to fold/unfold branches in the callchains tree, using extra fields
      in the symbol_map class that is embedded in hist_entry and
      callchain_node instances to store the folding state and when changing
      this state calculates the number of rows that are produced when showing
      a particular hist_entry instance.
      
      This greatly speeds up browsing as we don't have to upfront touch all
      the entries and only calculate callchain related operations when some
      callchain branch is actually unfolded.
      
      The memory footprint is also reduced as the data structure is not
      duplicated, just some extra fields for controling callchain state and to
      simplify the process of seeking thru entries (nr_rows, row_offset) were
      added.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0f0cbf7a
  18. 23 7月, 2010 1 次提交
  19. 05 6月, 2010 1 次提交
    • A
      perf report: Implement --sort cpu · f60f3593
      Arun Sharma 提交于
      In a shared multi-core environment, users want to analyze why their
      program was slow. In particular, if the code ran slower only on certain
      CPUs due to interference from other programs or kernel threads, the user
      should be able to notice that.
      
      Sample usage:
      
      perf record -f -a -- sleep 3
      perf report --sort cpu,comm
      
      Workload:
      
      program is running on 16 CPUs
      Experiencing interference from an antagonist only on 4 CPUs.
      
        Samples: 106218177676 cycles
      
        Overhead  CPU          Command
        ........  ...  ...............
      
           6.25%  2            program
           6.24%  6            program
           6.24%  11           program
           6.24%  5            program
           6.24%  9            program
           6.24%  10           program
           6.23%  15           program
           6.23%  7            program
           6.23%  3            program
           6.23%  14           program
           6.22%  1            program
           6.20%  13           program
           3.17%  12           program
           3.15%  8            program
           3.14%  0            program
           3.13%  4            program
           3.11%  4         antagonist
           3.11%  0         antagonist
           3.10%  8         antagonist
           3.07%  12        antagonist
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <20100505181612.GA5091@sharma-home.net>
      Signed-off-by: NArun Sharma <aruns@google.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f60f3593
  20. 18 5月, 2010 1 次提交
    • A
      perf options: Type check all the remaining OPT_ variants · edb7c60e
      Arnaldo Carvalho de Melo 提交于
      OPT_SET_INT was renamed to OPT_SET_UINT since the only use in these
      tools is to set something that has an enum type, that is builtin
      compatible with unsigned int.
      
      Several string constifications were done to make OPT_STRING require a
      const char * type.
      
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      edb7c60e
  21. 15 5月, 2010 1 次提交
    • A
      perf report: Report number of events, not samples · c82ee828
      Arnaldo Carvalho de Melo 提交于
      Number of samples is meaningless after we switched to auto-freq, so
      report the number of events, i.e. not the sum of the different periods,
      but the number PERF_RECORD_SAMPLE emitted by the kernel.
      
      While doing this I noticed that naming "count" to the sum of all the
      event periods can be confusing, so rename it to .period, just like in
      struct sample.data, so that we become more consistent.
      
      This helps with the next step, that was to record in struct hist_entry
      the number of sample events for each instance, we need that because we
      use it to generate the number of events when applying filters to the
      tree of hist entries like it is being done in the TUI report browser.
      Suggested-by: NIngo Molnar <mingo@elte.hu>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c82ee828
  22. 12 5月, 2010 1 次提交
    • A
      perf report: Librarize the annotation code and use it in the newt browser · ef7b93a1
      Arnaldo Carvalho de Melo 提交于
      Now we don't anymore use popen to run 'perf annotate' for the selected
      symbol, instead we collect per address samplings when processing samples
      in 'perf report' if we're using the newt browser, then we use this data
      directly to do annotation.
      
      Done this way we can actually traverse the objdump_line objects
      directly, matching the addresses to the collected samples and colouring
      them appropriately using lower level slang routines.
      
      The new ui_browser class will be reused for the main, callchain aware,
      histogram browser, when it will be made generic and don't assume that
      the objects are always instances of the objdump_line class maintained
      using list_heads.
      
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ef7b93a1
  23. 19 4月, 2010 1 次提交
  24. 15 4月, 2010 1 次提交
    • F
      perf tools: Fix accidentally preprocessed snprintf callback · fcd14984
      Frederic Weisbecker 提交于
      struct sort_entry has a callback named snprintf that turns an
      entry into a string result.
      But there are glibc versions that implement snprintf through a
      macro. The following expression is then going to get the snprintf
      call preprocessed:
      
              ent->snprintf(...)
      
      to finally end up in a build error:
      
              util/hist.c: Dans la fonction «hist_entry__snprintf» :
              util/hist.c:539: erreur: «struct sort_entry» has no member named «__builtin___snprintf_chk»
      
      To fix this, prepend struct sort_entry callbacks with an "se_"
      prefix.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fcd14984
  25. 04 4月, 2010 2 次提交
    • A
      perf TUI: Add a "Zoom into COMM(PID) thread" and reverse operations · a5e29aca
      Arnaldo Carvalho de Melo 提交于
      Now one can press the right arrow key and in addition to being able to
      filter by DSO, filter out by thread too, or a combination of both
      filters.
      
      With this one can start collecting events for the whole system, then
      focus on a subset of the collected data quickly.
      
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a5e29aca
    • A
      perf newt: Add a "Zoom into foo.so DSO" and reverse operations · 83753190
      Arnaldo Carvalho de Melo 提交于
      Clicking on -> will bring as one of the popup menu options a "Zoom into
      CURRENT DSO", i.e. CURRENT will be replaced by the name of the DSO in
      the current line.
      
      Choosing this option will filter out all samples that didn't took place
      in a symbol in this DSO.
      
      After that the option reverts to "Zoom out of CURRENT DSO", to allow
      going back to the more compreensive view, not filtered by DSO.
      
      Future similar operations will include zooming into a particular thread,
      COMM, CPU, "last minute", "last N usecs", etc.
      
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      83753190
  26. 03 4月, 2010 2 次提交
    • A
      perf hist: Only allocate callchain_node if processing callchains · b9fb9304
      Arnaldo Carvalho de Melo 提交于
      The struct callchain_node size is 120 bytes, that are never used when
      there are no callchains or '-g none' is specified, so conditionally
      allocate it, reducing sizeof(struct hist_entry) from 210 bytes to only
      96, greatly speeding the non-callchain processing.
      
      LKML-Reference: <new-submission>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b9fb9304
    • A
      perf hist: Replace ->print() routines by ->snprintf() equivalents · a4e3b956
      Arnaldo Carvalho de Melo 提交于
      Then hist_entry__fprintf will just us the newly introduced
      hist_entry__snprintf, add the newline and fprintf it to the supplied
      FILE descriptor.
      
      This allows us to remove the use_browser checking in the color_printf
      routines, that now got color_snprintf variants too.
      
      The newt TUI browser (and other GUIs that may come in the future) don't
      have to worry about stdio specific stuff in the strings they get from
      the se->snprintf routines and instead use whatever means to do the
      equivalent.
      
      Also the newt TUI browser don't have to use the fmemopen() hack, instead
      it can use the se->snprintf routines directly. For now tho use the
      hist_entry__snprintf routine to reduce the patch size.
      
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a4e3b956
  27. 26 3月, 2010 1 次提交
    • A
      perf tools: Introduce struct map_symbol · 59fd5306
      Arnaldo Carvalho de Melo 提交于
      That will be in both struct hist_entry and struct
      callchain_list, so that the TUI can store a pointer to the pair
      (map, symbol) in the trees where hist_entries and
      callchain_lists are present, to allow precise annotation instead
      of looking for the first symbol with the selected name.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1269459619-982-4-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      59fd5306
  28. 16 12月, 2009 1 次提交
    • A
      perf diff: Use perf_session__fprintf_hists just like 'perf record' · c351c281
      Arnaldo Carvalho de Melo 提交于
      That means that almost everything you can do with 'perf report'
      can be done with 'perf diff', for instance:
      
      $ perf record -f find / > /dev/null
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.062 MB perf.data (~2699
      samples) ] $ perf record -f find / > /dev/null
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.062 MB perf.data (~2687
      samples) ] perf diff | head -8
           9.02%     +1.00%     find  libc-2.10.1.so               [.] _IO_vfprintf_internal
           2.91%     -1.00%     find  [kernel]                     [k] __kmalloc
           2.85%     -1.00%     find  [kernel]                     [k] ext4_htree_store_dirent
           1.99%     -1.00%     find  [kernel]                     [k] _atomic_dec_and_lock
           2.44%                find  [kernel]                     [k] half_md4_transform
      $
      
      So if you want to zoom into libc:
      
      $ perf diff --dsos libc-2.10.1.so | head -8
          37.34%                find  [.] _IO_vfprintf_internal
          10.34%                find  [.] __GI_memmove
           8.25%     +2.00%     find  [.] _int_malloc
           5.07%     -1.00%     find  [.] __GI_mempcpy
           7.62%     +2.00%     find  [.] _int_free
      $
      
      And if there were multiple commands using libc, it is also
      possible to aggregate them all by using --sort symbol:
      
      $ perf diff --dsos libc-2.10.1.so --sort symbol | head -8
          37.34%             [.] _IO_vfprintf_internal
          10.34%             [.] __GI_memmove
           8.25%     +2.00%  [.] _int_malloc
           5.07%     -1.00%  [.] __GI_mempcpy
           7.62%     +2.00%  [.] _int_free
      $
      
      The displacement column now is off by default, to use it:
      
      perf diff -m --dsos libc-2.10.1.so --sort symbol | head -8
          37.34%                   [.] _IO_vfprintf_internal
          10.34%                   [.] __GI_memmove
           8.25%     +2.00%        [.] _int_malloc
           5.07%     -1.00%    +2  [.] __GI_mempcpy
           7.62%     +2.00%    -1  [.] _int_free
      $
      
      Using -t/--field-separator can be used for scripting:
      
      $ perf diff -t, -m --dsos libc-2.10.1.so --sort symbol | head -8
      37.34, , ,[.] _IO_vfprintf_internal
      10.34, , ,[.] __GI_memmove
      8.25,+2.00%, ,[.] _int_malloc
      5.07,-1.00%,  +2,[.] __GI_mempcpy
      7.62,+2.00%,  -1,[.] _int_free
      6.99,+1.00%,  -1,[.] _IO_new_file_xsputn
      1.89,-2.00%,  +4,[.] __readdir64
      $
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260978567-550-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c351c281
  29. 15 12月, 2009 2 次提交
    • A
      perf diff: Introduce tool to show performance difference · 86a9eee0
      Arnaldo Carvalho de Melo 提交于
      I guess it is enough to show some examples:
      
      [root@doppio linux-2.6-tip]# rm -f perf.data*
      [root@doppio linux-2.6-tip]# ls -la perf.data*
      ls: cannot access perf.data*: No such file or directory
      [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.062 MB perf.data (~2699 samples) ]
      [root@doppio linux-2.6-tip]# ls -la perf.data*
      -rw------- 1 root root 74440 2009-12-14 20:03 perf.data
      [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.062 MB perf.data (~2692 samples) ]
      [root@doppio linux-2.6-tip]# ls -la perf.data*
      -rw------- 1 root root 74280 2009-12-14 20:03 perf.data
      -rw------- 1 root root 74440 2009-12-14 20:03 perf.data.old
      [root@doppio linux-2.6-tip]# perf diff | head -5
         1        -34994580     /lib64/libc-2.10.1.so   _IO_vfprintf_internal
         2        -15307806         [kernel.kallsyms]   __kmalloc
         3    +1   +3665941     /lib64/libc-2.10.1.so   __GI_memmove
         4    +4  +23508995     /lib64/libc-2.10.1.so   _int_malloc
         5    +7  +38538813         [kernel.kallsyms]   __d_lookup
      [root@doppio linux-2.6-tip]# perf diff -p | head -5
         1        +1.00%     /lib64/libc-2.10.1.so   _IO_vfprintf_internal
         2                       [kernel.kallsyms]   __kmalloc
         3    +1             /lib64/libc-2.10.1.so   __GI_memmove
         4    +4             /lib64/libc-2.10.1.so   _int_malloc
         5    +7  -1.00%         [kernel.kallsyms]   __d_lookup
      [root@doppio linux-2.6-tip]# perf diff -v | head -5
         1        361449551 326454971 -34994580     /lib64/libc-2.10.1.so   _IO_vfprintf_internal
         2        151009241 135701435 -15307806         [kernel.kallsyms]   __kmalloc
         3    +1  101805328 105471269  +3665941     /lib64/libc-2.10.1.so   __GI_memmove
         4    +4   78041440 101550435 +23508995     /lib64/libc-2.10.1.so   _int_malloc
         5    +7   59536172  98074985 +38538813         [kernel.kallsyms]   __d_lookup
      [root@doppio linux-2.6-tip]# perf diff -vp | head -5
         1        9.00% 8.00% +1.00%     /lib64/libc-2.10.1.so   _IO_vfprintf_internal
         2        3.00% 3.00%                [kernel.kallsyms]   __kmalloc
         3    +1  2.00% 2.00%            /lib64/libc-2.10.1.so   __GI_memmove
         4    +4  2.00% 2.00%            /lib64/libc-2.10.1.so   _int_malloc
         5    +7  1.00% 2.00% -1.00%         [kernel.kallsyms]   __d_lookup
      [root@doppio linux-2.6-tip]#
      
      This should be enough for diffs where the system is non
      volatile, i.e. when one doesn't updates binaries.
      
      For volatile environments, stay tuned for the next perf tool
      feature: a buildid cache populated by 'perf record', managed by
      'perf buildid-cache' a-la ccache, and used by all the report
      tools.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      LKML-Reference: <1260828571-3613-3-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      86a9eee0
    • A
      perf util: Remove setup_sorting dups · c8829c7a
      Arnaldo Carvalho de Melo 提交于
      And it is also needed by 'perf diff'.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260828571-3613-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c8829c7a