1. 04 Oct, 2013: 1 commit
  2. 23 Jul, 2013: 1 commit
  3. 13 Jul, 2013: 2 commits
    • perf hists: Marking dummy hists entries · e0af43d2
      Jiri Olsa committed
      It does not make sense to perform some computations (ratio, wdiff) when the
      hist_entry is a 'dummy' one, i.e. one added via hists__link.
      
      Add a dummy field to struct hist_entry that indicates the entry was
      added by hists__link, and skip some of the processing for such
      entries.
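      Here is a minimal sketch of the idea. It assumes a heavily simplified entry type: struct demo_entry and compute_ratio() are hypothetical stand-ins, not perf's real struct hist_entry or its ratio code. The point is only that a dummy entry short-circuits the computation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for struct hist_entry. */
struct demo_entry {
	uint64_t period;	/* sum of sample periods for this entry */
	bool dummy;		/* true if only added to mirror an entry of the
				   other hists, as hists__link does */
};

/*
 * Ratio of the paired entry's period to the baseline entry's period.
 * Returns false, computing nothing, when either side is a dummy entry:
 * a placeholder has no samples of its own to compare.
 */
static bool compute_ratio(const struct demo_entry *base,
			  const struct demo_entry *pair, double *ratio)
{
	if (base->dummy || pair->dummy || base->period == 0)
		return false;
	*ratio = (double)pair->period / (double)base->period;
	return true;
}
```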
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Reviewed-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/n/tip-g8bxml0n0pnqsrpyd98p0ird@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report/top: Add option to collapse undesired parts of call graph · b21484f1
      Greg Price committed
      For example, in an application with an expensive function implemented
      with deeply nested recursive calls, the default call-graph presentation
      is dominated by the different callchains within that function.  By
      ignoring these callees, we can collect the callchains leading into the
      function and compactly identify what to blame for expensive calls.
      
      For example, in this report the callers of garbage_collect() are
      scattered across the tree:
      
        $ perf report -d ruby 2>- | grep -m10 ^[^#]*[a-z]
            22.03%     ruby  [.] gc_mark
                       --- gc_mark
                          |--59.40%-- mark_keyvalue
                          |          st_foreach
                          |          gc_mark_children
                          |          |--99.75%-- rb_gc_mark
                          |          |          rb_vm_mark
                          |          |          gc_mark_children
                          |          |          gc_marks
                          |          |          |--99.00%-- garbage_collect
      
      If we ignore the callees of garbage_collect(), its callers are coalesced:
      
        $ perf report --ignore-callees garbage_collect -d ruby 2>- | grep -m10 ^[^#]*[a-z]
            72.92%     ruby  [.] garbage_collect
                       --- garbage_collect
                           vm_xmalloc
                          |--47.08%-- ruby_xmalloc
                          |          st_insert2
                          |          rb_hash_aset
                          |          |--98.45%-- features_index_add
                          |          |          rb_provide_feature
                          |          |          rb_require_safe
                          |          |          vm_call_method
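      The collapsing can be sketched as follows. The sketch assumes a callchain stored leaf-first as an array of symbol names, and ignore_callees() is a hypothetical helper. Walking outward, each frame that matches the function discards the frames collected so far, so the chain ends up rooted at the outermost match. The real option takes a regex; this sketch uses an exact name match for brevity.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper. frames[0] is the sampled (innermost) function and
 * frames[n - 1] the outermost caller. Each frame named fn found while
 * walking outward discards the frames seen so far, which are its callees.
 * Returns the index of the new first frame (0 if fn does not occur).
 */
static size_t ignore_callees(const char *const *frames, size_t n,
			     const char *fn)
{
	size_t start = 0;

	for (size_t i = 0; i < n; i++)
		if (strcmp(frames[i], fn) == 0)
			start = i;	/* forget frames[0 .. i-1] */
	return start;
}
```

      For the chain gc_mark, gc_mark_children, gc_marks, garbage_collect, vm_xmalloc, ruby_xmalloc, the result is 3, so the kept chain starts at garbage_collect.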
      Signed-off-by: Greg Price <price@mit.edu>
      Tested-by: Jiri Olsa <jolsa@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20130623031720.GW22203@biohazard-cafe.mit.edu
      Link: http://lkml.kernel.org/r/20130708115746.GO22203@biohazard-cafe.mit.edu
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      [ remove spaces at beginning of line, reported by Fengguang Wu ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 28 May, 2013: 4 commits
  5. 01 Apr, 2013: 2 commits
    • perf tools: Add mem access sampling core support · 98a3b32c
      Stephane Eranian committed
      This patch adds the sorting and histogram support
      functions to enable profiling of memory accesses.
      
      The following sorting orders are added:
       - symbol_daddr: data address symbol (or raw address)
       - dso_daddr: data address shared object
       - locked: access uses locked transaction
       - tlb: TLB access
       - mem : memory level of the access (L1, L2, L3, RAM, ...)
       - snoop: access snoop mode
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1359040242-8269-12-git-send-email-eranian@google.com
      [ committer note: changed to cope with fc5871ed, the move of methods to
        machine.[ch], and the rename of dsrc to data_src, to match the change
        made in the PERF_SAMPLE_DSRC in a previous patch. ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Add support for weight v7 (modified) · 05484298
      Andi Kleen committed
      perf record has a new option -W that enables weighted sampling.
      
      Add sorting support in top/report for the average weight per sample and the
      total weight sum. This allows comparing both the relative cost per event
      and the total cost over the measurement period.
      
      Add the necessary glue to perf report, record and the library.
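      The two quantities can be sketched with a hypothetical per-entry accumulator (struct weight_stat is illustrative only): a total weight sum, plus the average weight per sample derived from it. Judging by the v3 note below, these correspond to the 'weight' and 'local_weight' keys.

```c
#include <stdint.h>

/* Hypothetical per-entry accumulator; field names are illustrative. */
struct weight_stat {
	uint64_t nr_events;	/* samples folded into this entry */
	uint64_t weight;	/* total weight sum of those samples */
};

static void weight_stat__add(struct weight_stat *st, uint64_t sample_weight)
{
	st->nr_events++;
	st->weight += sample_weight;
}

/* Average weight per sample (integer division); 0 for an empty entry. */
static uint64_t weight_stat__avg(const struct weight_stat *st)
{
	return st->nr_events ? st->weight / st->nr_events : 0;
}
```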
      
      v2: Merge with new hist refactoring.
      v3: Fix manpage. Remove value check.
      Rename global_weight to weight and weight to local_weight.
      v4: Re-add sort keys to manpage
      v5: Move weight to end
      v6: Move weight to template
      v7: Rename weight key.
      
      Original patch from Andi modified by Stephane Eranian <eranian@google.com>
      to include ONLY the weight supporting code and apply to pristine 3.8.0-rc4.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1359040242-8269-6-git-send-email-eranian@google.com
      [ committer note: changed to cope with fc5871ed and the hists_link perf test entry ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 07 Feb, 2013: 1 commit
  7. 25 Jan, 2013: 1 commit
  8. 09 Dec, 2012: 2 commits
  9. 09 Nov, 2012: 1 commit
  10. 06 Oct, 2012: 2 commits
    • perf diff: Add weighted diff computation way to compare hist entries · 81d5f958
      Jiri Olsa committed
      Add 'wdiff' as a new computation method for comparing hist entries.
      
      If specified, the 'Weighted diff' column is displayed with value 'd'
      computed as:
      
         d = B->period * WEIGHT-A - A->period * WEIGHT-B
      
        - A/B being the matching hist entries from the first/second file specified
          (or perf.data/perf.data.old), respectively.
        - period being the hist entry period value
        - WEIGHT-A/WEIGHT-B being user-supplied weights, given in the '-c' option
          after the ':' separator, e.g. '-c wdiff:1,2'.
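      The formula translates directly into code. compute_wdiff() is a hypothetical helper, and the sketch assumes signed 64-bit arithmetic so that negative results are representable.

```c
#include <stdint.h>

/*
 * d = B->period * WEIGHT-A - A->period * WEIGHT-B.
 * Signed 64-bit arithmetic; overflow is not checked in this sketch.
 */
static int64_t compute_wdiff(uint64_t period_a, uint64_t period_b,
			     int64_t weight_a, int64_t weight_b)
{
	return (int64_t)period_b * weight_a - (int64_t)period_a * weight_b;
}
```

      With '-c wdiff:1,2', A->period = 100 and B->period = 300 give d = 300*1 - 100*2 = 100.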
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1349448287-18919-5-git-send-email-jolsa@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf diff: Add option to sort entries based on diff computation · 96c47f19
      Jiri Olsa committed
      Add support for sorting hist entries based on the outcome of the selected
      computation. Specifying '+' as the first character of the '-c' option
      value enables this sort.
      
      Example:
      
        $ perf diff -c ratio -b
        # Event 'cache-misses'
        #
        #   Baseline           Ratio      Shared Object                            Symbol
        #   ........  ..............  .................  ................................
        #
              19.64%            0.69  [kernel.kallsyms]  [k] clear_page
               0.30%            0.17  [kernel.kallsyms]  [k] mm_alloc
               0.04%            0.20  [kernel.kallsyms]  [k] kmem_cache_alloc
      
        $ perf diff -c +ratio -b
        # Event 'cache-misses'
        #
        #   Baseline           Ratio      Shared Object                            Symbol
        #   ........  ..............  .................  ................................
        #
              19.64%            0.69  [kernel.kallsyms]  [k] clear_page
               0.04%            0.20  [kernel.kallsyms]  [k] kmem_cache_alloc
               0.30%            0.17  [kernel.kallsyms]  [k] mm_alloc
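      This sketch covers the two pieces involved, assuming each entry already carries the value of the selected computation. The first piece strips an optional leading '+' from the '-c' value. The second orders entries by that value, largest first, as in the second example. struct diff_row, parse_compute() and sort_rows() are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical row: a symbol plus the value of the selected computation. */
struct diff_row {
	const char *symbol;
	double value;
};

/* Strip an optional leading '+' from the '-c' value and report it. */
static const char *parse_compute(const char *arg, bool *sort_by_compute)
{
	*sort_by_compute = (arg[0] == '+');
	return *sort_by_compute ? arg + 1 : arg;
}

/* qsort() comparator: larger computed values first. */
static int cmp_value_desc(const void *pa, const void *pb)
{
	const struct diff_row *a = pa, *b = pb;

	if (a->value < b->value)
		return 1;
	if (a->value > b->value)
		return -1;
	return 0;
}

static void sort_rows(struct diff_row *rows, size_t n)
{
	qsort(rows, n, sizeof(rows[0]), cmp_value_desc);
}
```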
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1349448287-18919-4-git-send-email-jolsa@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  11. 05 Oct, 2012: 3 commits
  12. 18 Sep, 2012: 1 commit
  13. 08 Sep, 2012: 1 commit
  14. 20 Jun, 2012: 1 commit
    • perf tools: Add sort by src line/number · 409a8be6
      Arnaldo Carvalho de Melo committed
      Uses addr2line for now, which requires debuginfo; more work is needed to
      support detached debuginfo, i.e. foo-debuginfo packages.
      
      Example:
      
      	[root@sandy ~]# perf record -a sleep 3
      	[ perf record: Woken up 1 times to write data ]
      	[ perf record: Captured and wrote 0.555 MB perf.data (~24236 samples) ]
      	[root@sandy ~]# perf report -s dso,srcline 2>&1 | grep -v ^# | head -5
      	    22.41%  [kernel.kallsyms]  /home/git/linux/drivers/idle/intel_idle.c:280
      	     4.79%  [kernel.kallsyms]  /home/git/linux/drivers/cpuidle/cpuidle.c:148
      	     4.78%  [kernel.kallsyms]  /home/git/linux/arch/x86/include/asm/atomic64_64.h:121
      	     4.49%  [kernel.kallsyms]  /home/git/linux/kernel/sched/core.c:1690
      	     4.30%  [kernel.kallsyms]  /home/git/linux/include/linux/seqlock.h:90
      	[root@sandy ~]#
      
      [root@sandy ~]# perf top -U -s dso,symbol,srcline
      Samples: 1K of event 'cycles', Event count (approx.): 589617389
       18.66%  [kernel]  [k] copy_user_generic_unrolled   /home/git/linux/arch/x86/lib/copy_user_64.S:143
        7.83%  [kernel]  [k] clear_page                   /home/git/linux/arch/x86/lib/clear_page_64.S:39
        6.59%  [kernel]  [k] clear_page                   /home/git/linux/arch/x86/lib/clear_page_64.S:38
        3.66%  [kernel]  [k] page_fault                   /home/git/linux/arch/x86/kernel/entry_64.S:1379
        3.25%  [kernel]  [k] clear_page                   /home/git/linux/arch/x86/lib/clear_page_64.S:40
        3.12%  [kernel]  [k] clear_page                   /home/git/linux/arch/x86/lib/clear_page_64.S:37
        2.74%  [kernel]  [k] clear_page                   /home/git/linux/arch/x86/lib/clear_page_64.S:36
        2.39%  [kernel]  [k] clear_page                   /home/git/linux/arch/x86/lib/clear_page_64.S:43
        2.12%  [kernel]  [k] ioread32                     /home/git/linux/lib/iomap.c:90
        1.51%  [kernel]  [k] copy_user_generic_unrolled   /home/git/linux/arch/x86/lib/copy_user_64.S:144
        1.19%  [kernel]  [k] copy_user_generic_unrolled   /home/git/linux/arch/x86/lib/copy_user_64.S:154
      Suggested-by: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-pdmqbng9twz06jzkbgtuwbp8@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  15. 09 Mar, 2012: 3 commits
  16. 13 Oct, 2011: 1 commit
  17. 07 Oct, 2011: 1 commit
    • perf hists: Threaded addition and sorting of entries · 1980c2eb
      Arnaldo Carvalho de Melo committed
      Use a mutex only for inserting into and rotating two hist_entry rb
      trees, so that when sorting we can take the last batch of entries created
      from the ring buffer, merge it with whatever we have processed so far,
      and show the output while new entries are still being added.
      
      The 'report' tool continues, for now, to do it without threading, but
      will use this in the future to allow visualization of results in long
      perf.data sessions while the entries are being processed.
      
      The new 'top' tool will be the first user.
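      The locking scheme can be sketched with singly linked lists standing in for the two rb trees. This is a simplification, and all names are hypothetical. Producers hold the mutex only while inserting. The consumer holds it only to swap which queue receives new entries, and then merges the detached batch without the lock. The merge step itself is not shown.

```c
#include <pthread.h>
#include <stddef.h>

/* Simplified stand-in: singly linked lists instead of perf's rb trees. */
struct entry {
	struct entry *next;
	int value;
};

struct two_queues {
	pthread_mutex_t lock;	/* guards 'in' and the queue it points to */
	struct entry *queues[2];
	struct entry **in;	/* queue that producers currently fill */
};

static void two_queues__init(struct two_queues *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->queues[0] = q->queues[1] = NULL;
	q->in = &q->queues[0];
}

/* Producer side: called for each new entry read from the ring buffer. */
static void two_queues__insert(struct two_queues *q, struct entry *e)
{
	pthread_mutex_lock(&q->lock);
	e->next = *q->in;
	*q->in = e;
	pthread_mutex_unlock(&q->lock);
}

/*
 * Consumer side (single consumer assumed): redirect producers to the
 * other queue under the lock, then detach and return the batch collected
 * so far. The caller merges it into the processed entries unlocked.
 */
static struct entry *two_queues__rotate(struct two_queues *q)
{
	struct entry **batch;
	struct entry *first;

	pthread_mutex_lock(&q->lock);
	batch = q->in;
	q->in = (q->in == &q->queues[0]) ? &q->queues[1] : &q->queues[0];
	pthread_mutex_unlock(&q->lock);

	first = *batch;
	*batch = NULL;
	return first;
}
```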
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-9b05atsn0q6m7fqgrug8fk2i@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  18. 30 Jun, 2011: 2 commits
  19. 23 Aug, 2010: 1 commit
    • perf: Keep track of the max depth of a callchain · d2009c51
      Frederic Weisbecker committed
      In order to implement callchain collapsing, we need to keep
      track of the maximum depth in a histogram tree of callchains.
      This way we avoid allocating a temporary buffer of arbitrary
      size at callchain merge time.
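      The bookkeeping can be sketched with a hypothetical chain_root that records the longest chain inserted. At merge time, a scratch buffer with exactly that many slots is then sufficient.

```c
#include <stdlib.h>

/* Hypothetical root of a histogram's callchain tree. */
struct chain_root {
	unsigned int max_depth;	/* longest chain inserted so far */
};

/* Called with the depth of every chain added to the tree. */
static void chain_root__note_depth(struct chain_root *root, unsigned int depth)
{
	if (depth > root->max_depth)
		root->max_depth = depth;
}

/*
 * At merge time no path in the tree is longer than max_depth, so a
 * scratch buffer of that many slots is enough. Caller frees it.
 */
static void **chain_root__alloc_merge_buf(const struct chain_root *root)
{
	return calloc(root->max_depth ? root->max_depth : 1, sizeof(void *));
}
```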
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Hellwig <hch@infradead.org>
  20. 27 Jul, 2010: 1 commit
    • perf ui: New hists tree widget · 0f0cbf7a
      Arnaldo Carvalho de Melo committed
      The stock newt checkbox tree widget we were using was not really
      suitable for hist entry + callchain browsing.
      
      The problems with it were manifold:
      
      - We needed to traverse the whole hist_entry rb_tree to add each entry +
        callchains beforehand.
      
      - No control over the colors used for each row
      
      So a new tree widget, based mostly on slang, was written.
      
      It extends the ui_browser class, already used for annotate, to let the
      user fold/unfold branches in the callchains tree. Extra fields in the
      symbol_map class, which is embedded in hist_entry and callchain_node
      instances, store the folding state; when this state changes, the
      widget calculates the number of rows produced when showing a
      particular hist_entry instance.
      
      This greatly speeds up browsing, as we don't have to touch all the
      entries up front and only perform callchain-related calculations when
      a callchain branch is actually unfolded.
      
      The memory footprint is also reduced, as the data structure is not
      duplicated: only a few extra fields were added, for controlling callchain
      state and to simplify seeking through entries (nr_rows, row_offset).
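      The row computation can be sketched as a recursion that descends only into unfolded branches. struct fold_node and fold_node__nr_rows() are hypothetical; the text above describes keeping this state in extra symbol_map fields instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callchain tree node with per-node folding state. */
struct fold_node {
	bool unfolded;		/* are the children shown? */
	size_t nr_children;
	struct fold_node *children;
};

/*
 * Rows produced by showing 'node': one for the node itself plus, only
 * if it is unfolded, the rows of each child. Folded branches are never
 * visited, so the cost tracks what is visible, not the tree size.
 */
static size_t fold_node__nr_rows(const struct fold_node *node)
{
	size_t rows = 1;

	if (!node->unfolded)
		return rows;
	for (size_t i = 0; i < node->nr_children; i++)
		rows += fold_node__nr_rows(&node->children[i]);
	return rows;
}
```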
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  21. 23 Jul, 2010: 1 commit
  22. 05 Jun, 2010: 1 commit
    • perf report: Implement --sort cpu · f60f3593
      Arun Sharma committed
      In a shared multi-core environment, users want to analyze why their
      program was slow. In particular, if the code ran slower only on certain
      CPUs due to interference from other programs or kernel threads, the user
      should be able to notice that.
      
      Sample usage:
      
      perf record -f -a -- sleep 3
      perf report --sort cpu,comm
      
      Workload:
      
      'program' runs on 16 CPUs and experiences interference from an
      antagonist on only 4 of them.
      
        Samples: 106218177676 cycles
      
        Overhead  CPU          Command
        ........  ...  ...............
      
           6.25%  2            program
           6.24%  6            program
           6.24%  11           program
           6.24%  5            program
           6.24%  9            program
           6.24%  10           program
           6.23%  15           program
           6.23%  7            program
           6.23%  3            program
           6.23%  14           program
           6.22%  1            program
           6.20%  13           program
           3.17%  12           program
           3.15%  8            program
           3.14%  0            program
           3.13%  4            program
           3.11%  4         antagonist
           3.11%  0         antagonist
           3.10%  8         antagonist
           3.07%  12        antagonist
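      Grouping by several sort keys amounts to comparing them in order and consulting the next key only on a tie. Here is a sketch for '--sort cpu,comm', using a hypothetical struct sample_key that holds just those two fields.

```c
#include <string.h>

/* Hypothetical sample view holding just the two sort keys used above. */
struct sample_key {
	int cpu;
	const char *comm;
};

/* Compare by cpu first; fall back to comm only when the CPUs are equal. */
static int sample_key__cmp(const struct sample_key *a,
			   const struct sample_key *b)
{
	if (a->cpu != b->cpu)
		return a->cpu < b->cpu ? -1 : 1;
	return strcmp(a->comm, b->comm);
}
```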
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <20100505181612.GA5091@sharma-home.net>
      Signed-off-by: Arun Sharma <aruns@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  23. 18 May, 2010: 1 commit
    • perf options: Type check all the remaining OPT_ variants · edb7c60e
      Arnaldo Carvalho de Melo committed
      OPT_SET_INT was renamed to OPT_SET_UINT since the only use in these
      tools is to set something that has an enum type, that is builtin
      compatible with unsigned int.
      
      Several string constifications were done to make OPT_STRING require a
      const char * type.
      
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  24. 15 May, 2010: 1 commit
    • perf report: Report number of events, not samples · c82ee828
      Arnaldo Carvalho de Melo committed
      The number of samples is meaningless after we switched to auto-freq, so
      report the number of events, i.e. not the sum of the different periods,
      but the number of PERF_RECORD_SAMPLE events emitted by the kernel.
      
      While doing this I noticed that calling the sum of all the event
      periods "count" can be confusing, so rename it to .period, just like in
      struct sample.data, so that we become more consistent.
      
      This helps with the next step, which is to record in struct hist_entry
      the number of sample events for each instance. We need that to
      generate the number of events when applying filters to the tree of
      hist entries, as is done in the TUI report browser.
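      The distinction can be sketched with a hypothetical struct entry_stat. Each sample adds its period (which varies under auto-freq) to .period but counts as exactly one event. A filtered view then sums nr_events over the visible entries only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-entry stats; .period is what used to be called count. */
struct entry_stat {
	uint64_t period;	/* sum of the periods of all samples */
	uint64_t nr_events;	/* number of PERF_RECORD_SAMPLE records */
};

static void entry_stat__add_sample(struct entry_stat *st, uint64_t sample_period)
{
	st->period += sample_period;	/* varies per sample with auto-freq */
	st->nr_events++;		/* always exactly one event */
}

/* Events belonging to the entries that a filter leaves visible. */
static uint64_t visible_events(const struct entry_stat *stats,
			       const unsigned char *visible, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		if (visible[i])
			total += stats[i].nr_events;
	return total;
}
```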
      Suggested-by: Ingo Molnar <mingo@elte.hu>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  25. 12 May, 2010: 1 commit
    • perf report: Librarize the annotation code and use it in the newt browser · ef7b93a1
      Arnaldo Carvalho de Melo committed
      We no longer use popen to run 'perf annotate' for the selected
      symbol; instead, we collect per-address samples when processing samples
      in 'perf report' if we're using the newt browser, and then use this data
      directly to do annotation.
      
      Done this way, we can traverse the objdump_line objects directly,
      matching the addresses to the collected samples and colouring
      them appropriately using lower-level slang routines.
      
      The new ui_browser class will be reused for the main, callchain-aware
      histogram browser once it is made generic and no longer assumes that
      the objects are always instances of the objdump_line class maintained
      using list_heads.
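      Collecting per-address samples can be sketched as one counter per byte offset into the symbol. struct sym_hist and its helpers are hypothetical, not perf's annotation API. The annotate view could then look up each objdump line's address and show its share of the symbol's total.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-symbol sample histogram; not perf's annotation API. */
struct sym_hist {
	uint64_t start, end;	/* symbol address range [start, end) */
	uint64_t *addr;		/* one counter per byte offset */
	uint64_t sum;		/* samples that hit the symbol */
};

static int sym_hist__init(struct sym_hist *h, uint64_t start, uint64_t end)
{
	h->start = start;
	h->end = end;
	h->sum = 0;
	h->addr = calloc(end - start, sizeof(*h->addr));
	return h->addr ? 0 : -1;
}

/* Called for each sample while 'perf report' processes events. */
static void sym_hist__inc(struct sym_hist *h, uint64_t ip)
{
	if (ip < h->start || ip >= h->end)
		return;		/* sample belongs to another symbol */
	h->addr[ip - h->start]++;
	h->sum++;
}

static void sym_hist__exit(struct sym_hist *h)
{
	free(h->addr);
	h->addr = NULL;
}
```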
      
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  26. 19 Apr, 2010: 1 commit
  27. 15 Apr, 2010: 1 commit
    • perf tools: Fix accidentally preprocessed snprintf callback · fcd14984
      Frederic Weisbecker committed
      struct sort_entry has a callback named snprintf that turns an
      entry into a string result.
      But there are glibc versions that implement snprintf through a
      macro. The following expression is then going to get the snprintf
      call preprocessed:
      
              ent->snprintf(...)
      
      to finally end up in a build error:
      
              util/hist.c: In function ‘hist_entry__snprintf’:
              util/hist.c:539: error: ‘struct sort_entry’ has no member named ‘__builtin___snprintf_chk’
      
      To fix this, prefix the struct sort_entry callbacks with "se_".
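      Here is a sketch of the renamed callback, using hypothetical demo_ types rather than perf's real definitions. Because the member is called se_snprintf, a function-like snprintf macro from <stdio.h> can no longer rewrite call sites such as ent->se_snprintf(...).

```c
#include <stddef.h>
#include <stdio.h>

struct demo_hist_entry {
	const char *sym;
};

/* Callback members carry an "se_" prefix so no libc macro can match them. */
struct demo_sort_entry {
	const char *se_header;
	int (*se_snprintf)(const struct demo_hist_entry *he, char *bf, size_t size);
};

static int sym_snprintf(const struct demo_hist_entry *he, char *bf, size_t size)
{
	return snprintf(bf, size, "%s", he->sym);
}

static const struct demo_sort_entry sort_sym = {
	.se_header = "Symbol",
	.se_snprintf = sym_snprintf,
};
```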
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  28. 04 Apr, 2010: 1 commit
    • perf TUI: Add a "Zoom into COMM(PID) thread" and reverse operations · a5e29aca
      Arnaldo Carvalho de Melo committed
      Now one can press the right arrow key and, in addition to filtering by
      DSO, filter by thread too, or by a combination of both filters.
      
      With this one can start collecting events for the whole system, then
      focus on a subset of the collected data quickly.
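      The combined filter can be sketched as a predicate with one optional condition per dimension. struct zoom_filter and zoom_filter__match() are hypothetical. Zooming out, the reverse operation, simply clears the corresponding field.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry reduced to the two attributes the filters inspect. */
struct zoom_entry {
	int pid;
	const void *dso;	/* identity of the DSO object */
};

/* Each dimension is optional: pid < 0 or dso == NULL means "not zoomed". */
struct zoom_filter {
	int pid;
	const void *dso;
};

static bool zoom_filter__match(const struct zoom_filter *f,
			       const struct zoom_entry *e)
{
	if (f->pid >= 0 && e->pid != f->pid)
		return false;
	if (f->dso != NULL && e->dso != f->dso)
		return false;
	return true;
}
```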
      
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>