1. 24 3月, 2016 1 次提交
  2. 08 1月, 2016 1 次提交
  3. 24 11月, 2015 1 次提交
  4. 20 11月, 2015 5 次提交
  5. 23 10月, 2015 4 次提交
  6. 09 8月, 2015 1 次提交
  7. 06 8月, 2015 1 次提交
  8. 06 5月, 2015 1 次提交
  9. 19 2月, 2015 1 次提交
    • K
      perf tools: Enable LBR call stack support · aad2b21c
      Kan Liang 提交于
      Currently, there are two call chain recording options, fp and dwarf.
      
      Haswell has a new feature that utilizes the existing LBR facility to
      record call chains. Kernel side LBR support code provides this as a
      third option to record call chains. This patch enables the lbr call
      stack support on the tooling side.
      
      LBR call stack has some limitations:
      
       - It reuses current LBR facility, so LBR call stack and branch record
         can not be enabled at the same time.
      
       - It is only available for user-space callchains.
      
      However, it also offers some advantages:
      
       - LBR call stack can work on user apps which don't have frame-pointers
         or dwarf debug info compiled. It is a good alternative when nothing
         else works.
      Tested-by: NJiri Olsa <jolsa@kernel.org>
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Jacob Shin <jacob.w.shin@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masanari Iida <standby24x7@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Rodrigo Campos <rodrigo@sdfg.com.ar>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1420482185-29830-2-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      aad2b21c
  10. 08 1月, 2015 1 次提交
  11. 02 12月, 2014 1 次提交
    • A
      perf callchain: Support handling complete branch stacks as histograms · 8b7bad58
      Andi Kleen 提交于
      Currently branch stacks can be only shown as edge histograms for
      individual branches. I never found this display particularly useful.
      
      This implements an alternative mode that creates histograms over
      complete branch traces, instead of individual branches, similar to how
      normal callgraphs are handled. This is done by putting it in front of
      the normal callgraph and then using the normal callgraph histogram
      infrastructure to unify them.
      
      This way in complex functions we can understand the control flow that
      lead to a particular sample, and may even see some control flow in the
      caller for short functions.
      
      Example (simplified, of course for such simple code this is usually not
      needed), please run this after the whole patchkit is in, as at this
      point in the patch order there is no --branch-history, that will be
      added in a patch after this one:
      
      tcall.c:
      
      volatile a = 10000, b = 100000, c;
      
      __attribute__((noinline)) f2()
      {
      	c = a / b;
      }
      
      __attribute__((noinline)) f1()
      {
      	f2();
      	f2();
      }
      main()
      {
      	int i;
      	for (i = 0; i < 1000000; i++)
      		f1();
      }
      
      % perf record -b -g ./tsrc/tcall
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
      % perf report --no-children --branch-history
      ...
          54.91%  tcall.c:6  [.] f2                      tcall
                  |
                  |--65.53%-- f2 tcall.c:5
                  |          |
                  |          |--70.83%-- f1 tcall.c:11
                  |          |          f1 tcall.c:10
                  |          |          main tcall.c:18
                  |          |          main tcall.c:18
                  |          |          main tcall.c:17
                  |          |          main tcall.c:17
                  |          |          f1 tcall.c:13
                  |          |          f1 tcall.c:13
                  |          |          f2 tcall.c:7
                  |          |          f2 tcall.c:5
                  |          |          f1 tcall.c:12
                  |          |          f1 tcall.c:12
                  |          |          f2 tcall.c:7
                  |          |          f2 tcall.c:5
                  |          |          f1 tcall.c:11
                  |          |
                  |           --29.17%-- f1 tcall.c:12
                  |                     f1 tcall.c:12
                  |                     f2 tcall.c:7
                  |                     f2 tcall.c:5
                  |                     f1 tcall.c:11
                  |                     f1 tcall.c:10
                  |                     main tcall.c:18
                  |                     main tcall.c:18
                  |                     main tcall.c:17
                  |                     main tcall.c:17
                  |                     f1 tcall.c:13
                  |                     f1 tcall.c:13
                  |                     f2 tcall.c:7
                  |                     f2 tcall.c:5
                  |                     f1 tcall.c:12
      
      The default output is unchanged.
      
      This is only implemented in perf report, no change to record or anywhere
      else.
      
      This adds the basic code to report:
      
      - add a new "branch" option to the -g option parser to enable this mode
      - when the flag is set include the LBR into the callstack in machine.c.
      
      The rest of the history code is unchanged and doesn't know the
      difference between LBR entry and normal call entry.
      
      - detect overlaps with the callchain
      - remove small loop duplicates in the LBR
      
      Current limitations:
      
      - The LBR flags (mispredict etc.) are not shown in the history
      and LBR entries have no special marker.
      - It would be nice if annotate marked the LBR entries somehow
      (e.g. with arrows)
      
      v2: Various fixes.
      v3: Merge further patches into this one. Fix white space.
      v4: Improve manpage. Address review feedback.
      v5: Rename functions. Better error message without -g. Fix crash without
          -b.
      v6: Rebase
      v7: Rebase. Use NO_ENTRY in memset.
      v8: Port to latest tip. Move add_callchain_ip to separate
          patch. Skip initial entries in callchain. Minor cleanups.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8b7bad58
  12. 25 11月, 2014 1 次提交
  13. 19 11月, 2014 1 次提交
  14. 29 10月, 2014 1 次提交
  15. 15 10月, 2014 1 次提交
  16. 26 9月, 2014 3 次提交
  17. 27 6月, 2014 1 次提交
    • S
      perf tools powerpc: Adjust callchain based on DWARF debug info · a60335ba
      Sukadev Bhattiprolu 提交于
      When saving the callchain on Power, the kernel conservatively saves excess
      entries in the callchain. A few of these entries are needed in some cases
      but not others. We should use the DWARF debug information to determine
      when the entries are  needed.
      
      Eg: the value in the link register (LR) is needed only when it holds the
      return address of a function. At other times it must be ignored.
      
      If the unnecessary entries are not ignored, we end up with duplicate arcs
      in the call-graphs.
      
      Use the DWARF debug information to determine if any callchain entries
      should be ignored when building call-graphs.
      
      Callgraph before the patch:
      
          14.67%          2234  sprintft  libc-2.18.so       [.] __random
                  |
                  --- __random
                     |
                     |--61.12%-- __random
                     |          |
                     |          |--97.15%-- rand
                     |          |          do_my_sprintf
                     |          |          main
                     |          |          generic_start_main.isra.0
                     |          |          __libc_start_main
                     |          |          0x0
                     |          |
                     |           --2.85%-- do_my_sprintf
                     |                     main
                     |                     generic_start_main.isra.0
                     |                     __libc_start_main
                     |                     0x0
                     |
                      --38.88%-- rand
                                |
                                |--94.01%-- rand
                                |          do_my_sprintf
                                |          main
                                |          generic_start_main.isra.0
                                |          __libc_start_main
                                |          0x0
                                |
                                 --5.99%-- do_my_sprintf
                                           main
                                           generic_start_main.isra.0
                                           __libc_start_main
                                           0x0
      
      Callgraph after the patch:
      
          14.67%          2234  sprintft  libc-2.18.so       [.] __random
                  |
                  --- __random
                     |
                     |--95.93%-- rand
                     |          do_my_sprintf
                     |          main
                     |          generic_start_main.isra.0
                     |          __libc_start_main
                     |          0x0
                     |
                      --4.07%-- do_my_sprintf
                                main
                                generic_start_main.isra.0
                                __libc_start_main
                                0x0
      
      TODO:	For split-debug info objects like glibc, we can only determine
      	the call-frame-address only when both .eh_frame and .debug_info
      	sections are available. We should be able to determin the CFA
      	even without the .eh_frame section.
      
      Fix suggested by Anton Blanchard.
      
      Thanks to valuable input on DWARF debug information from Ulrich Weigand.
      Reported-by: NMaynard Johnson <maynard@us.ibm.com>
      Tested-by: NMaynard Johnson <maynard@us.ibm.com>
      Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20140625154903.GA29607@us.ibm.comSigned-off-by: NJiri Olsa <jolsa@kernel.org>
      a60335ba
  18. 01 6月, 2014 2 次提交
  19. 05 5月, 2014 1 次提交
  20. 22 4月, 2014 1 次提交
  21. 16 1月, 2014 1 次提交
  22. 20 12月, 2013 1 次提交
  23. 29 10月, 2013 1 次提交
  24. 22 10月, 2013 1 次提交
    • N
      perf callchain: Convert children list to rbtree · e369517c
      Namhyung Kim 提交于
      Current collapse stage has a scalability problem which can be reproduced
      easily with a parallel kernel build.
      
      This is because it needs to traverse every children of callchains
      linearly during the collapse/merge stage.
      
      Converting it to a rbtree reduced the overhead significantly.
      
      On my 400MB perf.data file which recorded with make -j32 kernel build:
      
        $ time perf --no-pager report --stdio > /dev/null
      
      before:
        real	6m22.073s
        user	6m18.683s
        sys	0m0.706s
      
      after:
        real	0m20.780s
        user	0m19.962s
        sys	0m0.689s
      
      During the perf report the overhead on append_chain_children went down
      from 96.69% to 18.16%:
      
        -  18.16%  perf  perf                [.] append_chain_children
           - append_chain_children
              - 77.48% append_chain_children
                 + 69.79% merge_chain_branch
                 - 22.96% append_chain_children
                    + 67.44% merge_chain_branch
                    + 30.15% append_chain_children
                    + 2.41% callchain_append
                 + 7.25% callchain_append
              + 12.26% callchain_append
              + 10.22% merge_chain_branch
        +  11.58%  perf  perf                [.] dso__find_symbol
        +   8.02%  perf  perf                [.] sort__comm_cmp
        +   5.48%  perf  libc-2.17.so        [.] malloc_consolidate
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1381468543-25334-2-git-send-email-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e369517c
  25. 30 8月, 2013 1 次提交
  26. 22 7月, 2013 1 次提交
  27. 12 12月, 2012 1 次提交
  28. 01 9月, 2012 1 次提交
  29. 31 5月, 2012 1 次提交
  30. 28 11月, 2011 1 次提交