1. 15 Nov, 2016 2 commits
    • perf report: Calculate and return the branch flag counting · 3dd029ef
      Committed by Jin Yao
      Create branch counters in each callchain list entry, one counter per
      branch flag. For example, predicted_count counts all the *predicted*
      branches. The counters are updated while processing the callchain
      cursor nodes.
      
      It also provides functions to retrieve or print the counter values in
      a callchain list.
      
      Besides counting branch flags, it also counts and returns the average
      number of iterations.
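
A minimal sketch of the per-entry counters described above, in C. Only predicted_count is named in the commit message; every other identifier here (the demo_* structs, functions, and fields) is a hypothetical stand-in and not the actual perf definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified flags for one branch (not the perf struct). */
struct demo_branch_flags {
	bool predicted;
	bool mispredicted;
	bool aborted;
};

/* Hypothetical per-entry counters: one per flag, plus iteration totals. */
struct demo_branch_counters {
	uint64_t branch_count;
	uint64_t predicted_count;
	uint64_t mispredicted_count;
	uint64_t abort_count;
	uint64_t iter_count;    /* sum of loop iterations seen */
	uint64_t samples_count; /* samples that contributed iterations */
};

/* Update the counters for one cursor node that represents a branch. */
static void demo_count_branch(struct demo_branch_counters *c,
			      const struct demo_branch_flags *f,
			      uint64_t nr_loop_iter)
{
	assert(c && f);

	c->branch_count++;
	if (f->predicted)
		c->predicted_count++;
	if (f->mispredicted)
		c->mispredicted_count++;
	if (f->aborted)
		c->abort_count++;

	/* Only samples that carry loop information feed the average. */
	if (nr_loop_iter) {
		c->iter_count += nr_loop_iter;
		c->samples_count++;
	}
}

/* Average iterations per contributing sample; 0 when there are none. */
static uint64_t demo_average_iterations(const struct demo_branch_counters *c)
{
	assert(c);
	return c->samples_count ? c->iter_count / c->samples_count : 0;
}
```

Calling demo_count_branch() once per branch node and reading the totals afterwards mirrors the "update while processing, retrieve later" split that the message describes.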

      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linux-kernel@vger.kernel.org
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/1477876794-30749-4-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report: Add branch flag to callchain cursor node · 410024db
      Committed by Jin Yao
      Since the branch ip has been added to the call stack for easier
      browsing, this patch adds more branch information: a flag indicating
      whether the ip is a branch, plus the branch flags themselves.
      
      With these we can tell whether a cursor node represents a branch and
      which branch flags it carries.
      
      The branch history code has a loop detection pass that removes loops.
      It is useful to know how many loops were removed, so that a later step
      can compute the average number of iterations.
      
      For example:
      
      Before remove_loops(),
      entry0: from = 0x100, to = 0x200
      entry1: from = 0x300, to = 0x250
      entry2: from = 0x300, to = 0x250
      entry3: from = 0x300, to = 0x250
      entry4: from = 0x700, to = 0x800
      
      After remove_loops()
      entry0: from = 0x100, to = 0x200
      entry1: from = 0x300, to = 0x250
      entry2: from = 0x700, to = 0x800
      
      The original entry2 and entry3 are removed, so the number of
      iterations for (from = 0x300, to = 0x250) equals the number of removed
      entries + 1, that is 2 + 1 = 3.
      
      iterations = removed number + 1;
      average iterations = Sum(iterations) / number of samples
      
      This formula ignores other cases, for example iterations that cross
      multiple buffers or a buffer that contains 2+ loops, because in
      practice it is good enough.
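
The arithmetic above can be sketched in C as follows. This is a deliberately simplified illustration that only collapses immediately repeated (from, to) entries; it is not the loop detection that perf's remove_loops() performs, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical branch entry: source and target address of one branch. */
struct demo_branch_entry {
	uint64_t from;
	uint64_t to;
};

/*
 * Collapse runs of identical consecutive entries in place.
 * Returns the new entry count; *removed receives the number dropped.
 */
static size_t demo_remove_repeats(struct demo_branch_entry *e, size_t nr,
				  size_t *removed)
{
	size_t out = 0;

	assert(e && removed);
	*removed = 0;

	for (size_t i = 0; i < nr; i++) {
		if (out > 0 && e[out - 1].from == e[i].from &&
		    e[out - 1].to == e[i].to) {
			(*removed)++; /* repeat of the previous kept entry */
			continue;
		}
		e[out++] = e[i];
	}
	return out;
}

/* iterations = removed number + 1 */
static uint64_t demo_iterations(size_t removed)
{
	return (uint64_t)removed + 1;
}

/* average iterations = Sum(iterations) / number of samples */
static uint64_t demo_average(const uint64_t *iters, size_t nr_samples)
{
	uint64_t sum = 0;

	if (nr_samples == 0)
		return 0;
	for (size_t i = 0; i < nr_samples; i++)
		sum += iters[i];
	return sum / nr_samples;
}
```

Fed the five entries from the example, demo_remove_repeats() keeps three entries and reports two removed, so demo_iterations() yields 3, matching the 2 + 1 in the text.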

      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linux-kernel@vger.kernel.org
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/n/1477876794-30749-2-git-send-email-yao.jin@linux.intel.com
      [ Renamed 'iter' to 'nr_loop_iter' for clarity ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 08 Nov, 2016 1 commit
  3. 05 Jul, 2016 1 commit
  4. 30 May, 2016 1 commit
    • perf tools: Per event max-stack settings · 792d48b4
      Committed by Arnaldo Carvalho de Melo
      This is the tooling counterpart. It is now possible to do:
      
        # perf record -e sched:sched_switch/max-stack=10/ -e cycles/call-graph=dwarf,max-stack=4/ -e cpu-cycles/call-graph=dwarf,max-stack=1024/ usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.052 MB perf.data (5 samples) ]
        # perf evlist -v
        sched:sched_switch: type: 2, size: 112, config: 0x110, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, sample_max_stack: 10
        cycles/call-graph=dwarf,max-stack=4/: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|REGS_USER|STACK_USER|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, sample_regs_user: 0xff0fff, sample_stack_user: 8192, sample_max_stack: 4
        cpu-cycles/call-graph=dwarf,max-stack=1024/: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|REGS_USER|STACK_USER|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, sample_regs_user: 0xff0fff, sample_stack_user: 8192, sample_max_stack: 1024
        # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
      
      Using just /max-stack=N/ means /call-graph=fp,max-stack=N/; this
      default should be further configurable by means of some .perfconfig knob.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 16 Apr, 2016 1 commit
  6. 15 Apr, 2016 1 commit
    • perf callchain: Start moving away from global per thread cursors · 91d7b2de
      Committed by Arnaldo Carvalho de Melo
      The recent perf_evsel__fprintf_callchain() move to evsel.c added several
      new symbol requirements to the python binding, for instance:
      
        # perf test -v python
        16: Try 'import perf' in python, checking link problems      :
        --- start ---
        test child forked, pid 18030
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        ImportError: /tmp/build/perf/python/perf.so: undefined symbol:
        callchain_cursor
        test child finished with -1
        ---- end ----
        Try 'import perf' in python, checking link problems: FAILED!
        #
      
      This would require linking against callchain.c to access the global
      callchain_cursor variable.
      
      Since lots of functions already receive a callchain_cursor struct
      pointer as a parameter, do the same for some more functions so that we
      can start phasing out usage of yet another global variable.
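
The refactoring direction described above, passing the cursor in rather than reaching for a global, might look like the following C sketch. The demo_cursor type and its helpers are hypothetical and far simpler than perf's callchain_cursor. The point is only that a helper taking the cursor as a parameter needs no global symbol at link time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_NODES 8

/* Hypothetical cursor: a fixed-size list of instruction pointers. */
struct demo_cursor {
	uint64_t ips[DEMO_MAX_NODES];
	size_t nr;
};

/*
 * The cursor is an explicit parameter, so callers choose which cursor
 * to use and this function depends on no global variable.
 * Returns 0 on success, -1 when the cursor is full.
 */
static int demo_cursor_append(struct demo_cursor *cursor, uint64_t ip)
{
	assert(cursor);

	if (cursor->nr >= DEMO_MAX_NODES)
		return -1;
	cursor->ips[cursor->nr++] = ip;
	return 0;
}

/* Number of nodes currently held by the given cursor. */
static size_t demo_cursor_depth(const struct demo_cursor *cursor)
{
	assert(cursor);
	return cursor->nr;
}
```

With this shape, two callers can each hold their own cursor without interfering, which is one practical benefit of moving away from shared global state.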
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-djko3097eyg2rn66v2qcqfvn@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 24 Mar, 2016 1 commit
  8. 08 Jan, 2016 1 commit
  9. 24 Nov, 2015 1 commit
  10. 20 Nov, 2015 5 commits
  11. 23 Oct, 2015 4 commits
  12. 09 Aug, 2015 1 commit
  13. 06 Aug, 2015 1 commit
  14. 06 May, 2015 1 commit
  15. 19 Feb, 2015 1 commit
    • perf tools: Enable LBR call stack support · aad2b21c
      Committed by Kan Liang
      Currently, there are two call chain recording options, fp and dwarf.
      
      Haswell has a new feature that utilizes the existing LBR facility to
      record call chains. Kernel side LBR support code provides this as a
      third option to record call chains. This patch enables the lbr call
      stack support on the tooling side.
      
      LBR call stack has some limitations:
      
       - It reuses the current LBR facility, so LBR call stack and branch
         record cannot be enabled at the same time.
      
       - It is only available for user-space callchains.
      
      However, it also offers some advantages:
      
       - LBR call stack can work on user apps which don't have frame-pointers
         or dwarf debug info compiled in. It is a good alternative when
         nothing else works.

      Tested-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Jacob Shin <jacob.w.shin@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masanari Iida <standby24x7@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Rodrigo Campos <rodrigo@sdfg.com.ar>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1420482185-29830-2-git-send-email-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  16. 08 Jan, 2015 1 commit
  17. 02 Dec, 2014 1 commit
    • perf callchain: Support handling complete branch stacks as histograms · 8b7bad58
      Committed by Andi Kleen
      Currently branch stacks can be only shown as edge histograms for
      individual branches. I never found this display particularly useful.
      
      This implements an alternative mode that creates histograms over
      complete branch traces, instead of individual branches, similar to how
      normal callgraphs are handled. This is done by putting it in front of
      the normal callgraph and then using the normal callgraph histogram
      infrastructure to unify them.
      
      This way in complex functions we can understand the control flow that
      led to a particular sample, and may even see some control flow in the
      caller for short functions.
      
      Example (simplified; for such simple code this is usually not needed).
      Please run this after the whole patchkit is in, because at this point
      in the patch order there is no --branch-history option; it is added in
      a later patch:
      
      tcall.c:
      
      volatile int a = 10000, b = 100000, c;
      
      __attribute__((noinline)) void f2(void)
      {
      	c = a / b;
      }
      
      __attribute__((noinline)) void f1(void)
      {
      	f2();
      	f2();
      }
      int main(void)
      {
      	int i;
      	for (i = 0; i < 1000000; i++)
      		f1();
      }
      
      % perf record -b -g ./tsrc/tcall
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
      % perf report --no-children --branch-history
      ...
          54.91%  tcall.c:6  [.] f2                      tcall
                  |
                  |--65.53%-- f2 tcall.c:5
                  |          |
                  |          |--70.83%-- f1 tcall.c:11
                  |          |          f1 tcall.c:10
                  |          |          main tcall.c:18
                  |          |          main tcall.c:18
                  |          |          main tcall.c:17
                  |          |          main tcall.c:17
                  |          |          f1 tcall.c:13
                  |          |          f1 tcall.c:13
                  |          |          f2 tcall.c:7
                  |          |          f2 tcall.c:5
                  |          |          f1 tcall.c:12
                  |          |          f1 tcall.c:12
                  |          |          f2 tcall.c:7
                  |          |          f2 tcall.c:5
                  |          |          f1 tcall.c:11
                  |          |
                  |           --29.17%-- f1 tcall.c:12
                  |                     f1 tcall.c:12
                  |                     f2 tcall.c:7
                  |                     f2 tcall.c:5
                  |                     f1 tcall.c:11
                  |                     f1 tcall.c:10
                  |                     main tcall.c:18
                  |                     main tcall.c:18
                  |                     main tcall.c:17
                  |                     main tcall.c:17
                  |                     f1 tcall.c:13
                  |                     f1 tcall.c:13
                  |                     f2 tcall.c:7
                  |                     f2 tcall.c:5
                  |                     f1 tcall.c:12
      
      The default output is unchanged.
      
      This is only implemented in perf report, no change to record or anywhere
      else.
      
      This adds the basic code to report:
      
      - add a new "branch" option to the -g option parser to enable this mode
      - when the flag is set include the LBR into the callstack in machine.c.
      
      The rest of the history code is unchanged and doesn't know the
      difference between LBR entry and normal call entry.
      
      - detect overlaps with the callchain
      - remove small loop duplicates in the LBR
      
      Current limitations:
      
      - The LBR flags (mispredict etc.) are not shown in the history
      and LBR entries have no special marker.
      - It would be nice if annotate marked the LBR entries somehow
      (e.g. with arrows)
      
      v2: Various fixes.
      v3: Merge further patches into this one. Fix white space.
      v4: Improve manpage. Address review feedback.
      v5: Rename functions. Better error message without -g. Fix crash without
          -b.
      v6: Rebase
      v7: Rebase. Use NO_ENTRY in memset.
      v8: Port to latest tip. Move add_callchain_ip to separate
          patch. Skip initial entries in callchain. Minor cleanups.

      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  18. 25 Nov, 2014 1 commit
  19. 19 Nov, 2014 1 commit
  20. 29 Oct, 2014 1 commit
  21. 15 Oct, 2014 1 commit
  22. 26 Sep, 2014 3 commits
  23. 27 Jun, 2014 1 commit
    • perf tools powerpc: Adjust callchain based on DWARF debug info · a60335ba
      Committed by Sukadev Bhattiprolu
      When saving the callchain on Power, the kernel conservatively saves excess
      entries in the callchain. A few of these entries are needed in some cases
      but not others. We should use the DWARF debug information to determine
      when the entries are needed.
      
      E.g., the value in the link register (LR) is needed only when it holds the
      return address of a function. At other times it must be ignored.
      
      If the unnecessary entries are not ignored, we end up with duplicate arcs
      in the call-graphs.
      
      Use the DWARF debug information to determine if any callchain entries
      should be ignored when building call-graphs.
      
      Callgraph before the patch:
      
          14.67%          2234  sprintft  libc-2.18.so       [.] __random
                  |
                  --- __random
                     |
                     |--61.12%-- __random
                     |          |
                     |          |--97.15%-- rand
                     |          |          do_my_sprintf
                     |          |          main
                     |          |          generic_start_main.isra.0
                     |          |          __libc_start_main
                     |          |          0x0
                     |          |
                     |           --2.85%-- do_my_sprintf
                     |                     main
                     |                     generic_start_main.isra.0
                     |                     __libc_start_main
                     |                     0x0
                     |
                      --38.88%-- rand
                                |
                                |--94.01%-- rand
                                |          do_my_sprintf
                                |          main
                                |          generic_start_main.isra.0
                                |          __libc_start_main
                                |          0x0
                                |
                                 --5.99%-- do_my_sprintf
                                           main
                                           generic_start_main.isra.0
                                           __libc_start_main
                                           0x0
      
      Callgraph after the patch:
      
          14.67%          2234  sprintft  libc-2.18.so       [.] __random
                  |
                  --- __random
                     |
                     |--95.93%-- rand
                     |          do_my_sprintf
                     |          main
                     |          generic_start_main.isra.0
                     |          __libc_start_main
                     |          0x0
                     |
                      --4.07%-- do_my_sprintf
                                main
                                generic_start_main.isra.0
                                __libc_start_main
                                0x0
      
      TODO:	For split-debug info objects like glibc, we can determine the
      	call-frame-address only when both .eh_frame and .debug_info
      	sections are available. We should be able to determine the CFA
      	even without the .eh_frame section.
      
      Fix suggested by Anton Blanchard.
      
      Thanks to valuable input on DWARF debug information from Ulrich Weigand.

      Reported-by: Maynard Johnson <maynard@us.ibm.com>
      Tested-by: Maynard Johnson <maynard@us.ibm.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20140625154903.GA29607@us.ibm.com
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
  24. 01 Jun, 2014 2 commits
  25. 05 May, 2014 1 commit
  26. 22 Apr, 2014 1 commit
  27. 16 Jan, 2014 1 commit
  28. 20 Dec, 2013 1 commit
  29. 29 Oct, 2013 1 commit