1. 25 3月, 2015 2 次提交
  2. 13 3月, 2015 1 次提交
  3. 02 3月, 2015 2 次提交
  4. 28 2月, 2015 2 次提交
    • M
      perf buildid-cache: Add --purge FILE to remove all caches of FILE · 8d8c8e4c
      Masami Hiramatsu 提交于
      Add --purge FILE to remove all caches of FILE.
      
      Since the current --remove FILE removes a cache which has
      same build-id of given FILE. Since the command takes a
      FILE path, it can confuse user who tries to remove cache
      about FILE path.
      
        -----
        # ./perf buildid-cache -v --add ./perf
        Adding 133b7b5486d987a5ab5c3ebf4ea14941f45d4d4f ./perf: Ok
        # (update the ./perf binary)
        # ./perf buildid-cache -v --remove ./perf
        Removing 305bbd1be68f66eca7e2d78db294653031edfa79 ./perf: FAIL
        ./perf wasn't in the cache
        -----
      Actually, the --remove's FAIL is not shown, it just silently fails.
      
      So, this patch adds --purge FILE action for such usecase.
      
      perf buildid-cache --purge FILE removes all caches which has same FILE
      path.
      
      In other words, it removes all caches including old binaries.
      
        -----
        # ./perf buildid-cache -v --add ./perf
        Adding 133b7b5486d987a5ab5c3ebf4ea14941f45d4d4f ./perf: Ok
        # (update the ./perf binary)
        # ./perf buildid-cache -v --purge ./perf
        Removing 133b7b5486d987a5ab5c3ebf4ea14941f45d4d4f ./perf: Ok
        -----
      
      BTW, if you want to purge all the caches, remove ~/.debug/* .
      Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20150227045026.1999.64084.stgit@localhost.localdomain
      [ s/dirname/dir_name/g to fix build on fedora14, where dirname is a global ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8d8c8e4c
    • Y
      perf list: Extend raw-dump to certain kind of events · 5ef803ee
      Yunlong Song 提交于
      Extend 'perf list --raw-dump' to 'perf list --raw-dump [hw|sw|cache
      |tracepoint|pmu|event_glob]' in order to show the raw-dump of a certain
      kind of events rather than all of the events.
      
      Example:
      
      Before this patch:
      
       $ perf list --raw-dump hw
       branch-instructions branch-misses bus-cycles cache-misses
       cache-references cpu-cycles instructions stalled-cycles-backend
       stalled-cycles-frontend
       alignment-faults context-switches cpu-clock cpu-migrations
       emulation-faults major-faults minor-faults page-faults task-clock
       ...
       ...
       writeback:writeback_thread_start writeback:writeback_thread_stop
       writeback:writeback_wait_iff_congested
       writeback:writeback_wake_background writeback:writeback_wake_thread
      
      As shown above, all of the events are printed.
      
      After this patch:
      
       $ perf list --raw-dump hw
       branch-instructions branch-misses bus-cycles cache-misses
       cache-references cpu-cycles instructions stalled-cycles-backend
       stalled-cycles-frontend
      
      As shown above, only the hw events are printed.
      Signed-off-by: NYunlong Song <yunlong.song@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1425032491-20224-5-git-send-email-yunlong.song@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5ef803ee
  5. 27 2月, 2015 2 次提交
    • K
      perf diff: Support for different binaries · 94ba462d
      Kan Liang 提交于
      Currently, the perf diff only works with same binaries. That's because
      it compares the symbol start address. It doesn't work if the perf.data
      comes from different binaries. This patch matches the symbol names.
      
      Actually, perf diff once intended to compare the symbol names.  The
      commit as below can look for a pair by name.
      
      604c5c92 (perf diff: Change the default sort order to "dso,symbol")
      However, at that time, perf diff used a global list of dsos. That means
      the binaries which has same name can only be loaded once. That's a
      problem for comparing different binaries.
      
      For example, we have an old binary and an updated binary. They very
      likely have same name and most of the functions, so only dsos from old
      binary will be loaded. When processing the data from updated binary,
      perf still use the symbol information from old binary. That's wrong.
      
      Then the commit as below used IP to replace symbol name.
      9c443dfd ("perf diff: Fix support for all --sort combinations")
      >From that time, perf diff starts to compare the symbol address.
      
      The global dsos is discarded from a patch in 2010.
      a1645ce1 ("perf: 'perf kvm' tool for monitoring guest performance
      from host")
      However, at that time, perf diff already compared by address. So perf
      diff cannot work for different binaries as well.
      
      This patch actually rolls back the perf diff to original design. The
      document is also changed, so everybody knows the original design is to
      compare the symbol names.
      
      Here are some examples:
      
      The only difference between example_v1.c and example_v2.c is the
      location of f2 and f3. There is no change in behavior, but the previous
      perf diff display the wrong differential profile.
      
      example_v1.c
      noinline void f3(void)
      {
              volatile int i;
              for (i = 0; i < 10000;) {
      
                      if(i%2)
                              i++;
                      else
                              i++;
              }
      }
      
      noinline void f2(void)
      {
              volatile int a = 100, b, c;
              for (b = 0; b < 10000; b++)
                      c = a * b;
      
      }
      
      noinline void f1(void)
      {
                      f2();
                      f3();
      }
      
      int main()
      {
              int i;
              for (i = 0; i < 100000; i++)
                      f1();
      }
      
      example_v2.c
      noinline void f2(void)
      {
              volatile int a = 100, b, c;
              for (b = 0; b < 10000; b++)
                      c = a * b;
      }
      
      noinline void f3(void)
      {
              volatile int i;
              for (i = 0; i < 10000;) {
                      if(i%2)
                              i++;
                      else
                              i++;
              }
      }
      
      noinline void f1(void)
      {
                      f2();
                      f3();
      }
      
      int main()
      {
              int i;
              for (i = 0; i < 100000; i++)
                      f1();
      }
      
      [lk@localhost perf_diff]$ gcc example_v1.c -o example
      [lk@localhost perf_diff]$ perf record -o example_v1.data ./example
      [ perf record: Woken up 4 times to write data ]
      [ perf record: Captured and wrote 0.813 MB example_v1.data (~35522 samples) ]
      
      [lk@localhost perf_diff]$ gcc example_v2.c -o example
      [lk@localhost perf_diff]$ perf record -o example_v2.data ./example
      [ perf record: Woken up 4 times to write data ]
      [ perf record: Captured and wrote 0.824 MB example_v2.data (~36015 samples) ]
      
      Old perf diff result:
      
      [lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
       Event 'cycles'
       Baseline    Delta  Shared Object     Symbol
       ........  .......  ................  ...............................
      
                           [kernel.vmlinux]  [k] __perf_event_task_sched_out
           0.00%           [kernel.vmlinux]  [k] apic_timer_interrupt
                           [kernel.vmlinux]  [k] idle_cpu
                           [kernel.vmlinux]  [k] intel_pstate_timer_func
                           [kernel.vmlinux]  [k] native_read_msr_safe
           0.00%           [kernel.vmlinux]  [k] native_read_tsc
           0.00%           [kernel.vmlinux]  [k] native_write_msr_safe
                           [kernel.vmlinux]  [k] ntp_tick_length
           0.00%           [kernel.vmlinux]  [k] rb_erase
           0.00%           [kernel.vmlinux]  [k] tick_sched_timer
           0.00%           [kernel.vmlinux]  [k] unmap_single_vma
           0.00%           [kernel.vmlinux]  [k] update_wall_time
           0.00%           example           [.] f1
          46.24%           example           [.] f2
          53.71%   -7.55%  example           [.] f3
                  +53.81%  example           [.] f3
           0.02%           example           [.] main
      
      New perf diff result:
      
      [lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
                           [kernel.vmlinux]  [k] __perf_event_task_sched_out
           0.00%           [kernel.vmlinux]  [k] apic_timer_interrupt
                           [kernel.vmlinux]  [k] idle_cpu
                           [kernel.vmlinux]  [k] intel_pstate_timer_func
                           [kernel.vmlinux]  [k] native_read_msr_safe
           0.00%           [kernel.vmlinux]  [k] native_read_tsc
           0.00%           [kernel.vmlinux]  [k] native_write_msr_safe
                           [kernel.vmlinux]  [k] ntp_tick_length
           0.00%           [kernel.vmlinux]  [k] rb_erase
           0.00%           [kernel.vmlinux]  [k] tick_sched_timer
           0.00%           [kernel.vmlinux]  [k] unmap_single_vma
           0.00%           [kernel.vmlinux]  [k] update_wall_time
           0.00%           example           [.] f1
          46.24%   -0.08%  example           [.] f2
          53.71%   +0.11%  example           [.] f3
           0.02%           example           [.] main
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Link: http://lkml.kernel.org/r/1423460384-11645-1-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      94ba462d
    • M
      perf buildid-cache: Add new buildid cache if update target is not cached · a50d11a1
      Masami Hiramatsu 提交于
      Add new buildid cache if the update target file is not cached.
      
      This can happen when an old binary is replaced by new one after caching
      the old one. In this case, user sees his operation just failed.
      
      But it does not look straight, since user just pass the binary "path",
      not "build-id".
      
        ----
        # ./perf buildid-cache --add ./perf
        (update ./perf to new binary)
        # ./perf buildid-cache --update ./perf
        ./perf wasn't in the cache
        #
        ----
      
      This patch adds given new binary to cache if the new binary is
      not cached. So we'll not see the above error.
      
        ----
        # ./perf buildid-cache --add ./perf
        (update ./perf to new binary)
        # ./perf buildid-cache --update ./perf
        #
        ----
      Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20150226065440.23912.1494.stgit@localhost.localdomainSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a50d11a1
  6. 26 2月, 2015 1 次提交
    • J
      perf data: Add perf data to CTF conversion support · edbe9817
      Jiri Olsa 提交于
      Adding 'perf data convert' to convert perf data file into different
      format. This patch adds support for CTF format conversion.
      
      To convert perf.data into CTF run:
        $ perf data convert --to-ctf=./ctf-data/
        [ perf data convert: Converted 'perf.data' into CTF data './ctf-data/' ]
        [ perf data convert: Converted and wrote 11.268 MB (100230 samples) ]
      
      The command will create CTF metadata out of perf.data file (or one
      specified via -i option) and then convert all sample events into single
      CTF stream.
      
      Each sample_type bit is translated into separated CTF event field apart
      from following exceptions:
      
        PERF_SAMPLE_RAW          - added in next patch
        PERF_SAMPLE_READ         - TODO
        PERF_SAMPLE_CALLCHAIN    - TODO
        PERF_SAMPLE_BRANCH_STACK - TODO
        PERF_SAMPLE_REGS_USER    - TODO
        PERF_SAMPLE_STACK_USER   - TODO
      
        $ perf --debug=data-convert=2 data convert ...
      
      The converted CTF data could be analyzed by CTF tools, like babletrace
      or tracecompass [1].
      
        $ babeltrace ./ctf-data/
        [03:19:13.962125533] (+?.?????????) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 1 }
        [03:19:13.962130001] (+0.000004468) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 1 }
        [03:19:13.962131936] (+0.000001935) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 8 }
        [03:19:13.962133732] (+0.000001796) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 114 }
        [03:19:13.962135557] (+0.000001825) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 2087 }
        [03:19:13.962137627] (+0.000002070) cycles: { }, { ip = 0xFFFFFFFF81361938, tid = 20714, pid = 20714, period = 37582 }
        [03:19:13.962161091] (+0.000023464) cycles: { }, { ip = 0xFFFFFFFF8124218F, tid = 20714, pid = 20714, period = 600246 }
        [03:19:13.962517569] (+0.000356478) cycles: { }, { ip = 0xFFFFFFFF811A75DB, tid = 20714, pid = 20714, period = 1325731 }
        [03:19:13.969518008] (+0.007000439) cycles: { }, { ip = 0x34080917B2, tid = 20714, pid = 20714, period = 1144298 }
      
      The following members to the ctf-environment were decided to be added to
      distinguish and specify perf CTF data:
      
        - domain
      
          It says "kernel" because it contains a kernel trace (not to be
          confused with a user space like lttng-ust does)
      
        - tracer_name
      
          It says perf. This can be used to distinguish between lttng and perf
          CTF based trace.
      
        - version
      
          The kernel version from stream. In addition to release, this is what
          it looks like on a Debian kernel:
      
            release = "3.14-1-amd64";
            version = "3.14.0";
      
      [1] http://projects.eclipse.org/projects/tools.tracecompassSigned-off-by: NJiri Olsa <jolsa@kernel.org>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeremie Galarneau <jgalar@efficios.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1424470628-5969-4-git-send-email-jolsa@kernel.orgSigned-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      edbe9817
  7. 25 2月, 2015 2 次提交
  8. 23 2月, 2015 2 次提交
    • A
      perf trace: Add man page entry for --event · 77c92582
      Arnaldo Carvalho de Melo 提交于
      Forgot to do it when adding the feature.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-mx152b6x9cgknhw91vsyjlnd@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      77c92582
    • A
      perf trace: Introduce --filter-pids · f078c385
      Arnaldo Carvalho de Melo 提交于
      When tracing in X we get event loops due to the tracing activity, i.e.
      updates to a gnome-terminal that generate syscalls for X.org, etc.
      
      To get a more useful view of what is happening, syscall wise, system
      wide, we need to filter those, like in:
      
       # ps ax|egrep '981|2296|1519' | grep -v egrep
         981 tty1 Ss+ 5:40 /usr/bin/Xorg :0 -background none ...
        1519 ?    Sl  2:22 /usr/bin/gnome-shell
        2296 ?    Sl  4:16 /usr/libexec/gnome-terminal-server
       #
      
       # trace -e write --filter-pids 981,2296,1519
          0.385 ( 0.021 ms): goa-daemon/2061 write(fd: 1</dev/null>, buf: 0x7fbeb017b000, count: 136) = 136
          0.922 ( 0.014 ms): goa-daemon/2061 write(fd: 1</dev/null>, buf: 0x7fbeb017b000, count: 140) = 140
       5006.525 ( 0.029 ms): goa-daemon/2061 write(fd: 1</dev/null>, buf: 0x7fbeb017b000, count: 136) = 136
       5007.235 ( 0.023 ms): goa-daemon/2061 write(fd: 1</dev/null>, buf: 0x7fbeb017b000, count: 140) = 140
       5177.646 ( 0.018 ms): rtkit-daemon/782 write(fd: 5<anon_inode:[eventfd]>, buf: 0x7f7eea70be88, count: 8) = 8
       8314.497 ( 0.004 ms): gsd-locate-poi/2084 write(fd: 5<anon_inode:[eventfd]>, buf: 0x7fffe96af7b0, count: 8) = 8
       8314.518 ( 0.002 ms): gsd-locate-poi/2084 write(fd: 5<anon_inode:[eventfd]>, buf: 0x7fffe96af0e0, count: 8) = 8
       ^C#
      
      When this option is used the tracer pid is also filtered.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-f5qmiyy7c0uxdm21ncatpeek@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f078c385
  9. 19 2月, 2015 1 次提交
    • K
      perf tools: Enable LBR call stack support · aad2b21c
      Kan Liang 提交于
      Currently, there are two call chain recording options, fp and dwarf.
      
      Haswell has a new feature that utilizes the existing LBR facility to
      record call chains. Kernel side LBR support code provides this as a
      third option to record call chains. This patch enables the lbr call
      stack support on the tooling side.
      
      LBR call stack has some limitations:
      
       - It reuses current LBR facility, so LBR call stack and branch record
         can not be enabled at the same time.
      
       - It is only available for user-space callchains.
      
      However, it also offers some advantages:
      
       - LBR call stack can work on user apps which don't have frame-pointers
         or dwarf debug info compiled. It is a good alternative when nothing
         else works.
      Tested-by: NJiri Olsa <jolsa@kernel.org>
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Jacob Shin <jacob.w.shin@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masanari Iida <standby24x7@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Rodrigo Campos <rodrigo@sdfg.com.ar>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1420482185-29830-2-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      aad2b21c
  10. 13 2月, 2015 1 次提交
  11. 06 2月, 2015 1 次提交
  12. 22 1月, 2015 3 次提交
  13. 09 12月, 2014 1 次提交
  14. 03 12月, 2014 1 次提交
  15. 02 12月, 2014 2 次提交
    • A
      perf report: Add --branch-history option · fa94c36c
      Andi Kleen 提交于
      Add a --branch-history option to perf report that changes all the
      settings necessary for using the branches in callstacks.
      
      This is just a short cut to make this nicer to use, it does not enable
      any functionality by itself.
      
      v2: Change sort order. Rename option to --branch-history to
          be less confusing.
      v3: Updates
      v4: Fix conflict with newer perf base
      v5: Port to latest tip
      v6: Add more comments. Remove CCKEY_ADDRESS setting. Remove
          unnecessary branch_mode setting. Use a boolean.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/1415844328-4884-5-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fa94c36c
    • A
      perf callchain: Support handling complete branch stacks as histograms · 8b7bad58
      Andi Kleen 提交于
      Currently branch stacks can be only shown as edge histograms for
      individual branches. I never found this display particularly useful.
      
      This implements an alternative mode that creates histograms over
      complete branch traces, instead of individual branches, similar to how
      normal callgraphs are handled. This is done by putting it in front of
      the normal callgraph and then using the normal callgraph histogram
      infrastructure to unify them.
      
      This way in complex functions we can understand the control flow that
      lead to a particular sample, and may even see some control flow in the
      caller for short functions.
      
      Example (simplified, of course for such simple code this is usually not
      needed), please run this after the whole patchkit is in, as at this
      point in the patch order there is no --branch-history, that will be
      added in a patch after this one:
      
      tcall.c:
      
      volatile a = 10000, b = 100000, c;
      
      __attribute__((noinline)) f2()
      {
      	c = a / b;
      }
      
      __attribute__((noinline)) f1()
      {
      	f2();
      	f2();
      }
      main()
      {
      	int i;
      	for (i = 0; i < 1000000; i++)
      		f1();
      }
      
      % perf record -b -g ./tsrc/tcall
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
      % perf report --no-children --branch-history
      ...
          54.91%  tcall.c:6  [.] f2                      tcall
                  |
                  |--65.53%-- f2 tcall.c:5
                  |          |
                  |          |--70.83%-- f1 tcall.c:11
                  |          |          f1 tcall.c:10
                  |          |          main tcall.c:18
                  |          |          main tcall.c:18
                  |          |          main tcall.c:17
                  |          |          main tcall.c:17
                  |          |          f1 tcall.c:13
                  |          |          f1 tcall.c:13
                  |          |          f2 tcall.c:7
                  |          |          f2 tcall.c:5
                  |          |          f1 tcall.c:12
                  |          |          f1 tcall.c:12
                  |          |          f2 tcall.c:7
                  |          |          f2 tcall.c:5
                  |          |          f1 tcall.c:11
                  |          |
                  |           --29.17%-- f1 tcall.c:12
                  |                     f1 tcall.c:12
                  |                     f2 tcall.c:7
                  |                     f2 tcall.c:5
                  |                     f1 tcall.c:11
                  |                     f1 tcall.c:10
                  |                     main tcall.c:18
                  |                     main tcall.c:18
                  |                     main tcall.c:17
                  |                     main tcall.c:17
                  |                     f1 tcall.c:13
                  |                     f1 tcall.c:13
                  |                     f2 tcall.c:7
                  |                     f2 tcall.c:5
                  |                     f1 tcall.c:12
      
      The default output is unchanged.
      
      This is only implemented in perf report, no change to record or anywhere
      else.
      
      This adds the basic code to report:
      
      - add a new "branch" option to the -g option parser to enable this mode
      - when the flag is set include the LBR into the callstack in machine.c.
      
      The rest of the history code is unchanged and doesn't know the
      difference between LBR entry and normal call entry.
      
      - detect overlaps with the callchain
      - remove small loop duplicates in the LBR
      
      Current limitations:
      
      - The LBR flags (mispredict etc.) are not shown in the history
      and LBR entries have no special marker.
      - It would be nice if annotate marked the LBR entries somehow
      (e.g. with arrows)
      
      v2: Various fixes.
      v3: Merge further patches into this one. Fix white space.
      v4: Improve manpage. Address review feedback.
      v5: Rename functions. Better error message without -g. Fix crash without
          -b.
      v6: Rebase
      v7: Rebase. Use NO_ENTRY in memset.
      v8: Port to latest tip. Move add_callchain_ip to separate
          patch. Skip initial entries in callchain. Minor cleanups.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8b7bad58
  16. 16 11月, 2014 1 次提交
  17. 18 10月, 2014 1 次提交
    • J
      perf script: Add period data column · 535aeaae
      Jiri Olsa 提交于
      Adding period data column to be displayed in perf script.  It's possible
      to get period values using -f option, like:
      
        $ perf script -f comm,tid,time,period,ip,sym,dso
                :26019 26019 52414.329088:       3707  ffffffff8105443a native_write_msr_safe ([kernel.kallsyms])
                :26019 26019 52414.329088:         44  ffffffff8105443a native_write_msr_safe ([kernel.kallsyms])
                :26019 26019 52414.329093:       1987  ffffffff8105443a native_write_msr_safe ([kernel.kallsyms])
                :26019 26019 52414.329093:          6  ffffffff8105443a native_write_msr_safe ([kernel.kallsyms])
                    ls 26019 52414.329442:     537558        3407c0639c _dl_map_object_from_fd (/usr/lib64/ld-2.17.so)
                    ls 26019 52414.329442:       2099        3407c0639c _dl_map_object_from_fd (/usr/lib64/ld-2.17.so)
                    ls 26019 52414.330181:    1242100        34080917bb get_next_seq (/usr/lib64/libc-2.17.so)
                    ls 26019 52414.330181:       3774        34080917bb get_next_seq (/usr/lib64/libc-2.17.so)
                    ls 26019 52414.331427:    1083662  ffffffff810c7dc2 update_curr ([kernel.kallsyms])
                    ls 26019 52414.331427:        360  ffffffff810c7dc2 update_curr ([kernel.kallsyms])
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Cc: "Jen-Cheng(Tommy) Huang" <tommy24@gatech.edu>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jen-Cheng(Tommy) Huang <tommy24@gatech.edu>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1408977943-16594-9-git-send-email-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      535aeaae
  18. 16 10月, 2014 1 次提交
  19. 18 9月, 2014 1 次提交
  20. 12 8月, 2014 1 次提交
  21. 25 7月, 2014 1 次提交
  22. 17 7月, 2014 2 次提交
  23. 10 7月, 2014 2 次提交
  24. 27 6月, 2014 2 次提交
    • S
      perf trace: Add possibility to switch off syscall events · e281a960
      Stanislav Fomichev 提交于
      Currently, we may either trace syscalls or syscalls+pagefaults.
      
      We'd like to be able to trace *only* pagefaults and this commit
      implements this feature.
      
      Example:
      
        [root@zoo /]# echo 1 > /proc/sys/vm/drop_caches ; trace --no-syscalls -F -p `pidof xchat`
             0.000 ( 0.000 ms): xchat/4574 majfault [g_unichar_get_script+0x11] => /usr/lib64/libglib-2.0.so.0.3800.2@0xc403b (x.)
             0.202 ( 0.000 ms): xchat/4574 majfault [_cairo_hash_table_lookup+0x53] => 0x2280ff0 (?.)
            20.854 ( 0.000 ms): xchat/4574 majfault [gdk_cairo_set_source_pixbuf+0x110] => /usr/bin/xchat@0x6da1f (x.)
          1022.000 ( 0.000 ms): xchat/4574 majfault [__memcpy_sse2_unaligned+0x29] => 0x7ff5a8ca0400 (?.)
        ^C[root@zoo /]#
      
      Below we can see malloc calls, 'trace' reading symbol tables in libraries to
      resolve symbols, etc.
      
        [root@zoo /]# echo 1 > /proc/sys/vm/drop_caches ; trace --no-syscalls -F all --cpu 1 sleep 10
             0.000 ( 0.000 ms): chrome/26589 minfault [0x1b53129] => /tmp/perf-26589.map@0x33cbcbf7f000 (x.)
            96.477 ( 0.000 ms): libvirtd/947 minfault [copy_user_enhanced_fast_string+0x5] => 0x7f7685bba000 (?k)
           113.164 ( 0.000 ms): Xorg/1063 minfault [0x786da] => 0x7fce52882a3c (?.)
          7162.801 ( 0.000 ms): chrome/3747 minfault [0x8e1a89] => 0xfcaefed0008 (?.)
      <SNIP>
          7773.138 ( 0.000 ms): chrome/3886 minfault [0x8e1a89] => 0xfcb0ce28008 (?.)
          7992.022 ( 0.000 ms): chrome/26574 minfault [0x1b5a708] => 0x3de7b5fc5000 (?.)
          8108.949 ( 0.000 ms): qemu-system-x8/4537 majfault [_int_malloc+0xee] => 0x7faffc466d60 (?.)
          8108.975 ( 0.000 ms): qemu-system-x8/4537 minfault [_int_malloc+0x102] => 0x7faffc466d60 (?.)
      <SNIP>
          8148.174 ( 0.000 ms): qemu-system-x8/4537 minfault [_int_malloc+0x102] => 0x7faffc4eb500 (?.)
          8270.855 ( 0.000 ms): chrome/26245 minfault [do_bo_emit_reloc+0xdb] => 0x45d092bc004 (?.)
          8270.869 ( 0.000 ms): chrome/26245 minfault [do_bo_emit_reloc+0x108] => 0x45d09150000 (?.)
      no symbols found in /usr/lib64/libspice-server.so.1.9.0, maybe install a debug package?
          8273.831 ( 0.000 ms): trace/20198 majfault [__memcmp_sse4_1+0xbc6] => /usr/lib64/libspice-server.so.1.9.0@0xdf000 (d.)
      <SNIP>
          8275.121 ( 0.000 ms): trace/20198 minfault [dso__load+0x38] => 0x14fe756 (?.)
      no symbols found in /usr/lib64/libelf-0.158.so, maybe install a debug package?
          8275.142 ( 0.000 ms): trace/20198 minfault [__memcmp_sse4_1+0xbc6] => /usr/lib64/libelf-0.158.so@0x0 (d.)
      <SNIP>
        [root@zoo /]#
      Signed-off-by: NStanislav Fomichev <stfomichev@yandex-team.ru>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1403799268-1367-6-git-send-email-stfomichev@yandex-team.ruSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e281a960
    • S
      perf trace: Add support for pagefault tracing · 598d02c5
      Stanislav Fomichev 提交于
      This patch adds optional pagefault tracing support to 'perf trace'.
      
      Using -F/--pf option user can specify whether he wants minor, major or
      all pagefault events to be traced. This patch adds only live mode,
      record and replace will come in a separate patch.
      
      Example output:
      
        1756272.905 ( 0.000 ms): curl/5937 majfault [0x7fa7261978b6] => /usr/lib/x86_64-linux-gnu/libkrb5.so.26.0.0@0x85288 (d.)
        1862866.036 ( 0.000 ms): wget/8460 majfault [__clear_user+0x3f] => 0x659cb4 (?k)
      Signed-off-by: NStanislav Fomichev <stfomichev@yandex-team.ru>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1403799268-1367-3-git-send-email-stfomichev@yandex-team.ruSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      598d02c5
  25. 20 6月, 2014 1 次提交
    • D
      perf bench: Add --repeat option · b6f0629a
      Davidlohr Bueso 提交于
      There are a number of benchmarks that do single runs and as a result
      does not really help users gain a general idea of how the workload
      performs. So the user must either manually do multiple runs or just use
      single bogus results.
      
      This option will enable users to specify the amount of runs (arbitrarily
      defaulted to 10, to use the existing benchmarks default) through the
      '--repeat' option.  Add it to perf-bench instead of implementing it
      always in each specific benchmark.
      Signed-off-by: NDavidlohr Bueso <davidlohr@hp.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1402942467-10671-2-git-send-email-davidlohr@hp.com
      [ Kept the existing default of 10, changing it to something else should
        be done on separate patch ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b6f0629a
  26. 10 6月, 2014 1 次提交
  27. 09 6月, 2014 2 次提交
    • D
      perf tools: Add dcacheline sort · 9b32ba71
      Don Zickus 提交于
      In perf's 'mem-mode', one can get access to a whole bunch of details specific to a
      particular sample instruction.  A bunch of those details relate to the data
      address.
      
      One interesting thing you can do with data addresses is to convert them into a unique
      cacheline they belong too.  Organizing these data cachelines into similar groups and sorting
      them can reveal cache contention.
      
      This patch creates an alogorithm based on various sample details that can help group
      entries together into data cachelines and allows 'perf report' to sort on it.
      
      The algorithm relies on having proper mmap2 support in the kernel to help determine
      if the memory map the data address belongs to is private to a pid or globally shared.
      
      The alogortithm is as follows:
      
      o group cpumodes together
      o group entries with discovered maps together
      o sort on major, minor, inode and inode generation numbers
      o if userspace anon, then sort on pid
      o sort on cachelines based on data addresses
      
      The 'dcacheline' sort option in 'perf report' only works in 'mem-mode'.
      
      Sample output:
      
       #
       # Samples: 206  of event 'cpu/mem-loads/pp'
       # Total weight : 2534
       # Sort order   : dcacheline,pid
       #
       # Overhead       Samples                                                          Data Cacheline       Command:  Pid
       # ........  ............  ......................................................................  ..................
       #
          13.22%             1  [k] 0xffff88042f08ebc0                                                       swapper:    0
           9.27%             1  [k] 0xffff88082e8cea80                                                       swapper:    0
           3.59%             2  [k] 0xffffffff819ba180                                                       swapper:    0
           0.32%             1  [k] arch_trigger_all_cpu_backtrace_handler_na.23901+0xffffffffffffffe0       swapper:    0
           0.32%             1  [k] timekeeper_seq+0xfffffffffffffff8                                        swapper:    0
      
      Note:  Added a '+1' to symlen size in hists__calc_col_len to prevent the next column
      from prematurely tabbing over and mis-aligning.  Not sure what the problem is.
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Link: http://lkml.kernel.org/r/1401208087-181977-8-git-send-email-dzickus@redhat.comSigned-off-by: NJiri Olsa <jolsa@kernel.org>
      9b32ba71
    • D
      perf report: Add mem-mode documentation to report command · 75e906c9
      Don Zickus 提交于
      Add mem-mode sorting types and mem-mode itself to perf-report documentation.
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Link: http://lkml.kernel.org/r/1400526833-141779-5-git-send-email-dzickus@redhat.comSigned-off-by: NJiri Olsa <jolsa@kernel.org>
      75e906c9