1. 07 1月, 2016 4 次提交
    • N
      perf tools: Add 'trace' sort key · a34bb6a0
      Namhyung Kim 提交于
      The 'trace' sort key is to show tracepoint event output using either
      print fmt or plugin.  For example sched_switch event (using plugin) will
      show output like below:
      
        # perf record -e sched:sched_switch -a usleep 10
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.197 MB perf.data (69 samples) ]
        #
      
        $ perf report -s trace --stdio
        ...
        # Overhead  Trace output
        # ........  ...................................................
        #
             9.48%  swapper/0:0 [120] R ==> transmission-gt:17773 [120]
             9.48%  transmission-gt:17773 [120] S ==> swapper/0:0 [120]
             9.04%  swapper/2:0 [120] R ==> transmission-gt:17773 [120]
             8.92%  transmission-gt:17773 [120] S ==> swapper/2:0 [120]
             5.25%  swapper/0:0 [120] R ==> kworker/0:1H:109 [100]
             5.21%  kworker/0:1H:109 [100] S ==> swapper/0:0 [120]
             1.78%  swapper/3:0 [120] R ==> transmission-gt:17773 [120]
             1.78%  transmission-gt:17773 [120] S ==> swapper/3:0 [120]
             1.53%  Xephyr:6524 [120] S ==> swapper/0:0 [120]
             1.53%  swapper/0:0 [120] R ==> Xephyr:6524 [120]
             1.17%  swapper/2:0 [120] R ==> irq/33-iwlwifi:233 [49]
             1.13%  irq/33-iwlwifi:233 [49] S ==> swapper/2:0 [120]
      
      Note that the 'trace' sort key works only for tracepoint events.  If
      it's used to other type of events, just "N/A" will be printed.
      Suggested-and-acked-by: NJiri Olsa <jolsa@redhat.com>
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1450804030-29193-8-git-send-email-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a34bb6a0
    • N
      perf tools: Try to show pretty printed output for dynamic sort keys · 60517d28
      Namhyung Kim 提交于
      Each tracepoint event has format string for print to improve
      readability.  Try to parse the output and match the field name.  If it
      finds one, use that for the result.  If not, fallbacks to the original
      output.
      
      For example, sort on kmem:kmalloc.gfp_flags looks like below:
      (Note: libtraceevent plugins are not installed on my system.  They might
      affect the output below)
      
      Before:
        # Overhead  Command   gfp_flags
        # ........  .......  ..........
        #
            99.89%  perf          32848
             0.06%  sleep           208
             0.03%  perf          32976
             0.01%  perf            208
      
      After:
        # Overhead  Command            gfp_flags
        # ........  .......  ...................
        #
            99.89%  perf       GFP_NOFS|GFP_ZERO
             0.06%  sleep             GFP_KERNEL
             0.03%  perf     GFP_KERNEL|GFP_ZERO
             0.01%  perf              GFP_KERNEL
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1450804030-29193-7-git-send-email-namhyung@kernel.org
      [ Fixed clash with earlier, updated patch in this patchkit ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      60517d28
    • N
      perf tools: Add dynamic sort key for tracepoint events · c7c2a5e4
      Namhyung Kim 提交于
      The existing sort keys are less useful for tracepoint events in that
      they are always sampled at the same place, the function where the
      tracepoint is located.
      
      For example, a 'perf report' on sched:sched_switch event looks like the
      following:
      
        # Overhead  Command          Shared Object     Symbol
        # ........  ...............  ................  ..............
        #
            47.22%  swapper          [kernel.vmlinux]  [k] __schedule
            21.67%  transmission-gt  [kernel.vmlinux]  [k] __schedule
             8.23%  netctl-auto      [kernel.vmlinux]  [k] __schedule
             5.53%  kworker/0:1H     [kernel.vmlinux]  [k] __schedule
             1.98%  Xephyr           [kernel.vmlinux]  [k] __schedule
             1.33%  irq/33-iwlwifi   [kernel.vmlinux]  [k] __schedule
             1.17%  wpa_cli          [kernel.vmlinux]  [k] __schedule
             1.13%  rcu_preempt      [kernel.vmlinux]  [k] __schedule
             0.85%  ksoftirqd/0      [kernel.vmlinux]  [k] __schedule
             0.77%  Timer            [kernel.vmlinux]  [k] __schedule
      
      In fact, tracepoints have meaningful information in their fields but
      there's no way to use in 'perf report' currently.  The dynamic sort keys
      are introduced in this patc to overcome this limitation.
      
      The sched:sched_switch events have following fields:
      
        # sudo cat /sys/kernel/debug/tracing/events/sched/sched_switch/format
        name: sched_switch
        ID: 268
        format:
      	field:unsigned short common_type;         offset:0; size:2; signed:0;
      	field:unsigned char common_flags;         offset:2; size:1; signed:0;
      	field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
      	field:int common_pid;                     offset:4; size:4; signed:1;
      
      	field:char prev_comm[16]; offset:8;  size:16; signed:1;
      	field:pid_t prev_pid;     offset:24; size:4;  signed:1;
      	field:int prev_prio;      offset:28; size:4;  signed:1;
      	field:long prev_state;    offset:32; size:8;  signed:1;
      	field:char next_comm[16]; offset:40; size:16; signed:1;
      	field:pid_t next_pid;     offset:56; size:4;  signed:1;
      	field:int next_prio;      offset:60; size:4;  signed:1;
      
        print fmt: "prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==>
                    next_comm=%s next_pid=%d next_prio=%d",
          REC->prev_comm, REC->prev_pid, REC->prev_prio,
          REC->prev_state & (2048-1) ? __print_flags(REC->prev_state & (2048-1),
          "|", { 1, "S"} , { 2, "D" }, { 4, "T" }, { 8, "t" }, { 16, "Z" }, { 32, "X" },
          { 64, "x" }, { 128, "K"}, { 256, "W" }, { 512, "P" }, { 1024, "N" }) : "R",
          REC->prev_state & 2048 ? "+" : "", REC->next_comm, REC->next_pid, REC->next_prio
      
      With dynamic sort keys, you can use <event.field> as a sort key.  Those
      dynamic keys are checked and created on demand.  For instance, below is
      to sort by next_pid field output on the same data file:
      
        $ perf report -s comm,sched:sched_switch.next_pid --stdio
        ...
        # Overhead  Command            next_pid
        # ........  ...............  ..........
        #
            21.23%  transmission-gt           0
            20.86%  swapper               17773
             6.62%  netctl-auto               0
             5.25%  swapper                 109
             5.21%  kworker/0:1H              0
             1.98%  Xephyr                    0
             1.98%  swapper                6524
             1.98%  swapper               27478
             1.37%  swapper               27476
             1.17%  swapper                 233
      
      Multiple dynamic sort keys are also supported:
      
        $ perf report -s comm,sched:sched_switch.next_pid,sched:sched_switch.next_comm --stdio
        ...
        # Overhead  Command            next_pid         next_comm
        # ........  ...............  ..........  ................
        #
            20.86%  swapper               17773   transmission-gt
             9.64%  transmission-gt           0         swapper/0
             9.16%  transmission-gt           0         swapper/2
             5.25%  swapper                 109      kworker/0:1H
             5.21%  kworker/0:1H              0         swapper/0
             2.14%  netctl-auto               0         swapper/2
             1.98%  netctl-auto               0         swapper/0
             1.98%  swapper                6524            Xephyr
             1.98%  swapper               27478       netctl-auto
             1.78%  transmission-gt           0         swapper/3
             1.53%  Xephyr                    0         swapper/0
             1.29%  netctl-auto               0         swapper/1
             1.29%  swapper               27476       netctl-auto
             1.21%  netctl-auto               0         swapper/3
             1.17%  swapper                 233    irq/33-iwlwifi
      
      Note that pid 0 exists for each cpu so have comm of 'swapper/N'.
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1450804030-29193-6-git-send-email-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c7c2a5e4
    • N
      perf tools: Pass evlist to setup_sorting() · 40184c46
      Namhyung Kim 提交于
      This is a preparation to support dynamic sort keys for tracepoint
      events.  Dynamic sort keys can be created for specific fields in trace
      events so it needs the event information.
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1450804030-29193-5-git-send-email-namhyung@kernel.org
      [ Moving the evlist creation earlier in top was split to a previous patch ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      40184c46
  2. 07 10月, 2015 2 次提交
  3. 06 10月, 2015 1 次提交
  4. 14 9月, 2015 1 次提交
    • K
      perf tools: Introduce new sort type "socket" for the processor socket · 2e7ea3ab
      Kan Liang 提交于
      This patch enable perf report to sort by processor socket:
      
        $ perf report --stdio --sort socket,comm,dso,symbol
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 686  of event 'cycles'
        # Event count (approx.): 349215462
        #
        # Overhead SOCKET Command Shared Object    Symbol
        # ........ ...... ....... ................ ............................
        #
          97.05%    000   test    test             [.] plusB_c
           0.98%    000   test    test             [.] plusA_c
           0.93%    001   perf    [kernel.vmlinux] [k] smp_call_function_single
           0.19%    001   perf    [kernel.vmlinux] [k] page_fault
           0.19%    001   swapper [kernel.vmlinux] [k] pm_qos_request
           0.16%    000   test    [kernel.vmlinux] [k] add_mm_counter_fast
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1441377946-44429-2-git-send-email-kan.liang@intel.com
      [ Fix col calc, un-allcapsify col header & read the topology when not using perf.data ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2e7ea3ab
  5. 03 9月, 2015 1 次提交
    • A
      perf tools: Always use non inlined file name for 'srcfile' sort key · 2f84b42b
      Andi Kleen 提交于
      When profiling the kernel with the 'srcfile' sort key it's common to
      "get stuck" in include. For example a lot of code uses current or other
      inlines, so they get accounted to some random include file. This is not
      very useful as a high level categorization.
      
      For example just profiling the idle loop usually shows mostly inlines,
      so you never see the actual cpuidle file.
      
      This patch changes the 'srcfile' sort key to always unwind the inline
      stack using BFD/DWARF. So we always account to the base function that
      called the inline.
      
      In a few cases include is still shown (for example for MSR accesses),
      but that is because they get inlining expanded as part of assigning to a
      global function pointer. For the majority it works fine though.
      
      v2: Use simpler while loop. Add maximum iteration count.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1441133239-31254-1-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2f84b42b
  6. 12 8月, 2015 1 次提交
  7. 11 8月, 2015 1 次提交
    • A
      perf report: Add support for srcfile sort key · 31191a85
      Andi Kleen 提交于
      In some cases it's useful to characterize samples by file. This is
      useful to get a higher level categorization, for example to map cost to
      subsystems.
      
      Add a srcfile sort key to perf report. It builds on top of the existing
      srcline support.
      
      Commiter notes:
      
      E.g.:
      
        # perf record -F 10000 usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.016 MB perf.data (13 samples) ]
        [root@zoo ~]# perf report -s srcfile --stdio
        # Total Lost Samples: 0
        #
        # Samples: 13  of event 'cycles'
        # Event count (approx.): 869878
        #
        # Overhead  Source File
        # ........  ...........
            60.99%  .
            20.62%  paravirt.h
            14.23%  rmap.c
             4.04%  signal.c
             0.11%  msr.h
      
        #
      
      The first line is collecting all the files for which srcfiles couldn't somehow
      get resolved to:
      
        # perf report -s srcfile,dso --stdio
        # Total Lost Samples: 0
        #
        # Samples: 13  of event 'cycles'
        # Event count (approx.): 869878
        #
        # Overhead  Source File  Shared Object
        # ........  ...........  ................
            40.97%  .            ld-2.20.so
            20.62%  paravirt.h   [kernel.vmlinux]
            20.02%  .            libc-2.20.so
            14.23%  rmap.c       [kernel.vmlinux]
             4.04%  signal.c     [kernel.vmlinux]
             0.11%  msr.h        [kernel.vmlinux]
      
        #
      
      XXX: Investigate why that is not resolving on Fedora 21, Andi says he hasn't
           seen this on Fedora 22.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/1438988064-21834-1-git-send-email-andi@firstfloor.org
      [ Added column length update, from 0e65bdb3f90f ('perf hists: Update the column width for the "srcline" sort key') ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      31191a85
  8. 07 8月, 2015 2 次提交
  9. 20 6月, 2015 1 次提交
  10. 16 5月, 2015 1 次提交
  11. 27 2月, 2015 1 次提交
    • K
      perf diff: Support for different binaries · 94ba462d
      Kan Liang 提交于
      Currently, the perf diff only works with same binaries. That's because
      it compares the symbol start address. It doesn't work if the perf.data
      comes from different binaries. This patch matches the symbol names.
      
      Actually, perf diff once intended to compare the symbol names.  The
      commit as below can look for a pair by name.
      
      604c5c92 (perf diff: Change the default sort order to "dso,symbol")
      However, at that time, perf diff used a global list of dsos. That means
      the binaries which has same name can only be loaded once. That's a
      problem for comparing different binaries.
      
      For example, we have an old binary and an updated binary. They very
      likely have same name and most of the functions, so only dsos from old
      binary will be loaded. When processing the data from updated binary,
      perf still use the symbol information from old binary. That's wrong.
      
      Then the commit as below used IP to replace symbol name.
      9c443dfd ("perf diff: Fix support for all --sort combinations")
      >From that time, perf diff starts to compare the symbol address.
      
      The global dsos is discarded from a patch in 2010.
      a1645ce1 ("perf: 'perf kvm' tool for monitoring guest performance
      from host")
      However, at that time, perf diff already compared by address. So perf
      diff cannot work for different binaries as well.
      
      This patch actually rolls back the perf diff to original design. The
      document is also changed, so everybody knows the original design is to
      compare the symbol names.
      
      Here are some examples:
      
      The only difference between example_v1.c and example_v2.c is the
      location of f2 and f3. There is no change in behavior, but the previous
      perf diff display the wrong differential profile.
      
      example_v1.c
      noinline void f3(void)
      {
              volatile int i;
              for (i = 0; i < 10000;) {
      
                      if(i%2)
                              i++;
                      else
                              i++;
              }
      }
      
      noinline void f2(void)
      {
              volatile int a = 100, b, c;
              for (b = 0; b < 10000; b++)
                      c = a * b;
      
      }
      
      noinline void f1(void)
      {
                      f2();
                      f3();
      }
      
      int main()
      {
              int i;
              for (i = 0; i < 100000; i++)
                      f1();
      }
      
      example_v2.c
      noinline void f2(void)
      {
              volatile int a = 100, b, c;
              for (b = 0; b < 10000; b++)
                      c = a * b;
      }
      
      noinline void f3(void)
      {
              volatile int i;
              for (i = 0; i < 10000;) {
                      if(i%2)
                              i++;
                      else
                              i++;
              }
      }
      
      noinline void f1(void)
      {
                      f2();
                      f3();
      }
      
      int main()
      {
              int i;
              for (i = 0; i < 100000; i++)
                      f1();
      }
      
      [lk@localhost perf_diff]$ gcc example_v1.c -o example
      [lk@localhost perf_diff]$ perf record -o example_v1.data ./example
      [ perf record: Woken up 4 times to write data ]
      [ perf record: Captured and wrote 0.813 MB example_v1.data (~35522 samples) ]
      
      [lk@localhost perf_diff]$ gcc example_v2.c -o example
      [lk@localhost perf_diff]$ perf record -o example_v2.data ./example
      [ perf record: Woken up 4 times to write data ]
      [ perf record: Captured and wrote 0.824 MB example_v2.data (~36015 samples) ]
      
      Old perf diff result:
      
      [lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
       Event 'cycles'
       Baseline    Delta  Shared Object     Symbol
       ........  .......  ................  ...............................
      
                           [kernel.vmlinux]  [k] __perf_event_task_sched_out
           0.00%           [kernel.vmlinux]  [k] apic_timer_interrupt
                           [kernel.vmlinux]  [k] idle_cpu
                           [kernel.vmlinux]  [k] intel_pstate_timer_func
                           [kernel.vmlinux]  [k] native_read_msr_safe
           0.00%           [kernel.vmlinux]  [k] native_read_tsc
           0.00%           [kernel.vmlinux]  [k] native_write_msr_safe
                           [kernel.vmlinux]  [k] ntp_tick_length
           0.00%           [kernel.vmlinux]  [k] rb_erase
           0.00%           [kernel.vmlinux]  [k] tick_sched_timer
           0.00%           [kernel.vmlinux]  [k] unmap_single_vma
           0.00%           [kernel.vmlinux]  [k] update_wall_time
           0.00%           example           [.] f1
          46.24%           example           [.] f2
          53.71%   -7.55%  example           [.] f3
                  +53.81%  example           [.] f3
           0.02%           example           [.] main
      
      New perf diff result:
      
      [lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
                           [kernel.vmlinux]  [k] __perf_event_task_sched_out
           0.00%           [kernel.vmlinux]  [k] apic_timer_interrupt
                           [kernel.vmlinux]  [k] idle_cpu
                           [kernel.vmlinux]  [k] intel_pstate_timer_func
                           [kernel.vmlinux]  [k] native_read_msr_safe
           0.00%           [kernel.vmlinux]  [k] native_read_tsc
           0.00%           [kernel.vmlinux]  [k] native_write_msr_safe
                           [kernel.vmlinux]  [k] ntp_tick_length
           0.00%           [kernel.vmlinux]  [k] rb_erase
           0.00%           [kernel.vmlinux]  [k] tick_sched_timer
           0.00%           [kernel.vmlinux]  [k] unmap_single_vma
           0.00%           [kernel.vmlinux]  [k] update_wall_time
           0.00%           example           [.] f1
          46.24%   -0.08%  example           [.] f2
          53.71%   +0.11%  example           [.] f3
           0.02%           example           [.] main
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Link: http://lkml.kernel.org/r/1423460384-11645-1-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      94ba462d
  12. 22 1月, 2015 1 次提交
  13. 25 11月, 2014 1 次提交
  14. 19 11月, 2014 1 次提交
    • A
      perf hists: Fix up srcline histogram key formatting · b2d53671
      Arnaldo Carvalho de Melo 提交于
      Problem introduced in:
      
        commit 5b591669 "perf report: Honor column width setting"
      
      Where the left justification signal was after the width, which ended up,
      when the width was, say, 11, always printing:
      
      	%11.11-s
      
      Instead of src:line left justified and limited to 11 chars.
      
      Resulting in a like:
      
          70.93%  %11.11-s  [.] f2                     tcall
      
      When it should instead be:
      
          70.93%  tcall.c:5    [.] f2                     tcall
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-2xnt0vqkoox52etq2qhyetr0@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b2d53671
  15. 29 10月, 2014 7 次提交
  16. 10 10月, 2014 1 次提交
    • A
      perf evsel: Add hists helper · 4ea062ed
      Arnaldo Carvalho de Melo 提交于
      Not all tools need a hists instance per perf_evsel, so lets pave the way
      to remove evsel->hists while leaving a way to access the hists from a
      specially allocated evsel, one that comes with space at the end where
      lives the evsel.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jean Pihet <jean.pihet@linaro.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-qlktkhe31w4mgtbd84035sr2@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4ea062ed
  17. 18 9月, 2014 1 次提交
    • J
      perf tools: Add +field argument support for --sort option · 1a1c0ffb
      Jiri Olsa 提交于
      Adding support to add field(s) to default sort order via using the '+'
      prefix, like for report:
      
        $ perf report
        Samples: 2K of event 'cycles', Event count (approx.): 882172583
        Overhead  Command  Shared Object        Symbol
           7.39%  swapper  [kernel.kallsyms]    [k] intel_idle
           1.97%  firefox  libpthread-2.17.so   [.] pthread_mutex_lock
           1.39%  firefox  [snd_hda_intel]      [k] azx_get_position
           1.11%  firefox  libpthread-2.17.so   [.] pthread_mutex_unlock
      
        $ perf report -s +cpu
        Samples: 2K of event 'cycles', Event count (approx.): 882172583
        Overhead  Command  Shared Object        Symbol                  CPU
           2.89%  swapper  [kernel.kallsyms]    [k] intel_idle          000
           2.61%  swapper  [kernel.kallsyms]    [k] intel_idle          002
           1.20%  swapper  [kernel.kallsyms]    [k] intel_idle          001
           0.82%  firefox  libpthread-2.17.so   [.] pthread_mutex_lock  002
      
      Works in general for commands using --sort option.
      
      v2 with changes suggested:
        - Use dynamic memory instead static buffer
        - Fix error message typo
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jean Pihet <jean.pihet@linaro.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20140823125948.GA1193@krava.brq.redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1a1c0ffb
  18. 24 8月, 2014 1 次提交
    • J
      perf tools: Add +field argument support for --field option · 2f3f9bcf
      Jiri Olsa 提交于
      Adding support to add field(s) to default field order via using the '+'
      prefix, like for report:
      
        $ perf report
        Samples: 10  of event 'cycles', Event count (approx.): 4463799
        Overhead  Command  Shared Object      Symbol
          32.40%  ls       [kernel.kallsyms]  [k] filemap_fault
          28.19%  ls       [kernel.kallsyms]  [k] get_page_from_freelist
          23.38%  ls       [kernel.kallsyms]  [k] enqueue_entity
          15.04%  ls       [kernel.kallsyms]  [k] mmap_region
      
        $ perf report -F +period,sample
        Samples: 10  of event 'cycles', Event count (approx.): 4463799
        Overhead        Period       Samples  Command  Shared Object      Symbol
          32.40%       1446493             1  ls       [kernel.kallsyms]  [k] filemap_fault
          28.19%       1258486             1  ls       [kernel.kallsyms]  [k] get_page_from_freelist
          23.38%       1043754             1  ls       [kernel.kallsyms]  [k] enqueue_entity
          15.04%        671160             1  ls       [kernel.kallsyms]  [k] mmap_region
      
      Works in general for commands using --field option.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jean Pihet <jean.pihet@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1408715919-25990-2-git-send-email-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2f3f9bcf
  19. 12 8月, 2014 4 次提交
  20. 08 7月, 2014 1 次提交
  21. 09 6月, 2014 1 次提交
    • D
      perf tools: Add dcacheline sort · 9b32ba71
      Don Zickus 提交于
      In perf's 'mem-mode', one can get access to a whole bunch of details specific to a
      particular sample instruction.  A bunch of those details relate to the data
      address.
      
      One interesting thing you can do with data addresses is to convert them into a unique
      cacheline they belong too.  Organizing these data cachelines into similar groups and sorting
      them can reveal cache contention.
      
      This patch creates an alogorithm based on various sample details that can help group
      entries together into data cachelines and allows 'perf report' to sort on it.
      
      The algorithm relies on having proper mmap2 support in the kernel to help determine
      if the memory map the data address belongs to is private to a pid or globally shared.
      
      The alogortithm is as follows:
      
      o group cpumodes together
      o group entries with discovered maps together
      o sort on major, minor, inode and inode generation numbers
      o if userspace anon, then sort on pid
      o sort on cachelines based on data addresses
      
      The 'dcacheline' sort option in 'perf report' only works in 'mem-mode'.
      
      Sample output:
      
       #
       # Samples: 206  of event 'cpu/mem-loads/pp'
       # Total weight : 2534
       # Sort order   : dcacheline,pid
       #
       # Overhead       Samples                                                          Data Cacheline       Command:  Pid
       # ........  ............  ......................................................................  ..................
       #
          13.22%             1  [k] 0xffff88042f08ebc0                                                       swapper:    0
           9.27%             1  [k] 0xffff88082e8cea80                                                       swapper:    0
           3.59%             2  [k] 0xffffffff819ba180                                                       swapper:    0
           0.32%             1  [k] arch_trigger_all_cpu_backtrace_handler_na.23901+0xffffffffffffffe0       swapper:    0
           0.32%             1  [k] timekeeper_seq+0xfffffffffffffff8                                        swapper:    0
      
      Note:  Added a '+1' to symlen size in hists__calc_col_len to prevent the next column
      from prematurely tabbing over and mis-aligning.  Not sure what the problem is.
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Link: http://lkml.kernel.org/r/1401208087-181977-8-git-send-email-dzickus@redhat.comSigned-off-by: NJiri Olsa <jolsa@kernel.org>
      9b32ba71
  22. 04 6月, 2014 2 次提交
    • J
      perf tools: Move elide bool into perf_hpp_fmt struct · f2998422
      Jiri Olsa 提交于
      After output/sort fields refactoring, it's expensive
      to check the elide bool in its current location inside
      the 'struct sort_entry'.
      
      The perf_hpp__should_skip function gets highly noticable in
      workloads with high number of output/sort fields, like for:
      
        $ perf report -i perf-test.data -F overhead,sample,period,comm,pid,dso,symbol,cpu --stdio
      
      Performance report:
         9.70%  perf  [.] perf_hpp__should_skip
      
      Moving the elide bool into the 'struct perf_hpp_fmt', which
      makes the perf_hpp__should_skip just single struct read.
      
      Got speedup of around 22% for my test perf.data workload.
      The change should not harm any other workload types.
      
      Performance counter stats for (10 runs):
        before:
         358,319,732,626      cycles                    ( +-  0.55% )
         467,129,581,515      instructions              #    1.30  insns per cycle          ( +-  0.00% )
      
           150.943975206 seconds time elapsed           ( +-  0.62% )
      
        now:
         278,785,972,990      cycles                    ( +-  0.12% )
         370,146,797,640      instructions              #    1.33  insns per cycle          ( +-  0.00% )
      
           116.416670507 seconds time elapsed           ( +-  0.31% )
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20140601142622.GA9131@krava.brq.redhat.comSigned-off-by: NJiri Olsa <jolsa@kernel.org>
      f2998422
    • J
      perf tools: Remove elide setup for SORT_MODE__MEMORY mode · 2ec85c62
      Jiri Olsa 提交于
      There's no need to setup elide of sort_dso sort entry again
      with symbol_conf.dso_list list.
      
      The only difference were list names of memory mode data,
      which does not make much sense to me.
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1400858147-7155-2-git-send-email-jolsa@kernel.orgSigned-off-by: NJiri Olsa <jolsa@kernel.org>
      2ec85c62
  23. 01 6月, 2014 2 次提交
  24. 21 5月, 2014 1 次提交