1. 11 Oct, 2013 1 commit
  2. 09 Oct, 2013 5 commits
    • perf tools: Add possibility to specify mmap size · 27050f53
      Jiri Olsa committed
      Add the possibility to specify the mmap size via -m/--mmap-pages
      by appending a unit size character (B/K/M/G) to the
      number, like:
        $ perf record -m 8K ls
        $ perf record -m 2M ls
      
      The size is rounded up appropriately to follow perf
      mmap restrictions.
      
      If no unit is specified, the number is interpreted as a page
      count, as before, like:
        $ perf record -m 8 ls
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1378031796-17892-3-git-send-email-jolsa@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
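The size handling described in this commit message can be sketched roughly as follows. This is an illustration, not the perf implementation: the function names, the 4096-byte page size, and the reading of "perf mmap restrictions" as "page count must be a power of two" are assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size; real systems may differ

# Unit suffixes named in the commit message (B/K/M/G).
UNITS = {"B": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def next_pow2(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def mmap_pages(arg: str, page_size: int = PAGE_SIZE) -> int:
    """Hypothetical helper: turn a -m argument into a page count.

    With a unit suffix the value is a size in bytes, converted to pages
    (rounding up). Without a suffix the value is already a page count.
    Either way the result is rounded up to a power of two.
    """
    arg = arg.strip()
    if not arg:
        raise ValueError("empty mmap size")
    suffix = arg[-1]
    if suffix in UNITS:
        size = int(arg[:-1]) * UNITS[suffix]
        pages = max(1, -(-size // page_size))  # ceiling division
    else:
        pages = int(arg)
    if pages < 1:
        raise ValueError("mmap size must be positive")
    return next_pow2(pages)


if __name__ == "__main__":
    for example in ("8K", "2M", "8", "12K"):
        print(example, "->", mmap_pages(example), "pages")
```

Under these assumptions, `8K` maps to 2 pages and `12K` (3 pages) is rounded up to 4.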
    • perf lock: Account for lock average wait time · f37376cd
      Davidlohr Bueso committed
      While perf-lock currently reports both the total wait time and the
      number of contentions, it doesn't explicitly show the average wait time.
      Having this value immediately in the report can be quite useful when
      looking into performance issues.
      
      Furthermore, allowing report to sort by averages is another handy
      feature to have - and thus do not only print the value, but add it to
      the lock_stat structure.
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1378693159-8747-8-git-send-email-davidlohr@hp.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
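As a rough illustration of the idea, the average can be derived from the two values the report already has and stored alongside them so it can serve as a sort key. The `LockStat` class and its field names below are hypothetical stand-ins for this sketch, not the actual `lock_stat` layout.

```python
from dataclasses import dataclass


@dataclass
class LockStat:
    """Hypothetical stand-in for a per-lock statistics record."""
    name: str
    nr_contended: int      # number of contentions
    wait_time_total: int   # total wait time, in nanoseconds

    @property
    def avg_wait_time(self) -> float:
        # Average wait per contention; 0 when the lock never contended.
        if self.nr_contended == 0:
            return 0.0
        return self.wait_time_total / self.nr_contended


def sort_by_avg_wait(stats):
    """Order locks by average wait time, longest first."""
    return sorted(stats, key=lambda s: s.avg_wait_time, reverse=True)


if __name__ == "__main__":
    sample = [
        LockStat("lock_a", nr_contended=10, wait_time_total=5_000),
        LockStat("lock_b", nr_contended=2, wait_time_total=4_000),
        LockStat("lock_c", nr_contended=0, wait_time_total=0),
    ]
    for s in sort_by_avg_wait(sample):
        print(f"{s.name}: avg {s.avg_wait_time:.0f} ns")
```

The sample values are made up; the point is that a lock with fewer contentions can still rank first once the average is the sort key.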
    • perf trace: Add option to show process COMM · 50c95cbd
      Arnaldo Carvalho de Melo committed
      Enabled by default, disable with --no-comm, e.g.:
      
       181.821 (0.001 ms): deja-dup-monit/10784 recvmsg(fd: 8, msg: 0x7fff4342baf0, flags: PEEK|TRUNC|CMSG_CLOEXEC ) = 20
       181.824 (0.001 ms): deja-dup-monit/10784 geteuid(                                                           ) = 1000
       181.825 (0.001 ms): deja-dup-monit/10784 getegid(                                                           ) = 1000
       181.834 (0.002 ms): deja-dup-monit/10784 recvmsg(fd: 8, msg: 0x7fff4342baf0, flags: CMSG_CLOEXEC            ) = 20
       181.836 (0.001 ms): deja-dup-monit/10784 geteuid(                                                           ) = 1000
       181.838 (0.001 ms): deja-dup-monit/10784 getegid(                                                           ) = 1000
       181.705 (0.003 ms): evolution-addr/10924 recvmsg(fd: 10, msg: 0x7fff17dc6990, flags: PEEK|TRUNC|CMSG_CLOEXEC) = 1256
       181.710 (0.002 ms): evolution-addr/10924 geteuid(                                                           ) = 1000
       181.712 (0.001 ms): evolution-addr/10924 getegid(                                                           ) = 1000
       181.727 (0.003 ms): evolution-addr/10924 recvmsg(fd: 10, msg: 0x7fff17dc6990, flags: CMSG_CLOEXEC           ) = 1256
       181.731 (0.001 ms): evolution-addr/10924 geteuid(                                                           ) = 1000
       181.734 (0.001 ms): evolution-addr/10924 getegid(                                                           ) = 1000
       181.908 (0.002 ms): evolution-addr/10924 recvmsg(fd: 10, msg: 0x7fff17dc6990, flags: PEEK|TRUNC|CMSG_CLOEXEC) = 20
       181.913 (0.001 ms): evolution-addr/10924 geteuid(                                                           ) = 1000
       181.915 (0.001 ms): evolution-addr/10924 getegid(                                                           ) = 1000
       181.930 (0.003 ms): evolution-addr/10924 recvmsg(fd: 10, msg: 0x7fff17dc6990, flags: CMSG_CLOEXEC           ) = 20
       181.934 (0.001 ms): evolution-addr/10924 geteuid(                                                           ) = 1000
       181.937 (0.001 ms): evolution-addr/10924 getegid(                                                           ) = 1000
       220.718 (0.010 ms): at-spi2-regist/10715 sendmsg(fd: 3, msg: 0x7fffdb8756c0, flags: NOSIGNAL                ) = 200
       220.741 (0.000 ms): dbus-daemon/10711  ... [continued]: epoll_wait()) = 1
       220.759 (0.004 ms): dbus-daemon/10711 recvmsg(fd: 11, msg: 0x7ffff94594d0, flags: CMSG_CLOEXEC              ) = 200
       220.780 (0.002 ms): dbus-daemon/10711 recvmsg(fd: 11, msg: 0x7ffff94594d0, flags: CMSG_CLOEXEC              ) = 200
       220.788 (0.001 ms): dbus-daemon/10711 recvmsg(fd: 11, msg: 0x7ffff94594d0, flags: CMSG_CLOEXEC              ) = -1 EAGAIN Resource temporarily unavailable
       220.760 (0.004 ms): at-spi2-regist/10715 sendmsg(fd: 3, msg: 0x7fffdb8756c0, flags: NOSIGNAL                ) = 200
       220.771 (0.023 ms): perf/26347 open(filename: 0xf2e780, mode: 15918976                               ) = 19
       220.850 (0.002 ms): perf/26347 close(fd: 19                                                          ) = 0
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-6be5jvnkdzjptdrebfn5263n@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Add option to show full timestamp · 4bb09192
      David Ahern committed
      The timestamp currently shown in the output is relative to the first sample. This
      patch adds an option to show the absolute perf_clock timestamp which is
      useful when comparing output across commands (e.g., perf-trace to
      perf-script).
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1378319865-55695-1-git-send-email-dsahern@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools/perf: Fix double/triple-build of the feature detection logic during 'make install' et al · 31f6be65
      Ingo Molnar committed
      Linus reported the following perf build system bug:
      
        'Another annoyance during that make was that "make install" seems to
         want to re-make the thing I just built. That's absolutely horrible, [...]'
      
      The thing that got re-built were 'only' the (numerous) feature checks,
      not the whole project - but still it was mighty annoying as the feature
      checks took 9+ seconds even on reasonably fast boxes.
      
      Even with the autodep patches where feature detection is much faster
      it wastes resources, wastes screen real estate and confuses users if
      we execute feature detection twice.
      
      There were two sources for these unnecessary re-builds of the feature
      checks:
      
       - Unnecessary nested invocations of $(MAKE), apparently to be able
         to do conditional compilation dependent on documentation tools
         presence. Use straight dependencies instead, with no nesting.
      
       - A direct invocation of $(MAKE) to rebuild the PERF-VERSION-FILE.
         This is apparently done to be able to include it into the
         Makefile:
      
          -include $(OUTPUT)PERF-VERSION-FILE
      
         but that's entirely pointless for two reasons: 1) the version file
         gets regenerated by the initial build pass anyway, 2) including it
         is futile, given its contents:
      
          #define PERF_VERSION "3.12.rc3.g8510c7"
      
         'make' will interpret that as a comment line...
      
         So just remove this part of the doc-generation logic.
      
      With these things fixed a 'make install' now rebuilds only what is needed.
      
      A repeated 'make install' on an already built tree is super fast now,
      it finishes in under 0.3 seconds:
      
        #
        #  After the patch:
        #
      
        $ time make install
      
        ...
      
        real    0m0.280s
        user    0m0.162s
        sys     0m0.054s
      
      Prior to all the autodep changes and prior to this fix, a repeat 'make install'
      took 24.1 seconds (!) on the same system:
      
        #
        #  Before the patches:
        #
      
        $ time make install
      
        ...
      
        real    0m24.109s
        user    0m21.171s
        sys     0m2.449s
      
      Which almost entirely was caused by fixable build system fat.
      We are now literally ~86 times faster.
      
      A fresh rebuild and install now takes just 11.4 seconds:
      
        #
        #  After the patch:
        #
      
        $ make clean
        $ time make -j16 install
      
        ...
      
        real    0m11.457s
        user    1m43.411s
        sys     0m7.610s
      
      Without the patches it took 27.8 seconds:
      
        #
        #  Before the patches:
        #
      
        $ make clean
        $ time make -j16 install
      
        ...
      
        real    0m27.801s
        user    1m59.242s
        sys     0m9.749s
      
      So even in the complete rebuild case we are now ~2.5 times faster.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/n/tip-x4qjnxjGrgxpribq8sdakfTp@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
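The speedup figures quoted in this commit message follow from the `real` times in the `time` output. A small check, using only the wall-clock values listed above (the helper name is made up for the example):

```python
def speedup(before_s: float, after_s: float) -> float:
    """Ratio of wall-clock time before the change to time after it."""
    return before_s / after_s


# 'real' times taken from the commit message, in seconds.
repeat_install = speedup(24.109, 0.280)   # repeated 'make install'
full_rebuild = speedup(27.801, 11.457)    # 'make clean' + 'make -j16 install'

print(f"repeat install: {repeat_install:.1f}x")
print(f"full rebuild:   {full_rebuild:.2f}x")
```

This gives about 86x for the repeated install and roughly 2.4x for the full rebuild, which the message rounds to ~2.5.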
  3. 04 Oct, 2013 4 commits
  4. 30 Aug, 2013 1 commit
  5. 27 Aug, 2013 4 commits
  6. 14 Aug, 2013 1 commit
  7. 12 Aug, 2013 1 commit
  8. 08 Aug, 2013 3 commits
    • perf tools: Add support for pinned modifier · e9a7c414
      Michael Ellerman committed
      This commit adds support for a new modifier "D", which requests that the
      event, or group of events, be pinned to the PMU.
      
      The "p" modifier is already taken for precise, and "P" may be used in
      future to mean "fully precise".
      
      So we use "D", which stands for pinneD - and looks like a padlock, or if
      you're using the ":D" syntax perf smiles at you.
      
      This is an oft-requested feature from our HW folks, who want to be able
      to run a large number of events, but also want 100% accurate results for
      instructions per cycle.
      
      Comparison of results with and without pinning:
      
      $ perf stat -e '{cycles,instructions}:D' -e cycles,instructions,...
      
        79,590,480,683 cycles         #  0.000 GHz
       166,123,716,524 instructions   #  2.09  insns per cycle
                                      #  0.11  stalled cycles per insn
      
        79,352,134,463 cycles         #  0.000 GHz                     [11.11%]
       165,178,301,818 instructions   #  2.08  insns per cycle
                                      #  0.11  stalled cycles per insn [11.13%]
      
      As you can see although perf does a very good job of scaling the values
      in the non-pinned case, there is some small discrepancy.
      
      The patch is fairly straightforward; the one detail is that we need to
      make sure we only request pinning for the group leader when we have a
      group.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1375795686-4226-1-git-send-email-michael@ellerman.id.au
      [ Use perf_evsel__is_group_leader instead of open coded equivalent, as
        suggested by Jiri Olsa ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
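The "scaling" mentioned for the non-pinned case can be sketched as below. This assumes the usual extrapolation for multiplexed counters, where a raw count is multiplied by the ratio of enabled time to running time, and that the bracketed percentage in the output is the share of time the event was actually on the PMU. The function names and the sample numbers are hypothetical and are not taken from the run shown above.

```python
def scale_count(raw: int, time_enabled: int, time_running: int):
    """Extrapolate a multiplexed counter to the full enabled period.

    Returns None when the event never ran, since there is nothing to scale.
    A pinned event runs the whole time, so its count is returned unchanged.
    """
    if time_running == 0:
        return None
    return round(raw * time_enabled / time_running)


def running_share(time_enabled: int, time_running: int) -> float:
    """Percentage of the enabled time the event was actually counting."""
    return 100.0 * time_running / time_enabled


if __name__ == "__main__":
    # Hypothetical event that was scheduled for 1/9 of the time.
    print(scale_count(1000, 900, 100))            # extrapolated estimate
    print(f"[{running_share(900, 100):.2f}%]")    # share shown in brackets
    # Hypothetical pinned event: running time equals enabled time.
    print(scale_count(5000, 900, 900))
```

Because the scaled value is an estimate built from a fraction of the run, it can differ slightly from the exact count a pinned event reports.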
    • perf stat: Add support for --initial-delay option · 41191688
      Andi Kleen committed
      When measuring workloads the startup phase -- doing page faults, dynamic
      linking, opening files -- is often very different from the rest of the
      workload.  Especially with smaller kernels and using counter
      multiplexing this can give significant measurement errors.
      
      Multiplexing assumes that the workload is mostly the same over longer
      periods. But at startup there is typically some spike of activity which
      is relatively short.  If many groups are multiplexing, the one group
      that sees the spike, and is then scaled up over the time needed to run
      all groups, may show a significant error.
      
      Also in general it's often not useful to measure the startup, because it
      is so different from the rest.
      
      One way around this is to use interval mode and discard the first
      sample, but this can be awkward because interval mode doesn't support
      intervals of less than 100ms, and also a useful interval is not
      necessarily the same as a useful startup delay.
      
      This patch adds a new --initial-delay / -D option to skip measuring for
      the startup phase. The delay is specified in milliseconds.
      
      Here's a simple example:
      
      perf stat -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done'
      ...
                   3,721 page-faults
      ...
      
      If we just wait 20 ms the number of page faults is about a quarter lower:
      
      perf stat -D 20 -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done'
      ...
                   2,823 page-faults
      ...
      
      So we filtered out most of the startup noise from bash.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1375490473-1503-4-git-send-email-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Add 'S' event/group modifier to read sample value · 3c176311
      Jiri Olsa committed
      Adding 'S' event/group modifier to specify that the event value/s are
      read by PERF_SAMPLE_READ sample type processing, instead of the period
      value offered by lower layers.
      
      There's additional behaviour change for 'S' modifier being specified on
      event group:
      
      Currently all the events within a group make samples. If the user now
      specifies 'S' within group modifier, only the leader will trigger
      samples. The rest of events in the group will have sampling disabled.
      
      And same as for single events, values of all events within the group
      (including leader) are read by PERF_SAMPLE_READ sample type processing.
      
      The following example creates an event group with cycles and cache-misses
      events, setting the cycles as group leader and the only event to
      actually sample. Both cycles and cache-misses event period values are
      read by PERF_SAMPLE_READ sample type processing with PERF_FORMAT_GROUP
      read format.
      
      Example:
      
        $ perf record -e '{cycles,cache-misses}:S' ls
        ...
        $ perf report --group --show-total-period --stdio
        ...
        # Samples: 36  of event 'anon group { cycles, cache-misses }'
        # Event count (approx.): 12585593
        #
        #       Overhead          Period  Command      Shared Object                      Symbol
        # ..............  ..............  .......  .................  ..........................
        #
          19.92%   1.20%  2505936     31       ls  [kernel.kallsyms]  [k] mark_held_locks
          13.74%   0.47%  1729327     12       ls  [kernel.kallsyms]  [k] sched_clock_local
          13.64%  23.72%  1716147    612       ls  ld-2.14.90.so      [.] check_match.10805
          13.12%  23.22%  1650778    599       ls  libc-2.14.90.so    [.] _nl_intern_locale_data
          11.24%  29.19%  1414554    753       ls  [kernel.kallsyms]  [k] sched_clock_cpu
           8.50%   0.35%  1070150      9       ls  [kernel.kallsyms]  [k] check_chain_key
        ...
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/n/tip-iyoinu3axi11mymwnh2b7fxj@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  9. 22 Jul, 2013 1 commit
  10. 13 Jul, 2013 4 commits
  11. 09 Jul, 2013 4 commits
  12. 28 May, 2013 3 commits
  13. 01 Apr, 2013 2 commits
  14. 27 Mar, 2013 1 commit
  15. 26 Mar, 2013 2 commits
  16. 16 Mar, 2013 2 commits
    • perf stat: Introduce --repeat forever · a7e191c3
      Frederik Deweerdt committed
      The following patch causes 'perf stat --repeat 0' to be interpreted as
      'forever', displaying the stats for every run.
      
      We act as if a single run was asked, and reset the stats in each
      iteration. In this mode SIGINT is passed to perf to be able to stop the
      loop with Ctrl+C.
      Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20130301180227.GA24385@ks398093.ip-192-95-24.net
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
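The control flow described in this commit message can be sketched as follows. This is a simplified illustration under stated assumptions, not the perf code: `repeat_runs`, `run_once`, and `report` are hypothetical names, and Python's `KeyboardInterrupt` stands in for SIGINT handling.

```python
import itertools


def repeat_runs(run_once, repeat, report):
    """Run a workload `repeat` times, or until interrupted when repeat == 0.

    In forever mode each run is reported on its own and the statistics are
    reset, as if a single run had been requested every time. In finite mode
    the statistics accumulate and are reported once at the end.
    """
    forever = repeat == 0
    runs = itertools.count() if forever else range(repeat)
    stats = []
    try:
        for _ in runs:
            stats.append(run_once())
            if forever:
                report(stats)  # show stats for this run only
                stats = []     # reset before the next iteration
    except KeyboardInterrupt:
        # Ctrl+C ends the loop; nothing further to report.
        return
    if not forever:
        report(stats)


if __name__ == "__main__":
    # Finite mode: three runs, one combined report.
    repeat_runs(lambda: 1, 3, print)
```

Passing `0` as `repeat` makes the loop run until the workload callback (or the user) raises an interrupt.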
    • perf annotate: Add basic support to event group view · b1dd4432
      Namhyung Kim committed
      Add a --group option to enable event grouping.  When enabled,
      information for all group members is shown with the leader, so
      non-leader events are skipped.
      
      It only supports --stdio output currently.  Later patches will add
      further features.
      
       $ perf annotate --group --stdio
       ...
        Percent                 |      Source code & Disassembly of libpthread-2.15.so
       --------------------------------------------------------------------------------
                                :
                                :
                                :
                                :      Disassembly of section .text:
                                :
                                :      000000387dc0aa50 <__pthread_mutex_unlock_usercnt>:
           8.08    2.40    5.29 :        387dc0aa50:   mov    %rdi,%rdx
           0.00    0.00    0.00 :        387dc0aa53:   mov    0x10(%rdi),%edi
           0.00    0.00    0.00 :        387dc0aa56:   mov    %edi,%eax
           0.00    0.80    0.00 :        387dc0aa58:   and    $0x7f,%eax
           3.03    2.40    3.53 :        387dc0aa5b:   test   $0x7c,%dil
           0.00    0.00    0.00 :        387dc0aa5f:   jne    387dc0aaa9 <__pthread_mutex_unlock_use
           0.00    0.00    0.00 :        387dc0aa61:   test   %eax,%eax
           0.00    0.00    0.00 :        387dc0aa63:   jne    387dc0aa85 <__pthread_mutex_unlock_use
           0.00    0.00    0.00 :        387dc0aa65:   and    $0x80,%edi
           0.00    0.00    0.00 :        387dc0aa6b:   test   %esi,%esi
           3.03    5.60    7.06 :        387dc0aa6d:   movl   $0x0,0x8(%rdx)
           0.00    0.00    0.59 :        387dc0aa74:   je     387dc0aa7a <__pthread_mutex_unlock_use
           0.00    0.00    0.00 :        387dc0aa76:   subl   $0x1,0xc(%rdx)
           2.02    5.60    1.18 :        387dc0aa7a:   mov    %edi,%esi
           0.00    0.00    0.00 :        387dc0aa7c:   lock decl (%rdx)
          83.84   83.20   82.35 :        387dc0aa7f:   jne    387dc0aada <_L_unlock_586>
           0.00    0.00    0.00 :        387dc0aa81:   nop
           0.00    0.00    0.00 :        387dc0aa82:   xor    %eax,%eax
           0.00    0.00    0.00 :        387dc0aa84:   retq
       ...
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1362462812-30885-6-git-send-email-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  17. 15 Feb, 2013 1 commit