1. 08 4月, 2016 6 次提交
    • A
      perf tools: Build syscall table .c header from kernel's syscall_64.tbl · 1b700c99
      Arnaldo Carvalho de Melo 提交于
      We used libaudit to map ids to syscall names and vice-versa, but that
      imposes a delay in supporting new syscalls, having to wait for libaudit
      to get those new syscalls on its tables.
      
      To remove that delay, for x86_64 initially, grab a copy of
      arch/x86/entry/syscalls/syscall_64.tbl and use it to generate those
      tables.
      
      Syscalls currently not available in audit-libs:
      
        # trace -e copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd
        Error:	Invalid syscall copy_file_range, membarrier, mlock2, pread64, pwrite64, timerfd_create, userfaultfd
        Hint:	try 'perf list syscalls:sys_enter_*'
        Hint:	and: 'man syscalls'
        #
      
      With this patch:
      
        # trace -e copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd
          8505.733 ( 0.010 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 36
          8506.688 ( 0.005 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 40
         30023.097 ( 0.025 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63ae382000, count: 4096, pos: 529592320) = 4096
         31268.712 ( 0.028 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63afd8b000, count: 4096, pos: 2314133504) = 4096
         31268.854 ( 0.016 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63afda2000, count: 4096, pos: 2314137600) = 4096
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-51xfjbxevdsucmnbc4ka5r88@git.kernel.org
      [ Added make dep for 'prepare' in 'LIBPERF_IN', fix by Wang Nan to fix parallell build ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1b700c99
    • A
      perf tools: Allow generating per-arch syscall table arrays · 5af56fab
      Arnaldo Carvalho de Melo 提交于
      Tools should use a mechanism similar to arch/x86/entry/syscalls/ to
      generate a header file with the definitions for two variables:
      
        static const char *syscalltbl_x86_64[] = {
      	[0] = "read",
      	[1] = "write",
        <SNIP>
      	[324] = "membarrier",
      	[325] = "mlock2",
      	[326] = "copy_file_range",
        };
        static const int syscalltbl_x86_64_max_id = 326;
      
      In a per arch file that should then be included in
      tools/perf/util/syscalltbl.c.
      
      First one will be for x86_64.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-02uuamkxgccczdth8komspgp@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5af56fab
    • A
      perf trace: Move syscall table id <-> name routines to separate class · fd0db102
      Arnaldo Carvalho de Melo 提交于
      We're using libaudit for doing name to id and id to syscall name
      translations, but that makes 'perf trace' to have to wait for newer
      libaudit versions supporting recently added syscalls, such as
      "userfaultfd" at the time of this changeset.
      
      We have all the information right there, in the kernel sources, so move
      this code to a separate place, wrapped behind functions that will
      progressively use the kernel source files to extract the syscall table
      for use in 'perf trace'.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-i38opd09ow25mmyrvfwnbvkj@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fd0db102
    • A
      perf trace: Beautify mode_t arguments · ba2f22cf
      Arnaldo Carvalho de Melo 提交于
      When reading the syscall tracepoint /format file, look for arguments of type
      "mode_t" and attach a beautifier:
      
        [root@jouet ~]# cat ~/bin/tp_with_fields_of_type
        #!/bin/bash
        grep -w $1 /sys/kernel/tracing/events/syscalls/*/format | sed -r 's%.*sys_enter_(.*)/format.*%\1%g' | paste -d, -s
        # tp_with_fields_of_type umode_t
        chmod,creat,fchmodat,fchmod,mkdirat,mkdir,mknodat,mknod,mq_open,openat,open
        #
      
      Testing it:
      
        #define S_ISUID 0004000
        #define S_ISGID 0002000
        #define S_ISVTX 0001000
        #define S_IRWXU 0000700
        #define S_IRUSR 0000400
        #define S_IWUSR 0000200
        #define S_IXUSR 0000100
      
        #define S_IRWXG 0000070
        #define S_IRGRP 0000040
        #define S_IWGRP 0000020
        #define S_IXGRP 0000010
      
        #define S_IRWXO 0000007
        #define S_IROTH 0000004
        #define S_IWOTH 0000002
        #define S_IXOTH 0000001
      
        # for mode in 4000 2000 1000 700 400 200 100 70 40 20 10 7 4 2 1 ; do \
            echo -n $mode '->' ; trace --no-inherit -e chmod,fchmodat,fchmod chmod $mode x; \
          done
        4000 -> 0.338 ( 0.012 ms): fchmodat(dfd: CWD, filename: x, mode: ISUID) = 0
        2000 -> 0.438 ( 0.015 ms): fchmodat(dfd: CWD, filename: x, mode: ISGID) = 0
        1000 -> 0.677 ( 0.040 ms): fchmodat(dfd: CWD, filename: x, mode: ISVTX) = 0
         700 -> 0.394 ( 0.013 ms): fchmodat(dfd: CWD, filename: x, mode: IRWXU) = 0
         400 -> 0.337 ( 0.010 ms): fchmodat(dfd: CWD, filename: x, mode: IRUSR) = 0
         200 -> 0.259 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IWUSR) = 0
         100 -> 0.249 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IXUSR) = 0
          70 -> 0.266 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IRWXG) = 0
          40 -> 0.329 ( 0.009 ms): fchmodat(dfd: CWD, filename: x, mode: IRGRP) = 0
          20 -> 0.250 ( 0.009 ms): fchmodat(dfd: CWD, filename: x, mode: IWGRP) = 0
          10 -> 0.259 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IXGRP) = 0
           7 -> 0.249 ( 0.009 ms): fchmodat(dfd: CWD, filename: x, mode: IRWXO) = 0
           4 -> 0.278 ( 0.011 ms): fchmodat(dfd: CWD, filename: x, mode: IROTH) = 0
           2 -> 0.276 ( 0.009 ms): fchmodat(dfd: CWD, filename: x, mode: IWOTH) = 0
           1 -> 0.250 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IXOTH) = 0
        #
        # trace --no-inherit -e chmod,fchmodat,fchmod chmod 7777 x
           0.258 ( 0.011 ms): fchmodat(dfd: CWD, filename: x, mode: IALLUGO) = 0
        # trace --no-inherit -e chmod,fchmodat,fchmod chmod 7770 x
           0.258 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: ISUID|ISGID|ISVTX|IRWXU|IRWXG) = 0
        # trace --no-inherit -e chmod,fchmodat,fchmod chmod 777 x
           0.293 ( 0.012 ms): fchmodat(dfd: CWD, filename: x, mode: IRWXUGO
        #
      
      Now lets see if check by using the tracepoint for that specific syscall,
      instead of raw_syscalls:sys_enter as 'trace' does for its strace fu:
      
        # trace --no-inherit --ev syscalls:sys_enter_fchmodat -e fchmodat chmod 666 x
           0.255 (         ): syscalls:sys_enter_fchmodat:dfd: 0xffffffffffffff9c, filename: 0x55db32a3f0f0, mode: 0x000001b6)
           0.268 ( 0.012 ms): fchmodat(dfd: CWD, filename: x, mode: IRUGO|IWUGO                     ) = 0
        #
      
      Perfect, 0x1bc == 0666.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-18e8zfgbkj83xo87yoom43kd@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ba2f22cf
    • J
      perf script: Process event update events · 91daee30
      Jiri Olsa 提交于
      Andreas reported following command produces no output:
      
        # cat test.py
        #!/usr/bin/env python
      
        def stat__krava(cpu, thread, time, val, ena, run):
            print "event %s cpu %d, thread %d, time %d, val %d, ena %d, run %d" % \
                  ("krava", cpu, thread, time, val, ena, run)
        # perf stat -a -I 1000 -e cycles,"cpu/config=0x6530160,name=krava/" record | perf script -s test.py
        ^C
        #
      
      The reason is that 'perf script' does not process event update events and
      will never get the event name update thus the python callback is never
      called.
      
      The fix is just to add already existing callback we use in 'perf stat
      report'.
      
      Committer note:
      
      After the patch:
      
        # perf stat -a -I 1000 -e cycles,"cpu/config=0x6530160,name=krava/" record | perf script -s test.py
        event krava cpu -1, thread -1, time 1000239179, val 1789051, ena 4000690920, run 4000690920
        event krava cpu -1, thread -1, time 2000479061, val 2391338, ena 4000879596, run 4000879596
        event krava cpu -1, thread -1, time 3000740802, val 1939121, ena 4000977209, run 4000977209
        event krava cpu -1, thread -1, time 4001006730, val 2356115, ena 4001000489, run 4001000489
        ^C
        #
      Reported-by: NAndreas Hollmann <hollmann@in.tum.de>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1460013073-18444-3-git-send-email-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      91daee30
    • J
      perf tools: Add dedicated unwind addr_space member into thread struct · e583d70c
      Jiri Olsa 提交于
      Milian reported issue with thread::priv, which was double booked by perf
      trace and DWARF unwind code. So using those together is impossible at
      the moment.
      
      Moving DWARF unwind private data into separate variable so perf trace
      can keep using thread::priv.
      Reported-and-Tested-by: NMilian Wolff <milian.wolff@kdab.com>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Andreas Hollmann <hollmann@in.tum.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1460013073-18444-2-git-send-email-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e583d70c
  2. 07 4月, 2016 6 次提交
    • J
      perf tools: Introduce trim function · 7d6a7e78
      Jiri Olsa 提交于
      To be used in cases for both sides trim.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Andreas Hollmann <hollmann@in.tum.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1460013073-18444-1-git-send-email-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7d6a7e78
    • A
      perf trace: Beautify pid_t arguments · d1d438a3
      Arnaldo Carvalho de Melo 提交于
      When reading the syscall tracepoint /format file, look for arguments
      of type "pid_t" and attach the PID beautifier, that will do a lookup
      on the threads it knows, i.e. the ones that came from PERF_RECORD_COMM
      events and add the COMM after the pid in such args:
      
      Excerpt of a system wide trace for syscalls with pid_t args:
      
        55602.977 ( 0.006 ms): bash/12122 setpgid(pid: 24347 (bash), pgid: 24347 (bash)) = 0
        55603.024 ( 0.004 ms): bash/24347 setpgid(pid: 24347 (bash), pgid: 24347 (bash)) = 0
        55691.527 (88.397 ms): bash/12122 wait4(upid: -1, stat_addr: 0x7ffe0cee1720, options: UNTRACED|CONTINUED) ...
        55692.479 ( 0.952 ms): git/24347 wait4(upid: 24368, stat_addr: 0x7ffe030d5724) ...
        55694.549 ( 2.070 ms): pre-commit/24368 wait4(upid: -1, stat_addr: 0x7ffc94f4fc10) = 24369 (pre-commit)
        55694.575 ( 0.002 ms): pre-commit/24368 wait4(upid: -1, stat_addr: 0x7ffc94f4f650, options: NOHANG) = -1 ECHILD No child processes
        55695.934 ( 0.010 ms): pre-commit/24368 wait4(upid: -1, stat_addr: 0x7ffc94f4f2d0, options: NOHANG) = 24370 (git)
        55695.937 ( 0.001 ms): pre-commit/24368 wait4(upid: -1, stat_addr: 0x7ffc94f4f2d0, options: NOHANG) = -1 ECHILD No child processes
        55717.963 ( 0.000 ms): pre-commit/24371  ... [continued]: wait4()) = 24372
        55717.978 (21.468 ms): :24371/24371 wait4(upid: -1, stat_addr: 0x7ffc94f4f230) ...
        55718.087 ( 0.109 ms): pre-commit/24371 wait4(upid: -1, stat_addr: 0x7ffc94f4f230) = 24373 (tr)
        55718.187 ( 0.096 ms): pre-commit/24371 wait4(upid: -1, stat_addr: 0x7ffc94f4f230) = 24374 (wc)
        55718.218 ( 0.002 ms): pre-commit/24371 wait4(upid: -1, stat_addr: 0x7ffc94f4eed0, options: NOHANG) = -1 ECHILD No child processes
        55718.367 ( 0.005 ms): pre-commit/24368 wait4(upid: -1, stat_addr: 0x7ffc94f4f1d0, options: NOHANG) = 24371 (pre-commit)
        55718.369 ( 0.001 ms): pre-commit/24368 wait4(upid: -1, stat_addr: 0x7ffc94f4f1d0, options: NOHANG) = -1 ECHILD No child processes
        55741.021 (49.494 ms): git/24347  ... [continued]: wait4()) = 24368 (pre-commit)
        74146.427 (18319.601 ms): git/24347 wait4(upid: 24375 (git), stat_addr: 0x7ffe030d6824) ...
        74149.036 ( 0.891 ms): bash/24391 wait4(upid: -1, stat_addr: 0x7ffe0cee0560) = 24393 (sed)
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-75yl9hzjhb020iadc81gdj8t@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d1d438a3
    • A
      perf trace: Beautify set_tid_address, getpid, getppid return values · c65f1070
      Arnaldo Carvalho de Melo 提交于
      Showing the COMM for that return, if available.
      
        # trace -e getpid,getppid,set_tid_address
          490.007 ( 0.005 ms): sh/8250 getpid(...) = 8250 (sh)
          490.014 ( 0.001 ms): sh/8250 getppid(...) = 7886 (make)
          491.156 ( 0.004 ms): install/8251 set_tid_address(tidptr: 0x7f204a9d4ad0) = 8251 (install)
        ^C
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-psbpplqupatom9x4uohbxid5@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c65f1070
    • A
      perf trace: Infrastructure to show COMM strings for syscalls returning PIDs · 11c8e39f
      Arnaldo Carvalho de Melo 提交于
      Starting with clone, waitid and wait4:
      
        # trace -e waitid,wait4
           1.385 ( 1.385 ms): bash/12122 wait4(upid: -1, stat_addr: 0x7ffe0cee1720, options: UNTRACED|CONTINUED) = 1210 (ls)
           1.426 ( 0.002 ms): bash/12122 wait4(upid: -1, stat_addr: 0x7ffe0cee1150, options: NOHANG|UNTRACED|CONTINUED) = 0
           3.293 ( 0.604 ms): bash/1211 wait4(upid: -1, stat_addr: 0x7ffe0cee0560                             ) = 1214 (sed)
           3.342 ( 0.002 ms): bash/1211 wait4(upid: -1, stat_addr: 0x7ffe0cee01d0, options: NOHANG            ) = -1 ECHILD No child processes
           3.576 ( 0.016 ms): bash/12122 wait4(upid: -1, stat_addr: 0x7ffe0cee0550, options: NOHANG|UNTRACED|CONTINUED) = 1211 (bash)
        ^C# trace -e clone
           0.027 ( 0.000 ms): systemd/1  ... [continued]: clone()) = 1227 (systemd)
           0.050 ( 0.000 ms): systemd/1227  ... [continued]: clone()) = 0
        ^C[root@jouet ~]#
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-lyf5d3y5j15wikjb6pe6ukoi@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      11c8e39f
    • A
      perf trace: Beautify wait4/waitid 'options' argument · 7206b900
      Arnaldo Carvalho de Melo 提交于
        # trace -e waitid,wait4
      
         0.557 ( 0.557 ms): bash/27335 wait4(upid: -1, stat_addr: 0x7ffd02f449f0) = 27336
         1.250 ( 0.685 ms): bash/27335 wait4(upid: -1, stat_addr: 0x7ffd02f449f0) = 27337
         1.312 ( 0.002 ms): bash/27335 wait4(upid: -1, stat_addr: 0x7ffd02f44690, options: NOHANG) = -1 ECHILD No child processes
         1.550 ( 0.015 ms): bash/3856 wait4(upid: -1, stat_addr: 0x7ffd02f44990, options: NOHANG|UNTRACED|CONTINUED) = 27335
         1.552 ( 0.001 ms): bash/3856 wait4(upid: -1, stat_addr: 0x7ffd02f44990, options: NOHANG|UNTRACED|CONTINUED) = 0
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-i5vlo5n5jv0amt8bkyicmdxh@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7206b900
    • A
      perf trace: Beautify sched_setscheduler 'policy' argument · a3bca91f
      Arnaldo Carvalho de Melo 提交于
        $ trace -e sched_setscheduler chrt -f 1 usleep 1
        chrt: failed to set pid 0's policy: Operation not permitted
           0.005 ( 0.005 ms): chrt/19189 sched_setscheduler(policy: FIFO, param: 0x7ffec5273d70) = -1 EPERM Operation not permitted
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-i5vlo5n5jv0amt8bkyicmdxh@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a3bca91f
  3. 06 4月, 2016 6 次提交
  4. 02 4月, 2016 6 次提交
    • W
      perf bpf: Add sample types for 'bpf-output' event · d37ba880
      Wang Nan 提交于
      Before this patch we can see very large time in the events before the
      'bpf-output' event. For example:
      
        # perf trace -vv -T --ev sched:sched_switch \
                            --ev bpf-output/no-inherit,name=evt/ \
                            --ev ./test_bpf_trace.c/map:channel.event=evt/ \
                            usleep 10
        ...
        18446744073709.551 (18446564645918.480 ms): usleep/4157 nanosleep(rqtp: 0x7ffd3f0dc4e0) ...
        18446744073709.551 (         ): evt:Raise a BPF event!..)
        179427791.076 (         ): perf_bpf_probe:func_begin:(ffffffff810eb9a0))
        179427791.081 (         ): sched:sched_switch:usleep:4157 [120] S ==> swapper/2:0 [120])
        ...
      
      We can also see the differences between bpf-output events and
      breakpoint events:
      
      For bpf output event:
         sample_type                    IP|TID|RAW|IDENTIFIER
      
      For tracepoint events:
         sample_type                    IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
      
      This patch fix this differences by adding more sample type for
      bpf-output events.
      
      After this patch:
      
        # perf trace -vv -T --ev sched:sched_switch \
                            --ev bpf-output/no-inherit,name=evt/ \
                            --ev ./test_bpf_trace.c/map:channel.event=evt/ \
                            usleep 10
        ...
        179877370.878 ( 0.003 ms): usleep/5336 nanosleep(rqtp: 0x7ffff866c450) ...
        179877370.878 (         ): evt:Raise a BPF event!..)
        179877370.878 (         ): perf_bpf_probe:func_begin:(ffffffff810eb9a0))
        179877370.882 (         ): sched:sched_switch:usleep:5336 [120] S ==> swapper/4:0 [120])
        179877370.945 (         ): evt:Raise a BPF event!..)
        ...
      
        # ./perf trace -vv -T --ev sched:sched_switch \
                              --ev bpf-output/no-inherit,name=evt/ \
                              --ev ./test_bpf_trace.c/map:channel.event=evt/ \
                              usleep 10 2>&1 | grep sample_type
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
      
      The 'IDENTIFIER' info is not required because all events have the same
      sample_type.
      
      Committer notes:
      
      Further testing, on top of the changes making 'perf trace' avoid samples
      from events without PERF_SAMPLE_TIME:
      
      Before:
      
        # trace --ev bpf-output/no-inherit,name=evt/ --ev /home/acme/bpf/test_bpf_trace.c/map:channel.event=evt/ usleep 10
        <SNIP>
          0.560 ( 0.001 ms): brk(                                                   ) = 0x55e5a1df8000
          18446640227439.430 (18446640227438.859 ms): nanosleep(rqtp: 0x7ffc96643370) ...
          18446640227439.430 (         ): evt:Raise a BPF event!..)
          0.576 (         ): perf_bpf_probe:func_begin:(ffffffff81112460))
          18446640227439.430 (         ): evt:Raise a BPF event!..)
          0.645 (         ): perf_bpf_probe:func_end:(ffffffff81112460 <- ffffffff81003d92))
          0.646 ( 0.076 ms):  ... [continued]: nanosleep()) = 0
        #
      
      After:
      
        # trace --ev bpf-output/no-inherit,name=evt/ --ev /home/acme/bpf/test_bpf_trace.c/map:channel.event=evt/ usleep 10
        <SNIP>
           0.292 ( 0.001 ms): brk(                          ) = 0x55c7cd6e1000
           0.302 ( 0.004 ms): nanosleep(rqtp: 0x7ffedd8bc0f0) ...
           0.302 (         ): evt:Raise a BPF event!..)
           0.303 (         ): perf_bpf_probe:func_begin:(ffffffff81112460))
           0.397 (         ): evt:Raise a BPF event!..)
           0.397 (         ): perf_bpf_probe:func_end:(ffffffff81112460 <- ffffffff81003d92))
           0.398 ( 0.100 ms):  ... [continued]: nanosleep()) = 0
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Reported-and-Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1459517202-42320-1-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d37ba880
    • A
      perf trace: Don't set the base timestamp using events without PERF_SAMPLE_TIME · 8a07a809
      Arnaldo Carvalho de Melo 提交于
      This was causing bogus values to be shown at the timestamp column:
      
      Before:
      
        # trace --ev bpf-output/no-inherit,name=evt/ --ev /home/acme/bpf/test_bpf_trace.c/map:channel.event=evt/ usleep 10
        94631143.385 ( 0.001 ms): brk(                                     ) = 0x555555757000
        94631143.398 ( 0.003 ms): mmap(len: 4096, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS, fd: -1) = 0x7ffff7ff6000
        94631143.406 ( 0.004 ms): access(filename: 0xf7df9e10, mode: R     ) = -1 ENOENT No such file or directory
        94631143.412 ( 0.004 ms): open(filename: 0xf7df8761, flags: CLOEXEC) = 3
        94631143.415 ( 0.002 ms): fstat(fd: 3, statbuf: 0x7fffffffd6b0     ) = 0
        94631143.419 ( 0.003 ms): mmap(len: 106798, prot: READ, flags: PRIVATE, fd: 3) = 0x7ffff7fdb000
        94631143.420 ( 0.001 ms): close(fd: 3                              ) = 0
        94631143.432 ( 0.004 ms): open(filename: 0xf7ff6640, flags: CLOEXEC) = 3
        <SNIP>
      
      After:
      
        # trace --ev bpf-output/no-inherit,name=evt/ --ev /home/acme/bpf/test_bpf_trace.c/map:channel.event=evt/ usleep 10
        0.022 ( 0.001 ms): brk(                                     ) = 0x55d7668a6000
        0.037 ( 0.003 ms): mmap(len: 4096, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS, fd: -1) = 0x7f8fbeb97000
        0.123 ( 0.083 ms): access(filename: 0xbe995e10, mode: R     ) = -1 ENOENT No such file or directory
        0.130 ( 0.004 ms): open(filename: 0xbe994761, flags: CLOEXEC) = 3
        0.133 ( 0.002 ms): fstat(fd: 3, statbuf: 0x7fff6487a890     ) = 0
        0.138 ( 0.003 ms): mmap(len: 106798, prot: READ, flags: PRIVATE, fd: 3) = 0x7f8fbeb7c000
        0.140 ( 0.001 ms): close(fd: 3                              ) = 0
        0.151 ( 0.004 ms): open(filename: 0xbeb97640, flags: CLOEXEC) = 3
        <SNIP>
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-p7m8llv81iv55ekxexdp5n57@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8a07a809
    • A
      perf trace: Introduce function to set the base timestamp · e6001980
      Arnaldo Carvalho de Melo 提交于
      That is used in both live runs, i.e.:
      
        # trace ls
      
      As when processing events recorded in a perf.data file:
      
        # trace -i perf.data
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-901l6yebnzeqg7z8mbaf49xb@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e6001980
    • K
      perf tools: Fix PMU term format max value calculation · ac0e2cd5
      Kan Liang 提交于
      Currently the max value of format is calculated by the bits number. It
      relies on the continuity of the format.
      
      However, uncore event format is not continuous. E.g. uncore qpi event
      format can be 0-7,21.
      
      If bit 21 is set, there is parsing issues as below.
      
        $ perf stat -a -e uncore_qpi_0/event=0x200002,umask=0x8/
        event syntax error: '..pi_0/event=0x200002,umask=0x8/'
                                          \___ value too big for format, maximum is 511
      
      This patch return the real max value by setting all possible bits to 1.
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1459365375-14285-1-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ac0e2cd5
    • A
      perf intel-pt/bts: Define JITDUMP_USE_ARCH_TIMESTAMP · bd0c7a54
      Adrian Hunter 提交于
      For Intel PT / BTS, define the environment variable that selects TSC
      timestamps in the jitdump file.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1457426333-30260-1-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      bd0c7a54
    • A
      perf jit: Add support for using TSC as a timestamp · 2a28e230
      Adrian Hunter 提交于
      Intel PT uses TSC as a timestamp, so add support for using TSC instead
      of the monotonic clock.  Use of TSC is selected by an environment
      variable "JITDUMP_USE_ARCH_TIMESTAMP" and flagged in the jitdump file
      with flag JITDUMP_FLAGS_ARCH_TIMESTAMP.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1457426330-30226-1-git-send-email-adrian.hunter@intel.com
      [ Added the fixup from He Kuang to make it build on other arches, ]
      [ such as aarch64, to avoid inserting this bisectiong breakage upstream ]
      Link: http://lkml.kernel.org/r/1459482572-129494-1-git-send-email-hekuang@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2a28e230
  5. 31 3月, 2016 16 次提交
    • A
      perf tools: Add time conversion event · 46bc29b9
      Adrian Hunter 提交于
      Intel PT uses the time members from the perf_event_mmap_page to convert
      between TSC and perf time.
      
      Due to a lack of foresight when Intel PT was implemented, those time
      members were recorded in the (implementation dependent) AUXTRACE_INFO
      event, the structure of which is generally inaccessible outside of the
      Intel PT decoder.  However now the conversion between TSC and perf time
      is needed when processing a jitdump file when Intel PT has been used for
      tracing.
      
      So add a user event to record the time members.  'perf record' will
      synthesize the event if the information is available.  And session
      processing will put a copy of the event on the session so that tools
      like 'perf inject' can easily access it.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1457426324-30158-1-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      46bc29b9
    • A
      perf trace: Pretty print getrandom() args · 39878d49
      Arnaldo Carvalho de Melo 提交于
        # trace -e getrandom
        35622.560 ( 0.023 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35622.585 ( 0.006 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35622.594 ( 0.004 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35627.395 ( 0.010 ms): libvirtd/1353 getrandom(buf: 0x7f7a1bfa35c0, count: 16, flags: NONBLOCK    ) = 16
        35630.940 ( 0.013 ms): fwupd/16120 getrandom(buf: 0x7f63243aa5c0, count: 16, flags: NONBLOCK      ) = 16
        35718.613 ( 0.015 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35718.629 ( 0.005 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35718.637 ( 0.004 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35719.355 ( 0.010 ms): libvirtd/1353 getrandom(buf: 0x7f7a1bfa35c0, count: 16, flags: NONBLOCK    ) = 16
        35721.042 ( 0.030 ms): fwupd/16120 getrandom(buf: 0x7f63243aa5c0, count: 16, flags: NONBLOCK      ) = 16
        41090.830 ( 0.012 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41090.845 ( 0.004 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41090.851 ( 0.004 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41091.750 ( 0.010 ms): libvirtd/1353 getrandom(buf: 0x7f7a1bfa35c0, count: 16, flags: NONBLOCK    ) = 16
        41091.823 ( 0.006 ms): fwupd/16120 getrandom(buf: 0x7f63243aa5c0, count: 16, flags: NONBLOCK      ) = 16
        41122.078 ( 0.053 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41122.129 ( 0.009 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41122.139 ( 0.004 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41124.492 ( 0.007 ms): libvirtd/1353 getrandom(buf: 0x7f7a1bfa35c0, count: 16, flags: NONBLOCK    ) = 16
        41124.470 ( 0.013 ms): fwupd/16120 getrandom(buf: 0x7f63243aa5c0, count: 16, flags: NONBLOCK      ) = 16
        41590.832 ( 0.014 ms): chrome/5957 getrandom(buf: 0x7fabac7b15b0, count: 16, flags: NONBLOCK      ) = 16
        41590.884 ( 0.004 ms): chrome/5957 getrandom(buf: 0x7fabac7b15c0, count: 16, flags: NONBLOCK      ) = 16
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-gca0n1p3aca3depey703ph2q@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      39878d49
    • A
      perf trace: Pretty print seccomp() args · 997bba8c
      Arnaldo Carvalho de Melo 提交于
      E.g:
      
        # trace -e seccomp
         200.061 (0.009 ms): :2441/2441 seccomp(op: FILTER, flags: TSYNC                       ) = -1 EFAULT Bad address
         200.910 (0.121 ms): :2441/2441 seccomp(op: FILTER, flags: TSYNC, uargs: 0x7fff57479fe0) = 0
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-t369uckshlwp4evkks4bcoo7@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      997bba8c
    • A
      perf trace: Do not process PERF_RECORD_LOST twice · 3ed5ca2e
      Arnaldo Carvalho de Melo 提交于
      We catch this record to provide a visual indication that events are
      getting lost, then call the default method to allow extra logging shared
      with the other tools to take place.
      
      This extra logging was done twice because we were continuing to the
      "default" clause where machine__process_event() will end up calling
      machine__process_lost_event() again, fix it.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-wus2zlhw3qo24ye84ewu4aqw@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3ed5ca2e
    • W
      perf/ring_buffer: Prepare writing into the ring-buffer from the end · d1b26c70
      Wang Nan 提交于
      Convert perf_output_begin() to __perf_output_begin() and make the later
      function able to write records from the end of the ring-buffer.
      
      Following commits will utilize the 'backward' flag.
      
      This is the core patch to support writing to the ring-buffer backwards,
      which will be introduced by upcoming patches to support reading from
      overwritable ring-buffers.
      
      In theory, this patch should not introduce any extra performance
      overhead since we use always_inline, but it does not hurt to double
      check that assumption:
      
      When CONFIG_OPTIMIZE_INLINING is disabled, the output object is nearly
      identical to original one. See:
      
         http://lkml.kernel.org/g/56F52E83.70409@huawei.com
      
      When CONFIG_OPTIMIZE_INLINING is enabled, the resuling object file becomes
      smaller:
      
       $ size kernel/events/ring_buffer.o*
         text       data        bss        dec        hex    filename
         4641          4          8       4653       122d kernel/events/ring_buffer.o.old
         4545          4          8       4557       11cd kernel/events/ring_buffer.o.new
      
      Performance testing results:
      
      Calling 3000000 times of 'close(-1)', use gettimeofday() to check
      duration.  Use 'perf record -o /dev/null -e raw_syscalls:*' to capture
      system calls. In ns.
      
      Testing environment:
      
       CPU    : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
       Kernel : v4.5.0
      
                           MEAN         STDVAR
        BASE            800214.950    2853.083
        PRE            2253846.700    9997.014
        POST           2257495.540    8516.293
      
      Where 'BASE' is pure performance without capturing. 'PRE' is test
      result of pure 'v4.5.0' kernel. 'POST' is test result after this
      patch.
      
      Considering the stdvar, this patch doesn't hurt performance, within
      noise margin.
      
      For testing details, see:
      
        http://lkml.kernel.org/g/56F89DCD.1040202@huawei.comSigned-off-by: NWang Nan <wangnan0@huawei.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <pi3orama@163.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/1459147292-239310-4-git-send-email-wangnan0@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d1b26c70
    • W
      perf/core: Set event's default ::overflow_handler() · 1879445d
      Wang Nan 提交于
      Set a default event->overflow_handler in perf_event_alloc() so don't
      need to check event->overflow_handler in __perf_event_overflow().
      Following commits can give a different default overflow_handler.
      
      Initial idea comes from Peter:
      
        http://lkml.kernel.org/r/20130708121557.GA17211@twins.programming.kicks-ass.net
      
      Since the default value of event->overflow_handler is not NULL, existing
      'if (!overflow_handler)' checks need to be changed.
      
      is_default_overflow_handler() is introduced for this.
      
      No extra performance overhead is introduced into the hot path because in the
      original code we still need to read this handler from memory. A conditional
      branch is avoided so actually we remove some instructions.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <pi3orama@163.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/1459147292-239310-3-git-send-email-wangnan0@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1879445d
    • W
      perf/ring_buffer: Introduce new ioctl options to pause and resume the ring-buffer · 86e7972f
      Wang Nan 提交于
      Add new ioctl() to pause/resume ring-buffer output.
      
      In some situations we want to read from the ring-buffer only when we
      ensure nothing can write to the ring-buffer during reading. Without
      this patch we have to turn off all events attached to this ring-buffer
      to achieve this.
      
      This patch is a prerequisite to enable overwrite support for the
      perf ring-buffer support. Following commits will introduce new methods
      support reading from overwrite ring buffer. Before reading, caller
      must ensure the ring buffer is frozen, or the reading is unreliable.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <pi3orama@163.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/1459147292-239310-2-git-send-email-wangnan0@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      86e7972f
    • J
      ftrace/perf: Check sample types only for sampling events · 0a74c5b3
      Jiri Olsa 提交于
      Currently we check sample type for ftrace:function events
      even if it's not created as a sampling event. That prevents
      creating ftrace_function event in counting mode.
      
      Make sure we check sample types only for sampling events.
      
      Before:
        $ sudo perf stat -e ftrace:function ls
        ...
      
         Performance counter stats for 'ls':
      
           <not supported>      ftrace:function
      
               0.001983662 seconds time elapsed
      
      After:
        $ sudo perf stat -e ftrace:function ls
        ...
      
         Performance counter stats for 'ls':
      
                    44,498      ftrace:function
      
               0.037534722 seconds time elapsed
      Suggested-by: NNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1458138873-1553-2-git-send-email-jolsa@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      0a74c5b3
    • A
      perf/x86/intel/bts: Move transaction start/stop to start/stop callbacks · 981a4cb3
      Alexander Shishkin 提交于
      As per AUX buffer management requirement, AUX output has to happen between
      pmu::start and pmu::stop calls so that perf_event_stop() actually stops it
      and therefore perf can free the AUX data after it has called pmu::stop.
      
      This patch moves perf_aux_output_{begin,end} from bts_event_{add,del} to
      bts_event_{start,stop}. As a bonus, we get rid of bts_buffer_is_full(),
      which is already taken care of by perf_aux_output_begin() anyway.
      Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1457098969-21595-6-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      981a4cb3
    • A
      perf/x86/intel/pt: Move transaction start/stop to PMU start/stop callbacks · 66d21901
      Alexander Shishkin 提交于
      As per AUX buffer management requirement, AUX output has to happen between
      pmu::start and pmu::stop calls so that perf_event_stop() actually stops it
      and therefore perf can free the AUX data after it has called pmu::stop.
      
      This patch moves perf_aux_output_{begin,end} from pt_event_{add,del} to
      pt_event_{start,stop}. As a bonus, we get rid of pt_buffer_is_full(),
      which is already taken care of by perf_aux_output_begin() anyway.
      Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1457098969-21595-5-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      66d21901
    • A
      perf/ring_buffer: Document AUX API usage · af5bb4ed
      Alexander Shishkin 提交于
      In order to ensure safe AUX buffer management, we rely on the assumption
      that pmu::stop() stops its ongoing AUX transaction and not just the hw.
      
      This patch documents this requirement for the perf_aux_output_{begin,end}()
      APIs.
      Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1457098969-21595-4-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      af5bb4ed
    • A
      perf/core: Free AUX pages in unmap path · 95ff4ca2
      Alexander Shishkin 提交于
      Now that we can ensure that when ring buffer's AUX area is on the way
      to getting unmapped new transactions won't start, we only need to stop
      all events that can potentially be writing aux data to our ring buffer.
      
      Having done that, we can safely free the AUX pages and corresponding
      PMU data, as this time it is guaranteed to be the last aux reference
      holder.
      
      This partially reverts:
      
        57ffc5ca ("perf: Fix AUX buffer refcounting")
      
      ... which was made to defer deallocation that was otherwise possible
      from an NMI context. Now it is no longer the case; the last call to
      rb_free_aux() that drops the last AUX reference has to happen in
      perf_mmap_close() on that AUX area.
      Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/87d1qtz23d.fsf@ashishki-desk.ger.corp.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      95ff4ca2
    • A
      perf/ring_buffer: Refuse to begin AUX transaction after rb->aux_mmap_count drops · dcb10a96
      Alexander Shishkin 提交于
      When ring buffer's AUX area is unmapped and rb->aux_mmap_count drops to
      zero, new AUX transactions into this buffer can still be started,
      even though the buffer in en route to deallocation.
      
      This patch adds a check to perf_aux_output_begin() for rb->aux_mmap_count
      being zero, in which case there is no point starting new transactions,
      in other words, the ring buffers that pass a certain point in
      perf_mmap_close will not have their events sending new data, which
      clears path for freeing those buffers' pages right there and then,
      provided that no active transactions are holding the AUX reference.
      Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1457098969-21595-2-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      dcb10a96
    • P
      perf/core: Verify we have a single perf_hw_context PMU · 26657848
      Peter Zijlstra 提交于
      There should (and can) only be a single PMU for perf_hw_context
      events.
      
      This is because of how we schedule events: once a hardware event fails to
      schedule (the PMU is 'full') we stop trying to add more. The trivial
      'fix' would break the Round-Robin scheduling we do.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      26657848
    • P
      perf/x86: Move Kconfig.perf and other perf configuration bits to events/Kconfig · 07dc900e
      Peter Zijlstra 提交于
      Ingo says:
      
       "If we do a separate file we should have it in arch/x86/events/Kconfig
        (not in arch/x86/Kconfig.perf), and also move some of the other bits,
        such as PERF_EVENTS_AMD_POWER?"
      Suggested-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      07dc900e
    • H
      perf/x86/msr: Add AMD IRPERF (Instructions Retired) performance counter · aaf24884
      Huang Rui 提交于
      AMD Zeppelin (Family 17h, Model 00h) introduces an instructions
      retired performance counter which is indicated by
      CPUID.8000_0008H:EBX[1]. A dedicated Instructions Retired MSR register
      (MSR 0xC000_000E9) increments once for every instruction retired.
      Signed-off-by: NHuang Rui <ray.huang@amd.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Jacob Shin <jacob.w.shin@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1454056197-5893-3-git-send-email-ray.huang@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      aaf24884