1. 16 Oct, 2019 1 commit
    • A
      perf trace: Introduce --errno-summary · b88b14db
      Arnaldo Carvalho de Melo committed
      To be used with -S or -s; using just this new option implies -s.
      Examples:
      
        # perf trace --errno-summary sleep 1
      
         Summary of events:
      
         sleep (10793), 80 events, 93.0%
      
           syscall            calls  errors  total       min       avg       max       stddev
                                             (msec)    (msec)    (msec)    (msec)        (%)
           --------------- --------  ------ -------- --------- --------- ---------     ------
           nanosleep              1      0  1000.427  1000.427  1000.427  1000.427      0.00%
           mmap                   8      0     0.026     0.002     0.003     0.005      9.18%
           close                  5      0     0.018     0.001     0.004     0.009     48.97%
           mprotect               4      0     0.017     0.003     0.004     0.006     16.49%
           openat                 3      0     0.012     0.003     0.004     0.005      9.41%
           munmap                 1      0     0.010     0.010     0.010     0.010      0.00%
           brk                    4      0     0.005     0.001     0.001     0.002     22.77%
           read                   4      0     0.005     0.001     0.001     0.002     22.33%
           access                 1      1     0.004     0.004     0.004     0.004      0.00%
        				ENOENT: 1
           fstat                  3      0     0.004     0.001     0.001     0.002     17.18%
           lseek                  3      0     0.003     0.001     0.001     0.001     11.62%
           arch_prctl             2      1     0.002     0.001     0.001     0.001      3.32%
        				EINVAL: 1
           execve                 1      0     0.000     0.000     0.000     0.000      0.00%
      
        #
      
      Works as well together with --failure and -S, i.e. collect the stats and
      show just the syscalls that failed:
      
        # perf trace --failure -S --errno-summary sleep 1
             0.032 arch_prctl(option: 0x3001, arg2: 0x7fffdb11b580) = -1 EINVAL (Invalid argument)
             0.045 access(filename: "/etc/ld.so.preload", mode: R) = -1 ENOENT (No such file or directory)
      
         Summary of events:
      
         sleep (10806), 80 events, 93.0%
      
           syscall            calls  errors  total       min       avg       max       stddev
                                             (msec)    (msec)    (msec)    (msec)        (%)
           --------------- --------  ------ -------- --------- --------- ---------     ------
           nanosleep              1      0  1000.094  1000.094  1000.094  1000.094      0.00%
           mmap                   8      0     0.026     0.002     0.003     0.005      9.06%
           close                  5      0     0.018     0.001     0.004     0.010     49.58%
           mprotect               4      0     0.017     0.003     0.004     0.006     17.56%
           openat                 3      0     0.014     0.004     0.005     0.006     12.29%
           munmap                 1      0     0.010     0.010     0.010     0.010      0.00%
           brk                    4      0     0.005     0.001     0.001     0.002     22.75%
           read                   4      0     0.005     0.001     0.001     0.002     17.19%
           access                 1      1     0.005     0.005     0.005     0.005      0.00%
        				ENOENT: 1
           fstat                  3      0     0.004     0.001     0.001     0.002     21.66%
           lseek                  3      0     0.003     0.001     0.001     0.001     11.71%
           arch_prctl             2      1     0.002     0.001     0.001     0.001      2.66%
        				EINVAL: 1
           execve                 1      0     0.000     0.000     0.000     0.000      0.00%
      
        #
      Suggested-by: Andi Kleen <ak@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-l0mjwczkpouov7lss5zn8d9h@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
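The per-errno breakdown shown in the summary above can be approximated outside of perf by post-processing the failure lines that `perf trace --failure` prints. The following is a minimal sketch, not the implementation in the commit: the regular expression and the function names `errno_summary` and `format_summary` are assumptions made for illustration, and the line format is taken from the two failure lines in the example output.

```python
import re
from collections import Counter, defaultdict

# Matches failure lines as printed in the example above, e.g.
#   0.045 access(filename: "/etc/ld.so.preload", mode: R) = -1 ENOENT (No such file or directory)
# The pattern is an assumption derived from that sample output only.
FAILURE_RE = re.compile(
    r"^\s*[\d.]+\s+(?P<name>\w+)\(.*\)\s+=\s+-1\s+(?P<errno>E[A-Z0-9]+)\b"
)


def errno_summary(lines):
    """Return {syscall_name: Counter({errno_name: count})} for failed calls."""
    summary = defaultdict(Counter)
    for line in lines:
        match = FAILURE_RE.match(line)
        if match:
            summary[match.group("name")][match.group("errno")] += 1
    return summary


def format_summary(summary):
    """Render one line per syscall followed by indented 'ERRNO: count' lines."""
    out = []
    for name in sorted(summary):
        out.append(name)
        for errno_name, count in sorted(summary[name].items()):
            out.append(f"\t{errno_name}: {count}")
    return "\n".join(out)
```

Lines for successful calls do not match the pattern and are skipped, so the function can be fed the full trace output.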
  2. 15 Oct, 2019 14 commits
    • A
      perf trace: Add syscall failure stats to -s/--summary and -S/--with-summary · 8eded45f
      Arnaldo Carvalho de Melo committed
      Just like strace has:
      
        # trace -s sleep 1
      
        Summary of events:
      
        sleep (32370), 80 events, 93.0%
      
          syscall            calls  errors  total       min       avg       max       stddev
                                            (msec)    (msec)    (msec)    (msec)        (%)
          --------------- --------  ------ -------- --------- --------- ---------     ------
          nanosleep              1      0  1000.402  1000.402  1000.402  1000.402      0.00%
          mmap                   8      0     0.023     0.002     0.003     0.004      8.49%
          close                  5      0     0.015     0.001     0.003     0.009     51.39%
          mprotect               4      0     0.014     0.002     0.003     0.005     16.95%
          openat                 3      0     0.013     0.003     0.004     0.005     14.29%
          munmap                 1      0     0.010     0.010     0.010     0.010      0.00%
          read                   4      0     0.005     0.001     0.001     0.002     16.83%
          brk                    4      0     0.004     0.001     0.001     0.002     20.82%
          access                 1      1     0.004     0.004     0.004     0.004      0.00%
          fstat                  3      0     0.003     0.001     0.001     0.001     12.17%
          lseek                  3      0     0.003     0.001     0.001     0.001     11.45%
          arch_prctl             2      1     0.002     0.001     0.001     0.001      2.30%
          execve                 1      0     0.000     0.000     0.000     0.000      0.00%
      
        #
      
        # perf trace -S sleep 1
               ?  ... [continued]: execve())             = 0
           0.028 brk(brk: NULL)                          = 0x559f5bd96000
           0.033 arch_prctl(option: 0x3001, arg2: 0x7ffda8b715a0) = -1 EINVAL (Invalid argument)
           0.046 access(filename: "/etc/ld.so.preload", mode: R) = -1 ENOENT (No such file or directory)
           0.055 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC) = 3
           0.060 fstat(fd: 3, statbuf: 0x7ffda8b707a0)   = 0
           0.062 mmap(addr: NULL, len: 134346, prot: READ, flags: PRIVATE, fd: 3, off: 0) = 0x7f3aedfc4000
           0.066 close(fd: 3)                            = 0
           0.079 openat(dfd: CWD, filename: "/lib64/libc.so.6", flags: RDONLY|CLOEXEC) = 3
           0.085 read(fd: 3, buf: 0x7ffda8b70948, count: 832) = 832
           0.088 lseek(fd: 3, offset: 792, whence: SET)  = 792
           0.090 read(fd: 3, buf: 0x7ffda8b70810, count: 68) = 68
           0.093 fstat(fd: 3, statbuf: 0x7ffda8b707f0)   = 0
           0.095 mmap(addr: NULL, len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) = 0x7f3aedfc2000
           0.101 lseek(fd: 3, offset: 792, whence: SET)  = 792
           0.103 read(fd: 3, buf: 0x7ffda8b70450, count: 68) = 68
           0.105 lseek(fd: 3, offset: 864, whence: SET)  = 864
           0.107 read(fd: 3, buf: 0x7ffda8b70470, count: 32) = 32
           0.110 mmap(addr: NULL, len: 1857472, prot: READ, flags: PRIVATE|DENYWRITE, fd: 3, off: 0) = 0x7f3aeddfc000
           0.114 mprotect(start: 0x7f3aede1e000, len: 1679360, prot: NONE) = 0
           0.121 mmap(addr: 0x7f3aede1e000, len: 1363968, prot: READ|EXEC, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x22000) = 0x7f3aede1e000
           0.127 mmap(addr: 0x7f3aedf6b000, len: 311296, prot: READ, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x16f000) = 0x7f3aedf6b000
           0.131 mmap(addr: 0x7f3aedfb8000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x1bb000) = 0x7f3aedfb8000
           0.138 mmap(addr: 0x7f3aedfbe000, len: 14272, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) = 0x7f3aedfbe000
           0.147 close(fd: 3)                            = 0
           0.158 arch_prctl(option: SET_FS, arg2: 0x7f3aedfc3580) = 0
           0.210 mprotect(start: 0x7f3aedfb8000, len: 16384, prot: READ) = 0
           0.230 mprotect(start: 0x559f5b27d000, len: 4096, prot: READ) = 0
           0.236 mprotect(start: 0x7f3aee00f000, len: 4096, prot: READ) = 0
           0.240 munmap(addr: 0x7f3aedfc4000, len: 134346) = 0
           0.300 brk(brk: NULL)                          = 0x559f5bd96000
           0.302 brk(brk: 0x559f5bdb7000)                = 0x559f5bdb7000
           0.305 brk(brk: NULL)                          = 0x559f5bdb7000
           0.310 openat(dfd: CWD, filename: "/usr/lib/locale/locale-archive", flags: RDONLY|CLOEXEC) = 3
           0.315 fstat(fd: 3, statbuf: 0x7f3aedfbdac0)   = 0
           0.318 mmap(addr: NULL, len: 217750512, prot: READ, flags: PRIVATE, fd: 3, off: 0) = 0x7f3ae0e52000
           0.325 close(fd: 3)                            = 0
           0.358 nanosleep(rqtp: 0x7ffda8b714b0, rmtp: NULL) = 0
        1000.622 close(fd: 1)                            = 0
        1000.641 close(fd: 2)                            = 0
        1000.664 exit_group(error_code: 0)               = ?
      
         Summary of events:
      
         sleep (722), 80 events, 93.0%
      
           syscall            calls  errors  total       min       avg       max       stddev
                                             (msec)    (msec)    (msec)    (msec)        (%)
           --------------- --------  ------ -------- --------- --------- ---------     ------
           nanosleep              1      0  1000.194  1000.194  1000.194  1000.194      0.00%
           mmap                   8      0     0.025     0.002     0.003     0.005     10.17%
           close                  5      0     0.018     0.001     0.004     0.010     50.18%
           mprotect               4      0     0.016     0.003     0.004     0.006     16.81%
           openat                 3      0     0.011     0.003     0.004     0.004      6.57%
           munmap                 1      0     0.010     0.010     0.010     0.010      0.00%
           brk                    4      0     0.005     0.001     0.001     0.002     20.72%
           read                   4      0     0.005     0.001     0.001     0.002     16.71%
           access                 1      1     0.005     0.005     0.005     0.005      0.00%
           fstat                  3      0     0.004     0.001     0.001     0.002     14.82%
           lseek                  3      0     0.003     0.001     0.001     0.001     11.66%
           arch_prctl             2      1     0.002     0.001     0.001     0.001      3.59%
           execve                 1      0     0.000     0.000     0.000     0.000      0.00%
      
        #
      
      Works for system wide, e.g. for 1ms:
      
        # perf trace -s -a sleep 0.001
      
         Summary of events:
      
         sleep (768), 94 events, 37.9%
      
           syscall            calls  errors  total       min       avg       max       stddev
                                             (msec)    (msec)    (msec)    (msec)        (%)
           --------------- --------  ------ -------- --------- --------- ---------     ------
           nanosleep              1      0     1.133     1.133     1.133     1.133      0.00%
           execve                 7      6     0.351     0.003     0.050     0.316     88.53%
           mmap                   8      0     0.024     0.002     0.003     0.004      8.86%
           mprotect               4      0     0.017     0.003     0.004     0.006     16.02%
           openat                 3      0     0.013     0.004     0.004     0.005      8.34%
           munmap                 1      0     0.010     0.010     0.010     0.010      0.00%
           brk                    4      0     0.007     0.001     0.002     0.002     10.99%
           close                  5      0     0.005     0.001     0.001     0.002     11.69%
           read                   5      0     0.005     0.000     0.001     0.002     30.53%
           access                 1      1     0.004     0.004     0.004     0.004      0.00%
           fstat                  3      0     0.004     0.001     0.001     0.002     10.74%
           lseek                  3      0     0.003     0.001     0.001     0.001     10.20%
           arch_prctl             2      1     0.002     0.001     0.001     0.001      3.34%
      
         Web Content (21258), 46 events, 18.5%
      
           syscall            calls  errors  total       min       avg       max       stddev
                                             (msec)    (msec)    (msec)    (msec)        (%)
           --------------- --------  ------ -------- --------- --------- ---------     ------
           recvmsg               12     12     0.015     0.001     0.001     0.002      8.50%
           futex                  2      0     0.008     0.003     0.004     0.005     27.08%
           poll                   6      0     0.006     0.000     0.001     0.002     22.14%
           read                   2      0     0.006     0.002     0.003     0.003     26.08%
           write                  1      0     0.002     0.002     0.002     0.002      0.00%
      
         Web Content (4365), 36 events, 14.5%
      
           syscall            calls  errors  total       min       avg       max       stddev
                                             (msec)    (msec)    (msec)    (msec)        (%)
           --------------- --------  ------ -------- --------- --------- ---------     ------
           recvmsg               10     10     0.015     0.001     0.002     0.003     11.83%
           poll                   5      0     0.006     0.000     0.001     0.002     28.44%
           futex                  2      0     0.005     0.001     0.003     0.004     48.29%
           read                   1      0     0.003     0.003     0.003     0.003      0.00%
      
         Timer (21275), 14 events, 5.6%
      
           syscall            calls  errors  total       min       avg       max       stddev
                                             (msec)    (msec)    (msec)    (msec)        (%)
           --------------- --------  ------ -------- --------- --------- ---------     ------
           futex                  6      1     0.240     0.000     0.040     0.149     64.58%
           write                  1      0     0.008     0.008     0.008     0.008      0.00%
      
         Timer (4383), 14 events, 5.6%
      
           syscall            calls  errors  total       min       avg       max       stddev
                                             (msec)    (msec)    (msec)    (msec)        (%)
           --------------- --------  ------ -------- --------- --------- ---------     ------
           futex                  6      2     0.186     0.000     0.031     0.181     96.45%
           write                  1      0     0.010     0.010     0.010     0.010      0.00%
      
         Web Content (20354), 28 events, 11.3%
      
           syscall            calls  errors  total       min       avg       max       stddev
                                             (msec)    (msec)    (msec)    (msec)        (%)
           --------------- --------  ------ -------- --------- --------- ---------     ------
           recvmsg                8      8     0.010     0.001     0.001     0.002     15.24%
           poll                   4      0     0.004     0.000     0.001     0.002     35.68%
           futex                  1      0     0.003     0.003     0.003     0.003      0.00%
           read                   1      0     0.003     0.003     0.003     0.003      0.00%
      
         Timer (20371), 10 events, 4.0%
      
           syscall            calls  errors  total       min       avg       max       stddev
                                             (msec)    (msec)    (msec)    (msec)        (%)
           --------------- --------  ------ -------- --------- --------- ---------     ------
           futex                  4      1     0.077     0.000     0.019     0.075     95.46%
           write                  1      0     0.005     0.005     0.005     0.005      0.00%
      
        [root@quaco ~]#
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-k7kh2muo5oeg56yx446hnw9v@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
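The calls, errors, total, min, avg and max columns in the tables above amount to a simple per-syscall accumulation. Below is a minimal sketch of that bookkeeping, not the code from the commit. The names `SyscallStats` and `summarize` are hypothetical, the input is assumed to be `(name, duration_ms, return_value)` tuples, and a negative return value is assumed to mark a failed call. The stddev column is left out on purpose because the commit message does not say how it is computed.

```python
import math
from dataclasses import dataclass


@dataclass
class SyscallStats:
    """Running totals for one syscall (hypothetical helper type)."""
    calls: int = 0
    errors: int = 0
    total_ms: float = 0.0
    min_ms: float = math.inf
    max_ms: float = 0.0

    def update(self, duration_ms, ret):
        self.calls += 1
        if ret < 0:  # assumption: negative return value means failure
            self.errors += 1
        self.total_ms += duration_ms
        self.min_ms = min(self.min_ms, duration_ms)
        self.max_ms = max(self.max_ms, duration_ms)

    @property
    def avg_ms(self):
        return self.total_ms / self.calls if self.calls else 0.0


def summarize(events):
    """Fold (name, duration_ms, ret) tuples into {name: SyscallStats}."""
    stats = {}
    for name, duration_ms, ret in events:
        stats.setdefault(name, SyscallStats()).update(duration_ms, ret)
    return stats
```

Sorting the resulting dictionary by `total_ms` in descending order would give the same row order as the tables above.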
    • J
      perf stat: Support --all-kernel/--all-user · dd071024
      Jin Yao committed
      'perf record' has supported --all-kernel / --all-user to configure all
      used events to run in kernel space or run in user space. But 'perf stat'
      doesn't support these options.
      
      It would be useful to support these options in 'perf stat' too to keep
      the same semantics available in both tools.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191011050545.3899-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • T
      perf jvmti: Link against tools/lib/ctype.h to have weak strlcpy() · 5fb470bc
      Thomas Richter committed
      The build of file libperf-jvmti.so succeeds but the resulting
      object fails to load:
      
       # ~/linux/tools/perf/perf record -k mono -- java  \
            -XX:+PreserveFramePointer \
            -agentpath:/root/linux/tools/perf/libperf-jvmti.so \
             hog 100000 123450
        Error occurred during initialization of VM
        Could not find agent library /root/linux/tools/perf/libperf-jvmti.so
            in absolute path, with error:
            /root/linux/tools/perf/libperf-jvmti.so: undefined symbol: _ctype
      
      Add the missing _ctype symbol into the build script.
      
      Fixes: 79743bc9 ("perf jvmti: Link against tools/lib/string.o to have weak strlcpy()")
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20191008093841.59387-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • I
      perf annotate: Fix objdump --no-show-raw-insn flag · c5baf908
      Ian Rogers committed
      Remove redirection of objdump's stderr to /dev/null to help diagnose
      failures.
      
      Change the '--no-show-raw' flag to '--no-show-raw-insn': binutils is
      permissive and accepts the abbreviated form, but LLVM objdump rejects it.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: clang-built-linux@googlegroups.com
      Link: http://lore.kernel.org/lkml/20191010183649.23768-6-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • I
      perf annotate: Don't pipe objdump output through 'expand' command · b34b45ee
      Ian Rogers committed
      Avoiding a pipe allows objdump command failures to surface.  Move the
      call to strim, which removes leading and trailing tabs, to the caller of
      symbol__parse_objdump_line.  Add a new expand_tabs function that, if a
      tab is present, allocates a new line in which tabs are expanded.  In
      symbol__parse_objdump_line the line has no leading spaces, so simplify
      the line_ip processing.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: clang-built-linux@googlegroups.com
      Link: http://lore.kernel.org/lkml/20191010183649.23768-5-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
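The tab expansion described in this commit replaces what the external expand command did in the pipeline. A minimal sketch of the idea follows; it is not the C function from the patch. The 8-column tab stop is an assumption (it is the default of expand(1)), and the early return mirrors the "only allocate when a tab is present" behaviour described in the message.

```python
TAB_WIDTH = 8  # assumed tab stop width, the default used by expand(1)


def expand_tabs(line, tab_width=TAB_WIDTH):
    """Return line with tabs replaced by spaces up to the next tab stop."""
    if "\t" not in line:
        # No tab present: hand back the original string, build nothing new.
        return line
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            pad = tab_width - (col % tab_width)  # distance to next tab stop
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```

For plain single-line input this gives the same result as Python's built-in `str.expandtabs(8)`; the loop is spelled out only to show the column arithmetic.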
    • I
      perf annotate: Don't pipe objdump output through 'grep' command · 7a675de4
      Ian Rogers committed
      Simplify the objdump command by not piping the output of objdump through
      grep. Instead, drop lines that match the grep pattern during the reading
      loop.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: clang-built-linux@googlegroups.com
      Link: http://lore.kernel.org/lkml/20191010183649.23768-4-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
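Filtering inside the reading loop, rather than through a separate grep process, can be sketched as below. This is an illustration only: the function name `read_filtered` is hypothetical, and a plain substring test stands in for whatever pattern the original pipeline passed to grep, which the commit message does not spell out.

```python
def read_filtered(stream, skip_substring):
    """Yield lines from stream, dropping those that contain skip_substring.

    The skip test plays the role of the inverted grep that the shell
    pipeline used to apply; the substring itself is a placeholder.
    """
    for raw in stream:
        line = raw.rstrip("\n")
        if skip_substring in line:
            continue  # dropped in the loop instead of by an external grep
        yield line
```

Because the filter runs in the same process that reads the output, the exit status of the disassembler is no longer hidden behind the exit status of the last command in a pipeline.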
    • I
      perf annotate: Use libsubcmd's run-command.h to fork objdump · 42359499
      Ian Rogers committed
      Reduce duplicated logic by using the subcmd library. Ensure when errors
      occur they are reported to the caller. Before this patch, if no lines
      are read the error status is 0.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: clang-built-linux@googlegroups.com
      Link: http://lore.kernel.org/lkml/20191010183649.23768-3-irogers@google.com
      Link: http://lore.kernel.org/lkml/20191015003418.62563-1-irogers@google.com
      [ merged follow up fix for NULL termination as in the 2nd link above ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
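The error-reporting behaviour this commit describes (a failed child process, or a child that produces no lines, must not look like success to the caller) can be sketched with Python's `subprocess` module standing in for libsubcmd's run-command API. The function name `run_and_read_lines` is hypothetical, and treating "no output" as an error is an assumption drawn from the last sentence of the commit message.

```python
import subprocess
import sys


def run_and_read_lines(argv):
    """Run argv and return (status, lines); status is non-zero on failure."""
    try:
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    except OSError as err:
        # The command could not be started at all (e.g. binary not found).
        print(f"Failure running {argv[0]}: {err}", file=sys.stderr)
        return (err.errno or 1, [])

    lines = [line.rstrip("\n") for line in proc.stdout]
    proc.stdout.close()
    status = proc.wait()

    if status == 0 and not lines:
        # Assumption: an empty result is reported as an error rather than
        # silently returning status 0.
        status = 1
    return (status, lines)
```

The child's stderr is deliberately left connected to the caller's stderr so that diagnostic messages from the command remain visible.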
    • I
      perf annotate: Avoid reallocation in objdump parsing · 353dcaa2
      Ian Rogers committed
      Objdump output is parsed using getline which allocates memory for the
      read. Getline will realloc if the memory is too small, but currently the
      line is always freed after the call.
      
      Simplify parse_objdump_line by performing the reading in symbol__disassemble.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: clang-built-linux@googlegroups.com
      Link: http://lore.kernel.org/lkml/20191010183649.23768-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf report: Add warning when libunwind not compiled in · 800d3f56
      Jin Yao committed
      We received a user report that call-graph DWARF mode was enabled in
      'perf record' but 'perf report' didn't unwind the callstack correctly.
      The reason was, libunwind was not compiled in.
      
      We can use 'perf -vv' to check the compiled libraries but it would be
      valuable to report a warning to the user directly (especially valuable
      for a perf newbie).
      
      The warning is:
      
      Warning:
      Please install libunwind development packages during the perf build.
      
      Both TUI and stdio are supported.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191011022122.26369-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • L
      perf test: Avoid infinite loop for task exit case · 791ce9c4
      Leo Yan committed
      When executing the task exit test case on an Arm64 Juno board, perf gets
      stuck in an endless loop and never returns.
      
      The Juno board has Arm's big.LITTLE CPUs, and the PMUs of the big and
      little CPUs are not compatible.  As a result, a PMU event cannot be
      enabled properly when the traced task migrates from one CPU variant to
      the other.  The test case then loops forever because it cannot read any
      event data after returning from polling.
      
      Eventually we need a formal solution that allows PMU events to migrate
      freely from one CPU variant to another, but that is a difficult task and
      a different topic.  This patch fixes the perf test case to avoid the
      infinite loop: when the test has retried reading empty events 1000
      times, it bails out and returns failure.  This allows the perf tool to
      continue with its other test cases.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/20191011091942.29841-2-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
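The bail-out described in this commit is a bounded retry around a read-then-poll loop. A minimal sketch under stated assumptions: `read_exit_events` and `poll` are hypothetical callables standing in for the test's mmap reading and polling steps, and the exact place where the real test increments and checks its counter may differ.

```python
MAX_RETRIES = 1000  # retry limit named in the commit message


def wait_for_exit_event(read_exit_events, poll, max_retries=MAX_RETRIES):
    """Return 0 once an exit event is seen, or -1 after too many retries.

    read_exit_events() returns how many exit events were read this round.
    poll() blocks until the event source signals that data may be ready.
    """
    nr_exit = 0
    retries = 0
    while True:
        nr_exit += read_exit_events()
        if nr_exit:
            return 0
        if retries >= max_retries:
            # Nothing readable after many wakeups: give up instead of
            # spinning forever, so the remaining tests can still run.
            return -1
        retries += 1
        poll()
```

Returning -1 rather than raising keeps the sketch close to the "return failure" wording of the commit message.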
    • L
      perf test: Report failure for mmap events · 6add129c
      Leo Yan committed
      When mmapping events fails in the task exit test case, 'err' is not set
      to -1, so the test does not report a failure.
      
      This patch sets 'err' to -1 when mmapping events fails, so that the perf
      tool reports the correct result.
      
      Fixes: d723a550 ("perf test: Add test case for checking number of EXIT events")
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/20191011091942.29841-1-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf evlist: Fix fix for freed id arrays · 5a40e199
      Andi Kleen committed
      In the earlier fix for the memory overrun of id arrays I managed to typo
      the wrong event in the fix.
      
      Of course we need to close the current event in the loop, not the
      original failing event.
      
      The same test case as in the original patch still passes.
      
      Fixes: 7834fa94 ("perf evlist: Fix access of freed id arrays")
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lore.kernel.org/lkml/20191011182140.8353-2-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf script: Fix --reltime with --time · b3509b6e
      Andi Kleen committed
      My earlier patch to just enable --reltime with --time was a little too
      optimistic.  The --time parsing would accept absolute time, which is
      very confusing to the user.
      
      Support relative time in --time parsing too. This only works with recent
      perf record that records the first sample time. Otherwise we error out.
      
      Fixes: 3714437d ("perf script: Allow --time with --reltime")
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lore.kernel.org/lkml/20191011182140.8353-1-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
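The relationship between a relative time window and the recorded first sample time can be sketched as follows. This is an illustration, not the parser from the patch: `parse_time_range` is a hypothetical name, only the simple `start,stop` form in seconds is handled, and raising an exception stands in for the "error out" case when no first sample time was recorded.

```python
def parse_time_range(spec, reltime=False, first_sample_time=None):
    """Parse 'start,stop' (seconds) into an absolute (start, stop) tuple.

    Either side may be empty, which yields None for that bound.  With
    reltime=True the values are taken as offsets from first_sample_time.
    """
    if reltime and first_sample_time is None:
        # Older recordings without a first sample time cannot be handled.
        raise ValueError("relative --time needs a recorded first sample time")

    start_text, _, stop_text = spec.partition(",")
    offset = first_sample_time if reltime else 0.0

    def to_absolute(text):
        text = text.strip()
        return float(text) + offset if text else None

    return (to_absolute(start_text), to_absolute(stop_text))
```

For example, with a first sample at 100.0 seconds, a relative window of `1.5,2.5` selects samples between 101.5 and 102.5 seconds of absolute time.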
    • J
      perf tools: Allow to build with -ltcmalloc · bb91a073
      Jiri Olsa committed
      By using "make TCMALLOC=1" you can enable perf to be built for usage
      with libtcmalloc.so (gperftools).
      
      Get heap profile (tools/perf directory):
      
        $ <install gperftools>
        $ make TCMALLOC=1 DEBUG=1
        $ HEAPPROFILE=/tmp/heapprof ./perf ...
        $ pprof ./perf /tmp/heapprof.000*
        (pprof) top
        Total: 2335.5 MB
          1735.1  74.3%  74.3%   1735.1  74.3% memdup
           402.0  17.2%  91.5%    402.0  17.2% zalloc
           140.2   6.0%  97.5%    145.8   6.2% map__new
            33.6   1.4%  98.9%     33.6   1.4% symbol__new
            12.4   0.5%  99.5%     12.4   0.5% alloc_event
             6.2   0.3%  99.7%      6.2   0.3% nsinfo__new
             5.5   0.2% 100.0%      5.5   0.2% nsinfo__copy
             0.3   0.0% 100.0%      0.3   0.0% dso__new
             0.1   0.0% 100.0%      0.1   0.0% do_read_string
             0.0   0.0% 100.0%      0.0   0.0% __GI__IO_file_doallocate
      
      See callstack:
        $ pprof --pdf ./perf /tmp/heapprof.00* > callstack.pdf
        $ pprof --web ./perf /tmp/heapprof.00*
      
      Committer testing:
      
      Install gperftools, on fedora:
      
        # dnf install gperftools-devel
      
      Then build:
      
       $ make TCMALLOC=1 DEBUG=1 -C tools/perf O=/tmp/build/perf install-bin
      
      Verify that it linked against the right library:
      
        $ ldd ~/bin/perf | grep tcma
      	libtcmalloc.so.4 => /lib64/libtcmalloc.so.4 (0x00007fb2953a7000)
        $
      
      Run 'perf trace' system wide for 1 minute:
      
        # HEAPPROFILE=/tmp/heapprof perf trace -a sleep 1m
        <SNIP>
         59985.524 ( 0.006 ms): Web Content/20354 recvmsg(fd: 9<socket:[1762817]>, msg: 0x7ffee5fdafb0) = -1 EAGAIN (Resource temporarily unavailable)
         59985.536 ( 0.005 ms): Web Content/20354 recvmsg(fd: 9<socket:[1762817]>, msg: 0x7ffee5fdafc0) = -1 EAGAIN (Resource temporarily unavailable)
         59981.956 (10.143 ms): SCTP timer/21716  ... [continued]: select())                            = 0 (Timeout)
         59985.549 (         ): Web Content/20354 poll(ufds: 0x7f1df38af180, nfds: 3, timeout_msecs: 4294967295) ...
             0.926 (59999.481 ms): sleep/29764  ... [continued]: nanosleep())                           = 0
         59992.133 (         ): SCTP timer/21716 select(tvp: 0x7ff5bf7fee80)                            ...
         60000.477 ( 0.009 ms): sleep/29764 close(fd: 1)                                                = 0
         60000.493 ( 0.005 ms): sleep/29764 close(fd: 2)                                                = 0
         60000.514 (         ): sleep/29764 exit_group()                                                = ?
        Dumping heap profile to /tmp/heapprof.0001.heap (Exiting, 3 MB in use)
      [root@quaco ~]#
      
      Install pprof:
      
        # dnf install pprof
      
      And run it:
      
        # pprof ~/bin/perf /tmp/heapprof.0001.heap
        Using local file /root/bin/perf.
        Using local file /tmp/heapprof.0001.heap.
        Welcome to pprof!  For help, type 'help'.
        (pprof) top
        Total: 4.0 MB
             1.7  42.0%  42.0%      2.2  54.1% map__new
             0.9  23.3%  65.3%      0.9  23.3% zalloc
             0.5  11.4%  76.7%      0.5  11.4% dso__new
             0.2   5.6%  82.3%      0.3   8.5% trace__sys_enter
             0.2   4.9%  87.2%      0.2   4.9% __GI___strdup
             0.2   3.8%  91.0%      0.2   3.8% new_term
             0.1   2.2%  93.2%      0.4  10.1% __perf_pmu__new_alias
             0.0   1.0%  94.3%      0.0   1.2% event_read_fields
             0.0   0.8%  95.1%      0.0   0.8% nsinfo__new
             0.0   0.7%  95.8%      0.1   3.2% trace__read_syscall_info
        (pprof)
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191013151427.11941-2-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bb91a073
  3. 11 Oct, 2019 2 commits
    • J
      perf diff: Report noisy for cycles diff · cebf7d51
      Jin Yao committed
      This patch prints the stddev and a histogram for the cycles diff of each
      program block. This helps us understand whether the cycles are noisy.
      
      This patch is inspired by Andi Kleen's patch:
      
        https://lwn.net/Articles/600471/
      
      We create a new option, '--cycles-hist'.
      
      Example:
      
        perf record -b ./div
        perf record -b ./div
        perf diff -c cycles
      
        # Baseline                                [Program Block Range] Cycles Diff  Shared Object      Symbol
        # ........  .......................................................... ....  .................  ............................
        #
            46.72%                                      [div.c:40 -> div.c:40]    0  div                [.] main
            46.72%                                      [div.c:42 -> div.c:44]    0  div                [.] main
            46.72%                                      [div.c:42 -> div.c:39]    0  div                [.] main
            20.54%                          [random_r.c:357 -> random_r.c:394]    1  libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:357 -> random_r.c:380]    0  libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:388 -> random_r.c:388]    0  libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:388 -> random_r.c:391]    0  libc-2.27.so       [.] __random_r
            17.04%                              [random.c:288 -> random.c:291]    0  libc-2.27.so       [.] __random
            17.04%                              [random.c:291 -> random.c:291]    0  libc-2.27.so       [.] __random
            17.04%                              [random.c:293 -> random.c:293]    0  libc-2.27.so       [.] __random
            17.04%                              [random.c:295 -> random.c:295]    0  libc-2.27.so       [.] __random
            17.04%                              [random.c:295 -> random.c:295]    0  libc-2.27.so       [.] __random
            17.04%                              [random.c:298 -> random.c:298]    0  libc-2.27.so       [.] __random
             8.40%                                      [div.c:22 -> div.c:25]    0  div                [.] compute_flag
             8.40%                                      [div.c:27 -> div.c:28]    0  div                [.] compute_flag
             5.14%                                    [rand.c:26 -> rand.c:27]    0  libc-2.27.so       [.] rand
             5.14%                                    [rand.c:28 -> rand.c:28]    0  libc-2.27.so       [.] rand
             2.15%                                  [rand@plt+0 -> rand@plt+0]    0  div                [.] rand@plt
             0.00%                                                                   [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
             0.00%                                [do_mmap+714 -> do_mmap+732]  -10  [kernel.kallsyms]  [k] do_mmap
             0.00%                                [do_mmap+737 -> do_mmap+765]    1  [kernel.kallsyms]  [k] do_mmap
             0.00%                                [do_mmap+262 -> do_mmap+299]    0  [kernel.kallsyms]  [k] do_mmap
             0.00%  [__x86_indirect_thunk_r15+0 -> __x86_indirect_thunk_r15+0]    7  [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
             0.00%            [native_sched_clock+0 -> native_sched_clock+119]   -1  [kernel.kallsyms]  [k] native_sched_clock
             0.00%                 [native_write_msr+0 -> native_write_msr+16]  -13  [kernel.kallsyms]  [k] native_write_msr
      
      When we enable the option '--cycles-hist', the output is:
      
        perf diff -c cycles --cycles-hist
      
        # Baseline                                [Program Block Range] Cycles Diff        stddev/Hist  Shared Object      Symbol
        # ........  .......................................................... ....  .................  .................  ............................
        #
            46.72%                                      [div.c:40 -> div.c:40]    0  ± 37.8% ▁█▁▁██▁█   div                [.] main
            46.72%                                      [div.c:42 -> div.c:44]    0  ± 49.4% ▁▁▂█▂▂▂▂   div                [.] main
            46.72%                                      [div.c:42 -> div.c:39]    0  ± 24.1% ▃█▂▄▁▃▂▁   div                [.] main
            20.54%                          [random_r.c:357 -> random_r.c:394]    1  ± 33.5% ▅▂▁█▃▁▂▁   libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:357 -> random_r.c:380]    0  ± 39.4% ▁▁█▁██▅▁   libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:388 -> random_r.c:388]    0                     libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:388 -> random_r.c:391]    0  ± 41.2% ▁▃▁▂█▄▃▁   libc-2.27.so       [.] __random_r
            17.04%                              [random.c:288 -> random.c:291]    0  ± 48.8% ▁▁▁▁███▁   libc-2.27.so       [.] __random
            17.04%                              [random.c:291 -> random.c:291]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
            17.04%                              [random.c:293 -> random.c:293]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
            17.04%                              [random.c:295 -> random.c:295]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
            17.04%                              [random.c:295 -> random.c:295]    0                     libc-2.27.so       [.] __random
            17.04%                              [random.c:298 -> random.c:298]    0  ± 75.6% ▃█▁▁▁▁▁▁   libc-2.27.so       [.] __random
             8.40%                                      [div.c:22 -> div.c:25]    0  ± 42.1% ▁▃▁▁███▁   div                [.] compute_flag
             8.40%                                      [div.c:27 -> div.c:28]    0  ± 41.8% ██▁▁▄▁▁▄   div                [.] compute_flag
             5.14%                                    [rand.c:26 -> rand.c:27]    0  ± 37.8% ▁▁▁████▁   libc-2.27.so       [.] rand
             5.14%                                    [rand.c:28 -> rand.c:28]    0                     libc-2.27.so       [.] rand
             2.15%                                  [rand@plt+0 -> rand@plt+0]    0                     div                [.] rand@plt
             0.00%                                                                                      [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
             0.00%                                [do_mmap+714 -> do_mmap+732]  -10                     [kernel.kallsyms]  [k] do_mmap
             0.00%                                [do_mmap+737 -> do_mmap+765]    1                     [kernel.kallsyms]  [k] do_mmap
             0.00%                                [do_mmap+262 -> do_mmap+299]    0                     [kernel.kallsyms]  [k] do_mmap
             0.00%  [__x86_indirect_thunk_r15+0 -> __x86_indirect_thunk_r15+0]    7                     [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
             0.00%            [native_sched_clock+0 -> native_sched_clock+119]   -1  ± 38.5% ▄█▁        [kernel.kallsyms]  [k] native_sched_clock
             0.00%                 [native_write_msr+0 -> native_write_msr+16]  -13  ± 47.1% ▁█▇▃▁▁     [kernel.kallsyms]  [k] native_write_msr
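
      The Hist part of the stddev/Hist column is a sparkline built from
      eight block characters. A minimal standalone sketch of how such a
      sparkline can be rendered follows. It is not the code from this patch:
      sparkline() and ticks[] are hypothetical names, and the sketch assumes
      each value is scaled linearly between the minimum and maximum of the
      series, with values small enough that the scaling cannot overflow.

```c
#include <stddef.h>
#include <string.h>

#define NTICKS 8

/* UTF-8 block characters, lowest to highest (hypothetical table). */
static const char *const ticks[NTICKS] = {
	"▁", "▂", "▃", "▄", "▅", "▆", "▇", "█"
};

/*
 * Render vals[0..n-1] as a sparkline into buf, which holds size bytes.
 * Returns the number of bytes written, not counting the trailing NUL.
 * Output is truncated if buf is too small.
 */
static size_t sparkline(const long *vals, size_t n, char *buf, size_t size)
{
	long min, max;
	size_t i, len = 0;

	if (size == 0)
		return 0;
	buf[0] = '\0';
	if (n == 0)
		return 0;

	/* Find the range of the series. */
	min = max = vals[0];
	for (i = 1; i < n; i++) {
		if (vals[i] < min)
			min = vals[i];
		if (vals[i] > max)
			max = vals[i];
	}

	for (i = 0; i < n; i++) {
		size_t level = 0;	/* a flat series maps to the lowest tick */
		size_t tlen;

		/* Scale the value into 0..NTICKS-1. */
		if (max > min)
			level = (size_t)((vals[i] - min) * (NTICKS - 1) / (max - min));

		tlen = strlen(ticks[level]);
		if (len + tlen + 1 > size)
			break;		/* no room left: truncate */
		memcpy(buf + len, ticks[level], tlen);
		len += tlen;
		buf[len] = '\0';
	}
	return len;
}
```

      For the series {0, 7, 0, 0, 7, 7, 0, 7} this yields ▁█▁▁██▁█, the same
      shape as the first row of the output above.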
      
       v8:
       ---
       Rebase to perf/core branch
      
       v7:
       ---
       1. v6 got Jiri's ACK.
       2. Rebase to latest perf/core branch.
      
       v6:
       ---
       1. Jiri provides better code for using data__hpp_register() in ui_init().
          Use this code in v6.
      
       v5:
       ---
       1. Refine the use of data__hpp_register() in ui_init() according to
          Jiri's suggestion.
      
       v4:
       ---
       1. Rename the new option from '--noisy' to '--cycles-hist'
       2. Remove the option '-n'.
       3. Only update the spark value and stats when '--cycles-hist' is enabled.
       4. Remove the code of printing '..'.
      
       v3:
       ---
       1. Move the histogram to a separate column
       2. Move the svals[] out of struct stats
      
       v2:
       ---
       Jiri got a compile error,
      
        CC       builtin-diff.o
        builtin-diff.c: In function ‘compute_cycles_diff’:
        builtin-diff.c:712:10: error: taking the absolute value of unsigned type ‘u64’ {aka ‘long unsigned int’} has no effect [-Werror=absolute-value]
        712 |          labs(pair->block_info->cycles_spark[i] -
            |          ^~~~
      
       This is because the result of u64 - u64 is still u64. The type of
       cycles_spark[] is now s64.
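
       The wrap-around behind this error can be shown in isolation. The
       sketch below is not the perf code: diff_unsigned() and
       abs_diff_signed() are hypothetical helpers, with uint64_t and int64_t
       standing in for u64 and s64, and llabs() is used here because it takes
       a long long.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Problem pattern: an unsigned difference wraps modulo 2^64 instead of
 * going negative, so taking its absolute value afterwards has no effect.
 */
static uint64_t diff_unsigned(uint64_t a, uint64_t b)
{
	return a - b;	/* wraps when a < b */
}

/*
 * Fixed pattern: with signed 64-bit values the difference can be
 * negative, and llabs() returns its magnitude.
 */
static int64_t abs_diff_signed(int64_t a, int64_t b)
{
	return llabs(a - b);
}
```

       With a = 3 and b = 5, diff_unsigned() returns 2^64 - 2, while
       abs_diff_signed() returns 2.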
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20190925011446.30678-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf tools: Propagate CFLAGS to libperf · 55542113
      Jiri Olsa committed
      Andi reported that 'make DEBUG=1' does not propagate to the libperf
      code. The same is true for the other flags. Change the code to propagate
      the global build flags to the libperf compilation.
      Reported-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191011122155.15738-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 10 Oct, 2019 23 commits