1. 05 3月, 2018 11 次提交
    • K
      perf trace: Switch to new perf_mmap__read_event() interface · d7f55c62
      Kan Liang 提交于
      The 'perf trace' utility still use the legacy interface.
      
      Switch to the new perf_mmap__read_event() interface.
      
      No functional change.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1519945751-37786-2-git-send-email-kan.liang@linux.intel.com
      [ Changed bool parameters from 0 to 'false', as per Jiri comment ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d7f55c62
    • K
      perf kvm: Switch to new perf_mmap__read_event() interface · 53172f90
      Kan Liang 提交于
      The perf kvm still use the legacy interface.
      
      Switch to the new perf_mmap__read_event() interface for perf kvm.
      
      No functional change.
      
      Committer notes:
      
      Tested before and after running:
      
        # perf kvm stat record
      
      On a machine with a kvm guest, then used:
      
        # perf kvm stat report
      
      Before/after results match and look like:
      
        # perf kvm stat record -a sleep 5
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 2.132 MB perf.data.guest (1828 samples) ]
        # perf kvm stat report
      
        Analyze events for all VMs, all VCPUs:
      
                   VM-EXIT Samples Samples%  Time% Min Time    Max Time    Avg time
      
            IO_INSTRUCTION     258   40.06%  0.08%   3.51us    122.54us     14.87us (+- 6.76%)
                 MSR_WRITE     178   27.64%  0.01%   0.47us      6.34us      2.18us (+- 4.80%)
             EPT_MISCONFIG     148   22.98%  0.03%   3.76us     65.60us     11.22us (+- 8.14%)
                       HLT      47    7.30% 99.88% 181.69us 249988.06us 102061.36us (+-13.49%)
         PAUSE_INSTRUCTION       5    0.78%  0.00%   0.38us      0.79us      0.47us (+-17.05%)
                  MSR_READ       4    0.62%  0.00%   1.14us      3.33us      2.67us (+-19.35%)
        EXTERNAL_INTERRUPT       2    0.31%  0.00%   2.15us      2.17us      2.16us (+- 0.30%)
         PENDING_INTERRUPT       1    0.16%  0.00%   2.56us      2.56us      2.56us (+- 0.00%)
          PREEMPTION_TIMER       1    0.16%  0.00%   3.21us      3.21us      3.21us (+- 0.00%)
      
        Total Samples:644, Total events handled time:4802790.72us.
      
        #
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1519945751-37786-1-git-send-email-kan.liang@linux.intel.com
      [ Changed bool parameters from 0 to 'false', as per Jiri comment ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      53172f90
    • J
      perf record: Fix crash in pipe mode · ad46e48c
      Jiri Olsa 提交于
      Currently we can crash perf record when running in pipe mode, like:
      
        $ perf record ls | perf report
        # To display the perf.data header info, please use --header/--header-only options.
        #
        perf: Segmentation fault
        Error:
        The - file has no samples!
      
      The callstack of the crash is:
      
          0x0000000000515242 in perf_event__synthesize_event_update_name
        3513            ev = event_update_event__new(len + 1, PERF_EVENT_UPDATE__NAME, evsel->id[0]);
        (gdb) bt
        #0  0x0000000000515242 in perf_event__synthesize_event_update_name
        #1  0x00000000005158a4 in perf_event__synthesize_extra_attr
        #2  0x0000000000443347 in record__synthesize
        #3  0x00000000004438e3 in __cmd_record
        #4  0x000000000044514e in cmd_record
        #5  0x00000000004cbc95 in run_builtin
        #6  0x00000000004cbf02 in handle_internal_command
        #7  0x00000000004cc054 in run_argv
        #8  0x00000000004cc422 in main
      
      The reason of the crash is that the evsel does not have ids array
      allocated and the pipe's synthesize code tries to access it.
      
      We don't force evsel ids allocation when we have single event, because
      it's not needed. However we need it when we are in pipe mode even for
      single event as a key for evsel update event.
      
      Fixing this by forcing evsel ids allocation event for single event, when
      we are in pipe mode.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180302161354.30192-1-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ad46e48c
    • A
      perf annotate: Find 'call' instruction target symbol at parsing time · 696703af
      Arnaldo Carvalho de Melo 提交于
      So that we do it just once, not everytime we press enter or -> on a
      'call' instruction line.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-uysyojl1e6nm94amzzzs08tf@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      696703af
    • A
      perf record: Throttle user defined frequencies to the maximum allowed · b09c2364
      Arnaldo Carvalho de Melo 提交于
        # perf record -F 200000 sleep 1
        warning: Maximum frequency rate (15,000 Hz) exceeded, throttling from 200,000 Hz to 15,000 Hz.
                 The limit can be raised via /proc/sys/kernel/perf_event_max_sample_rate.
                 The kernel will lower it when perf's interrupts take too long.
      	   Use --strict-freq to disable this throttling, refusing to record.
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.019 MB perf.data (15 samples) ]
        # perf evlist -v
        cycles:ppp: size: 112, { sample_period, sample_freq }: 15000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
      
      For those wanting that it fails if the desired frequency can't be used:
      
        # perf record --strict-freq -F 200000 sleep 1
        error: Maximum frequency rate (15,000 Hz) exceeded.
               Please use -F freq option with a lower value or consider
               tweaking /proc/sys/kernel/perf_event_max_sample_rate.
        #
      Suggested-by: NIngo Molnar <mingo@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-oyebruc44nlja499nqkr1nzn@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b09c2364
    • A
      perf top: Allow asking for the maximum allowed sample rate · 7831bf23
      Arnaldo Carvalho de Melo 提交于
      Add the handy '-F max' shortcut, just introduced to 'perf record', to
      reading and using the kernel.perf_event_max_sample_rate value as the
      user supplied sampling frequency:
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-hz04f296zccknnb5at06a6q0@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7831bf23
    • A
      perf top browser: Show sample_freq in browser title line · a9980a6d
      Arnaldo Carvalho de Melo 提交于
      The '--stdio' 'perf top' UI shows it, so lets remove this UI difference
      and show it too in '--tui', will be useful for 'perf top --tui -F max'.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-n3wd8n395uo4y9irst29pjic@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a9980a6d
    • A
      perf record: Allow asking for the maximum allowed sample rate · 67230479
      Arnaldo Carvalho de Melo 提交于
      Add the handy '-F max' shortcut to reading and using the
      kernel.perf_event_max_sample_rate value as the user supplied
      sampling frequency:
      
        # perf record -F max sleep 1
        info: Using a maximum frequency rate of 15,000 Hz
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.019 MB perf.data (14 samples) ]
        # sysctl kernel.perf_event_max_sample_rate
        kernel.perf_event_max_sample_rate = 15000
        # perf evlist -v
        cycles:ppp: size: 112, { sample_period, sample_freq }: 15000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
      
        # perf record -F 10 sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.019 MB perf.data (4 samples) ]
        # perf evlist -v
        cycles:ppp: size: 112, { sample_period, sample_freq }: 10, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
        #
      Suggested-by: NIngo Molnar <mingo@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-4y0tiuws62c64gp4cf0hme0m@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      67230479
    • J
      perf tests: Rename trace+probe_libc_inet_pton to record+probe_libc_inet_pton · 4f673368
      Jiri Olsa 提交于
      Because the test is no longer using perf trace but perf record instead.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180301165215.6780-2-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4f673368
    • J
      perf tests: Switch trace+probe_libc_inet_pton to use record · a18ee796
      Jiri Olsa 提交于
      There's a problem with relying on backtrace data from 'perf trace' the
      way the trace+probe_libc_inet_pton does. This test inserts uprobe within
      ping binary and checks that it gets its sample using 'perf trace'.
      
      It also checks it gets proper backtrace from sample and that's where the
      issue is.
      
      The 'perf trace' does not sort events (by definition) so it can happen
      that it processes the event sample before the ping binary memory map
      event. This can (very rarely) happen as proved by this events dump
      output (from custom added debug output):
      
        ...
        7680/7680: [0x7f4e29718000(0x204000) @ 0 fd:00 33611321 4230892504]: r-xp /usr/lib64/libdl-2.17.so
        7680/7680: [0x7f4e29502000(0x216000) @ 0 fd:00 33617257 2606846872]: r-xp /usr/lib64/libz.so.1.2.7
        (IP, 0x2): 7680/7680: 0x7f4e29c2ed60 period: 1 addr: 0
        7680/7680: [0x564842ef0000(0x233000) @ 0 fd:00 83 1989280200]: r-xp /usr/bin/ping
        7680/7680: [0x7f4e2aca2000(0x224000) @ 0 fd:00 33611308 1219144940]: r-xp /usr/lib64/ld-2.17.so
        ...
      
      In this case 'perf trace' fails to resolve the last callchain IP (within
      the ping binary) because it does not know about the ping binary memory
      map yet and the test fails like this:
      
        PING ::1(::1) 56 data bytes
        64 bytes from ::1: icmp_seq=1 ttl=64 time=0.037 ms
        --- ::1 ping statistics ---
        1 packets transmitted, 1 received, 0% packet loss, time 0ms
        rtt min/avg/max/mdev = 0.037/0.037/0.037/0.000 ms
        0.000 probe_libc:inet_pton:(7f4e29c2ed60))
        __GI___inet_pton (/usr/lib64/libc-2.17.so)
        getaddrinfo (/usr/lib64/libc-2.17.so)
        [0] ([unknown])
        FAIL: expected backtrace entry 8 ".*\(.*/bin/ping.*\)$" got "[0] ([unknown])"
      
      Switching the test to use 'perf record' and 'perf script' instead of
      'perf trace'.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180301165215.6780-1-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a18ee796
    • A
      perf annotate browser: Be more robust when drawing jump arrows · 9c04409d
      Arnaldo Carvalho de Melo 提交于
      This first happened with a gcc function, _cpp_lex_token, that has the
      usual jumps:
      
       │1159e6c: ↓ jne    115aa32 <_cpp_lex_token@@base+0xf92>
      
      I.e. jumps to a label inside that function (_cpp_lex_token), and those
      works, but also this kind:
      
       │1159e8b: ↓ jne    c469be <cpp_named_operator2name@@base+0xa72>
      
      I.e. jumps to another function, outside _cpp_lex_token, which are not
      being correctly handled generating as a side effect references to
      ab->offset[] entries that are set to NULL, so to make this code more
      robust, check that here.
      
      A proper fix for will be put in place, looking at the function name
      right after the '<' token and probably treating this like a 'call'
      instruction.
      
      For now just don't draw the arrow.
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: https://lkml.kernel.org/n/tip-5tzvb875ep2sel03aeefgmud@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9c04409d
  2. 27 2月, 2018 2 次提交
    • J
      perf stat: Ignore error thread when enabling system-wide --per-thread · ab6c79b8
      Jin Yao 提交于
      If we execute 'perf stat --per-thread' with non-root account (even set
      kernel.perf_event_paranoid = -1 yet), it reports the error:
      
        jinyao@skl:~$ perf stat --per-thread
        Error:
        You may not have permission to collect system-wide stats.
      
        Consider tweaking /proc/sys/kernel/perf_event_paranoid,
        which controls use of the performance events system by
        unprivileged users (without CAP_SYS_ADMIN).
      
        The current value is 2:
      
          -1: Allow use of (almost) all events by all users
              Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
        >= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
              Disallow raw tracepoint access by users without CAP_SYS_ADMIN
        >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
        >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
      
        To make this setting permanent, edit /etc/sysctl.conf too, e.g.:
      
                kernel.perf_event_paranoid = -1
      
      Perhaps the ptrace rule doesn't allow to trace some processes. But anyway
      the global --per-thread mode had better ignore such errors and continue
      working on other threads.
      
      This patch will record the index of error thread in perf_evsel__open()
      and remove this thread before retrying.
      
      For example (run with non-root, kernel.perf_event_paranoid isn't set):
      
        jinyao@skl:~$ perf stat --per-thread
        ^C
         Performance counter stats for 'system wide':
      
               vmstat-3458    6.171984   cpu-clock:u (msec) #  0.000 CPUs utilized
                 perf-3670    0.515599   cpu-clock:u (msec) #  0.000 CPUs utilized
               vmstat-3458   1,163,643   cycles:u           #  0.189 GHz
                 perf-3670      40,881   cycles:u           #  0.079 GHz
               vmstat-3458   1,410,238   instructions:u     #  1.21  insn per cycle
                 perf-3670       3,536   instructions:u     #  0.09  insn per cycle
               vmstat-3458     288,937   branches:u         # 46.814 M/sec
                 perf-3670         936   branches:u         #  1.815 M/sec
               vmstat-3458      15,195   branch-misses:u    #  5.26% of all branches
                 perf-3670          76   branch-misses:u    #  8.12% of all branches
      
              12.651675247 seconds time elapsed
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1516117388-10120-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ab6c79b8
    • K
      perf top: Fix annoying fallback message on older kernels · 853745f5
      Kan Liang 提交于
      On older (e.g. v4.4) kernels, an annoying fallback message can be
      observed in 'perf top':
      
      	┌─Warning:──────────────────────┐
      	│fall back to non-overwrite mode│
      	│                               │
      	│                               │
      	│Press any key...               │
      	└───────────────────────────────┘
      
      The 'perf top' utility has been changed to overwrite mode since commit
      ebebbf08 ("perf top: Switch default mode to overwrite mode").
      
      For older kernels which don't have overwrite mode support, 'perf top'
      will fall back to non-overwrite mode and print out the fallback message
      using ui__warning(), which needs user's input to close.
      
      The fallback message is not critical for end users. Turning it to debug
      message which is printed when running with -vv.
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Fixes: ebebbf08 ("perf top: Switch default mode to overwrite mode")
      Link: http://lkml.kernel.org/r/1519669030-176549-1-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      853745f5
  3. 22 2月, 2018 1 次提交
  4. 21 2月, 2018 3 次提交
  5. 19 2月, 2018 5 次提交
    • J
      perf tools: Add Python 3 support · 66dfdff0
      Jaroslav Škarvada 提交于
      Added Python 3 support while keeping Python 2.7 compatibility.
      
      Committer notes:
      
      This doesn't make it to auto detect python 3, one has to explicitely ask
      it to build with python 3 devel files, here are the instructions
      provided by Jaroslav:
      
       ---
        $ cp -a tools/perf tools/python3-perf
        $ make V=1 prefix=/usr -C tools/perf PYTHON=/usr/bin/python2 all
        $ make V=1 prefix=/usr -C tools/python3-perf PYTHON=/usr/bin/python3 all
        $ make V=1 prefix=/usr -C tools/python3-perf PYTHON=/usr/bin/python3 DESTDIR=%{buildroot} install-python_ext
        $ make V=1 prefix=/usr -C tools/perf PYTHON=/usr/bin/python2 DESTDIR=%{buildroot} install-python_ext
       ---
      
      We need to make this automatic, just like the existing tests for checking if
      the python2 devel files are in place, allowing the build with python3 if
      available, fallbacking to python2 and then just disabling it if none are
      available.
      
      So, using the PYTHON variable to build it using O= we get:
      
      Before this patch:
      
        $ rpm -q python3 python3-devel
        python3-3.6.4-7.fc27.x86_64
        python3-devel-3.6.4-7.fc27.x86_64
        $ rm -rf /tmp/build/perf/ ; mkdir -p /tmp/build/perf ; make O=/tmp/build/perf PYTHON=/usr/bin/python3 -C tools/perf install-bin
        make: Entering directory '/home/acme/git/linux/tools/perf'
        <SNIP>
        Makefile.config:670: Python 3 is not yet supported; please set
        Makefile.config:671: PYTHON and/or PYTHON_CONFIG appropriately.
        Makefile.config:672: If you also have Python 2 installed, then
        Makefile.config:673: try something like:
        Makefile.config:674:
        Makefile.config:675:   make PYTHON=python2
        Makefile.config:676:
        Makefile.config:677: Otherwise, disable Python support entirely:
        Makefile.config:678:
        Makefile.config:679:   make NO_LIBPYTHON=1
        Makefile.config:680:
        Makefile.config:681: *** .  Stop.
        make[1]: *** [Makefile.perf:212: sub-make] Error 2
        make: *** [Makefile:110: install-bin] Error 2
        make: Leaving directory '/home/acme/git/linux/tools/perf'
        $
      
      After:
      
        $ make O=/tmp/build/perf PYTHON=python3 -C tools/perf install-bin
        $ ldd ~/bin/perf | grep python
      	libpython3.6m.so.1.0 => /lib64/libpython3.6m.so.1.0 (0x00007f58a31e8000)
        $ rpm -qf /lib64/libpython3.6m.so.1.0
        python3-libs-3.6.4-7.fc27.x86_64
        $
      
      Now verify that when using the binding the right ELF file is loaded,
      using perf trace:
      
        $ perf trace -e open* perf test python
           0.051 ( 0.016 ms): perf/3927 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC           ) = 3
      <SNIP>
        18: 'import perf' in python                               :
           8.849 ( 0.013 ms): sh/3929 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC           ) = 3
      <SNIP>
          25.572 ( 0.008 ms): python3/3931 openat(dfd: CWD, filename: /tmp/build/perf/python/perf.cpython-36m-x86_64-linux-gnu.so, flags: CLOEXEC) = 3
      <SNIP>
       Ok
      <SNIP>
        $
      
      And using tools/perf/python/twatch.py, to show PERF_RECORD_ metaevents:
      
        $ python3 tools/perf/python/twatch.py
        cpu: 3, pid: 16060, tid: 16060 { type: fork, pid: 5207, ppid: 16060, tid: 5207, ptid: 16060, time: 10798513015459}
        cpu: 3, pid: 16060, tid: 16060 { type: fork, pid: 5208, ppid: 16060, tid: 5208, ptid: 16060, time: 10798513562503}
        cpu: 0, pid: 5208, tid: 5208 { type: comm, pid: 5208, tid: 5208, comm: grep }
        cpu: 2, pid: 5207, tid: 5207 { type: comm, pid: 5207, tid: 5207, comm: ps }
        cpu: 2, pid: 5207, tid: 5207 { type: exit, pid: 5207, ppid: 5207, tid: 5207, ptid: 5207, time: 10798551337484}
        cpu: 3, pid: 5208, tid: 5208 { type: exit, pid: 5208, ppid: 5208, tid: 5208, ptid: 5208, time: 10798551292153}
        cpu: 3, pid: 601, tid: 601 { type: fork, pid: 5209, ppid: 601, tid: 5209, ptid: 601, time: 10801779977324}
        ^CTraceback (most recent call last):
          File "tools/perf/python/twatch.py", line 68, in <module>
            main()
          File "tools/perf/python/twatch.py", line 40, in main
            evlist.poll(timeout = -1)
        KeyboardInterrupt
        $
      
        # ps ax|grep twatch
       5197 pts/8    S+     0:00 python3 tools/perf/python/twatch.py
        # ls -la /proc/5197/smaps
        -r--r--r--. 1 acme acme 0 Feb 19 13:14 /proc/5197/smaps
        # grep python /proc/5197/smaps
        558111307000-558111309000 r-xp 00000000 fd:00 3151710  /usr/bin/python3.6
        558111508000-558111509000 r--p 00001000 fd:00 3151710  /usr/bin/python3.6
        558111509000-55811150a000 rw-p 00002000 fd:00 3151710  /usr/bin/python3.6
        7ffad6fc1000-7ffad7008000 r-xp 00000000 00:2d 220196   /tmp/build/perf/python/perf.cpython-36m-x86_64-linux-gnu.so
        7ffad7008000-7ffad7207000 ---p 00047000 00:2d 220196   /tmp/build/perf/python/perf.cpython-36m-x86_64-linux-gnu.so
        7ffad7207000-7ffad7208000 r--p 00046000 00:2d 220196   /tmp/build/perf/python/perf.cpython-36m-x86_64-linux-gnu.so
        7ffad7208000-7ffad7215000 rw-p 00047000 00:2d 220196   /tmp/build/perf/python/perf.cpython-36m-x86_64-linux-gnu.so
        7ffadea77000-7ffaded3d000 r-xp 00000000 fd:00 3151795  /usr/lib64/libpython3.6m.so.1.0
        7ffaded3d000-7ffadef3c000 ---p 002c6000 fd:00 3151795  /usr/lib64/libpython3.6m.so.1.0
        7ffadef3c000-7ffadef42000 r--p 002c5000 fd:00 3151795  /usr/lib64/libpython3.6m.so.1.0
        7ffadef42000-7ffadefa5000 rw-p 002cb000 fd:00 3151795  /usr/lib64/libpython3.6m.so.1.0
        #
      
      And with this patch, but building normally, without specifying the
      PYTHON=python3 part, which will make it use python2 if its devel files are
      available, like in this test:
      
        $ make O=/tmp/build/perf -C tools/perf install-bin
        $ ldd ~/bin/perf | grep python
      	libpython2.7.so.1.0 => /lib64/libpython2.7.so.1.0 (0x00007f6a44410000)
        $ ldd /tmp/build/perf/python_ext_build/lib/perf.so  | grep python
      	libpython2.7.so.1.0 => /lib64/libpython2.7.so.1.0 (0x00007fed28a2c000)
        $
      
        [acme@jouet perf]$ tools/perf/python/twatch.py
        cpu: 0, pid: 2817, tid: 2817 { type: fork, pid: 2817, ppid: 2817, tid: 8910, ptid: 2817, time: 11126454335306}
        cpu: 0, pid: 2817, tid: 2817 { type: comm, pid: 2817, tid: 8910, comm: worker }
        $ ps ax | grep twatch.py
         8909 pts/8    S+     0:00 /usr/bin/python tools/perf/python/twatch.py
        $ grep python /proc/8909/smaps
        5579de658000-5579de659000 r-xp 00000000 fd:00 3156044  /usr/bin/python2.7
        5579de858000-5579de859000 r--p 00000000 fd:00 3156044  /usr/bin/python2.7
        5579de859000-5579de85a000 rw-p 00001000 fd:00 3156044  /usr/bin/python2.7
        7f0de01f7000-7f0de023e000 r-xp 00000000 00:2d 230695   /tmp/build/perf/python/perf.so
        7f0de023e000-7f0de043d000 ---p 00047000 00:2d 230695   /tmp/build/perf/python/perf.so
        7f0de043d000-7f0de043e000 r--p 00046000 00:2d 230695   /tmp/build/perf/python/perf.so
        7f0de043e000-7f0de044b000 rw-p 00047000 00:2d 230695   /tmp/build/perf/python/perf.so
        7f0de6f0f000-7f0de6f13000 r-xp 00000000 fd:00 134975   /usr/lib64/python2.7/lib-dynload/_localemodule.so
        7f0de6f13000-7f0de7113000 ---p 00004000 fd:00 134975   /usr/lib64/python2.7/lib-dynload/_localemodule.so
        7f0de7113000-7f0de7114000 r--p 00004000 fd:00 134975   /usr/lib64/python2.7/lib-dynload/_localemodule.so
        7f0de7114000-7f0de7115000 rw-p 00005000 fd:00 134975   /usr/lib64/python2.7/lib-dynload/_localemodule.so
        7f0de7e73000-7f0de8052000 r-xp 00000000 fd:00 3173292  /usr/lib64/libpython2.7.so.1.0
        7f0de8052000-7f0de8251000 ---p 001df000 fd:00 3173292  /usr/lib64/libpython2.7.so.1.0
        7f0de8251000-7f0de8255000 r--p 001de000 fd:00 3173292  /usr/lib64/libpython2.7.so.1.0
        7f0de8255000-7f0de8291000 rw-p 001e2000 fd:00 3173292  /usr/lib64/libpython2.7.so.1.0
        $
      Signed-off-by: NJaroslav Škarvada <jskarvad@redhat.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      LPU-Reference: 20180119205641.24242-1-jskarvad@redhat.com
      Link: https://lkml.kernel.org/n/tip-8d7dt9kqp83vsz25hagug8fu@git.kernel.org
      [ Removed explicit check for python version, allowing it to really build with python3 ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      66dfdff0
    • A
      perf python: Make twatch.py work with both python2 and python3 · d2ed5d2b
      Arnaldo Carvalho de Melo 提交于
      Will be used to test patches allowing to build perf with python3, so
      that we make sure that we can build with both versions.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jaroslav Škarvada <jskarvad@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-c2ynv0ozr3eifzsyit6qgh3h@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d2ed5d2b
    • C
      perf ftrace: Append an EOL when write tracing files · 63cd02d8
      Changbin Du 提交于
      Before this change, the '--graph-funcs', '--nograph-funcs' and
      '--trace-funcs' options didn't work as expected when the <func> doesn't
      exist. Because the kernel side hid possible errors.
      
        $ sudo ./perf ftrace -a --graph-depth 1 --graph-funcs abcdefg
         0)   0.140 us    |  rcu_all_qs();
         3)   0.304 us    |  mutex_unlock();
         0)   0.153 us    |  find_vma();
         3)   0.088 us    |  __fsnotify_parent();
         0)   6.145 us    |  handle_mm_fault();
         3)   0.089 us    |  fsnotify();
         3)   0.161 us    |  __sb_end_write();
         3)   0.710 us    |  SyS_close();
         3)   7.848 us    |  exit_to_usermode_loop();
      
      On the example above, I specified the function filter 'abcdefg' but all
      functions are enabled. The expected result is for all functions to be
      filtered, since there is no such function ('abcdefg')
      
      The original fix is to make the kernel support '\0' as end of string:
      https://lkml.org/lkml/2018/1/16/116
      
      But above fix cannot be compatible with old kernels. Then Namhyung Kim
      suggest adding a space after function name.
      
      This patch will append an '\n' when write tracing file. After this fix,
      the perf will report correct error state. Also let it print an error if
      reset_tracing_files() fails.
      
      Committer testing:
      
      Now it prints:
      
        # perf ftrace -a --graph-depth 1 --graph-funcs abcdefg
        failed to set tracing filters
        #
      
      And for an existing function:
      
        # perf ftrace -a --graph-depth 1 --graph-funcs SyS_open
         3)               |  SyS_open() {
         3) ! 494.899 us  |  }
         0) + 23.910 us   |  SyS_open();
         1) + 17.115 us   |  SyS_open();
         1) + 13.900 us   |  SyS_open();
         ------------------------------------------
         3)  qemu-sy-2817  =>  pickup-1290
         ------------------------------------------
      
         3) + 20.021 us   |  SyS_open();
        #
      Signed-off-by: NChangbin Du <changbin.du@intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1519007609-14551-1-git-send-email-changbin.du@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      63cd02d8
    • N
      perf machine: Fix paranoid check in machine__set_kernel_mmap() · 1d12cec6
      Namhyung Kim 提交于
      The machine__set_kernel_mmap() is to setup addresses of the kernel map
      using external info.  But it has a check when the address is given from
      an incorrect input which should have the start and end address of 0
      (i.e. machine__process_kernel_mmap_event).
      
      But we also use the end address of 0 for a valid input so change it to
      check both start and end addresses.
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20180219101936.GD1583@sejongSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1d12cec6
    • T
      perf s390: Fix reading cpuid model information · 47812e00
      Thomas Richter 提交于
      Commit eca0fa28 (perf record: Provide detailed information on s390
      CPU") fixed a  build error on Ubuntu. However the fix uses the wrong
      size to print the model information.
      Signed-off-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Fixes: eca0fa28 ("perf record: Provide detailed information on s390 CPU")
      Link: http://lkml.kernel.org/r/20180219102444.96900-1-tmricht@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      47812e00
  6. 17 2月, 2018 18 次提交
    • I
      Merge tag 'perf-core-for-mingo-4.17-20180216' of... · 11737ca9
      Ingo Molnar 提交于
      Merge tag 'perf-core-for-mingo-4.17-20180216' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      - Fix wrong jump arrow in systems with branch records with cycles,
        i.e. Intel's >= Skylake (Jin Yao)
      
      - Fix 'perf record --per-thread' problem introduced when
        implementing 'perf stat --per-thread (Jin Yao)
      
      - Use arch__compare_symbol_names() to fix 'perf test vmlinux',
        that was using strcmp(symbol names) while the dso routines
        doing symbol lookups used the arch overridable one, making
        this test fail in architectures that overrided that function
        with something other than strcmp() (Jiri Olsa)
      
      - Add 'perf script --show-round-event' to display
        PERF_RECORD_FINISHED_ROUND entries (Jiri Olsa)
      
      - Fix dwarf unwind for stripped binaries in 'perf test' (Jiri Olsa)
      
      - Use ordered_events for 'perf report --tasks', otherwise we may get
        artifacts when PERF_RECORD_FORK gets processed before PERF_RECORD_COMM
        (when they got recorded in different CPUs) (Jiri Olsa)
      
      - Add support to display group output for non group events, i.e.
        now when one uses 'perf report --group' on a perf.data file
        recorded without explicitly grouping events with {} (e.g.
        "perf record -e '{cycles,instructions}'" get the same output
        that would produce, i.e. see all those non-grouped events in
        multiple columns, at the same time (Jiri Olsa)
      
      - Skip non-address kallsyms entries, e.g. '(null)' for !root (Jiri Olsa)
      
      - Kernel maps fixes wrt perf.data(report) versus live system (top)
        (Jiri Olsa)
      
      - Fix memory corruption when using 'perf record -j call -g -a <application>'
        followed by 'perf report --branch-history' (Jiri Olsa)
      
      - ARM CoreSight fixes (Mathieu Poirier)
      
      - Add inject capability for CoreSight Traces (Robert Waker)
      
      - Update documentation for use of 'perf' + ARM CoreSight (Robert Walker)
      
      - Man pages fixes (Sangwon Hong, Jaecheol Shin)
      
      - Fix some 'perf test' cases on s/390 and x86_64 (some backtraces
        changed with a glibc update) (Thomas Richter)
      
      - Add detailed CPUID info in the 'perf.data' headers for s/390 to
        then use it in 'perf annotate' (Thomas Richter)
      
      - Add '--interval-count N' to 'perf stat', to use with -I, i.e.
        'perf stat -I 1000 --interval-count 2' will show stats every
         1000ms, two times (yuzhoujian)
      
      - Add 'perf stat --timeout Nms', that will run for that many
        milliseconds and then stop, printing the counters (yuzhoujian)
      
      - Fix description for 'perf report --mem-modex (Andi Kleen)
      
      - Use a wildcard to remove the vfs_getname probe in the
        'perf test' shell based test cases (Arnaldo Carvalho de Melo)
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      11737ca9
    • I
      7057bb97
    • A
      perf tests shell lib: Use a wildcard to remove the vfs_getname probe · 21316ac6
      Arnaldo Carvalho de Melo 提交于
      In some situations the vfs_getname is being added both as requested and
      with a _1 suffix (inlines?):
      
        probe:vfs_getname_1  (on getname_flags:63@acme/git/linux/fs/namei.c with pathname)
      
      This ends up making the cleanup to miss that one, as it removes just
      'probe:vfs_getname', which makes the second test to use this probe point
      to fail, since it finds that leftover from the first test, use a
      wildcard to remove both.
      
      Before:
      
        # perf test 60 61 62 63
        60: Use vfs_getname probe to get syscall args filenames   : FAILED!
        61: probe libc's inet_pton & backtrace it with ping       : Ok
        62: Check open filename arg using perf trace + vfs_getname: FAILED!
        63: Add vfs_getname probe to get syscall args filenames   : Ok
      
      After:
      
        # perf test 60 61 62 63
        60: Use vfs_getname probe to get syscall args filenames   : Ok
        61: probe libc's inet_pton & backtrace it with ping       : Ok
        62: Check open filename arg using perf trace + vfs_getname: Ok
        63: Add vfs_getname probe to get syscall args filenames   : Ok
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-2k5kutwr4ds36adiakyb4yvy@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      21316ac6
    • T
      perf test: Fix test case inet_pton to accept inlines. · 0f19a038
      Thomas Richter 提交于
      Using Fedora 27 and latest Linux kernel the test case
      trace+probe_libc_inet_pton.sh fails again on s390.  This time is the
      inlining of functions which does not match.  After an update of the
      glibc (from 2.26-16 to 2.26-24) the output is different
      
      The expected output is:
      
                   __inet_pton (/usr/lib64/libc-2.26.so)
                   gaih_inet (inlined)
                   ....
      
      The actual output is:
      
        1 packets transmitted, 1 received, 0% packet loss, time 0ms
        rtt min/avg/max/mdev = 0.061/0.061/0.061/0.000 ms
             0.000 probe_libc:inet_pton:(3ffb2140448))
                   __inet_pton (inlined)
                   gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
                   ...
      
      Fix this by being less strict on 'inlined' verses library name and
      accept both
      Signed-off-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180214070303.55757-1-tmricht@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0f19a038
    • T
      perf test: Fix test case 23 for s390 z/VM or KVM guests · b3be39c5
      Thomas Richter 提交于
      On s390 perf can be executed on a LPAR with support for hardware events
      (i. e. cycles) or on a z/VM or KVM guest where no hardware events are
      supported. In this environment use software event named cpu-clock for
      this test case.
      
      Use the cpuid infrastructure functions to determine the cpuid on s390
      which contains an indication of the cpu counter facility availability.
      Signed-off-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
      Reviewed-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180213151419.80737-4-tmricht@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b3be39c5
    • T
      perf cpuid: Introduce a platform specific cpuid compare function · 4cb7d3ec
      Thomas Richter 提交于
      The function get_cpuid_str() is called by perf_pmu__getcpuid() and on
      s390 returns a complete description of the CPU and its capabilities,
      which is a comma separated list.
      
      To map the CPU type with the value defined in the
      pmu-events/arch/s390/mapfile.csv, introduce an architecture specific
      cpuid compare function named strcmp_cpuid_str()
      
      The currently used regex algorithm is defined as the weak default and
      will be used if no platform specific one is defined. This matches the
      current behavior.
      Signed-off-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
      Reviewed-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180213151419.80737-3-tmricht@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4cb7d3ec
    • T
      perf annotate: Scan cpuid for s390 and save machine type · c59124fa
      Thomas Richter 提交于
      Scan the cpuid string and extract the type number for later use.
      Signed-off-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
      Reviewed-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180213151419.80737-2-tmricht@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c59124fa
    • T
      perf record: Provide detailed information on s390 CPU · eca0fa28
      Thomas Richter 提交于
      When perf record ... is setup to record data, the s390 cpu information
      was a fixed string "IBM/S390".
      
      Replace this string with one containing more information about the
      machine. The information included in the cpuid is a comma separated
      list:
      
         manufacturer,type,model-capacity,model[,version,authorization]
      with
      
      - manufacturer: up to 16 byte name of the manufacturer (IBM).
      - type: a four digit number refering to the machine
        generation.
      - model-capacitiy: up to 16 characters describing number
        of cpus etc.
      - model: up to 16 characters describing model.
      - version: the CPU-MF counter facility version number,
        available on LPARs only, omitted on z/VM guests.
      - authorization: the CPU-MF counter facility authorization level,
        available on LPARs only, omitted on z/VM guests.
      
      Before:
      
        [root@s8360047 perf]# ./perf record -- sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.001 MB perf.data (4 samples) ]
        [root@s8360047 perf]# ./perf report --header | fgrep cpuid
         # cpuid : IBM/S390
        [root@s8360047 perf]#
      
      After:
      
        [root@s35lp76 perf]# ./perf report --header|fgrep cpuid
         # cpuid : IBM,3906,704,M03,3.5,002f
        [root@s35lp76 perf]#
      Signed-off-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
      Reviewed-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180213151419.80737-1-tmricht@linux.vnet.ibm.com
      [ Use scnprintf instead of strncat to fix build errors on gcc GNU C99 5.4.0 20160609 -march=zEC12 -m64 -mzarch -ggdb3 -O6 -std=gnu99 -fPIC -fno-omit-frame-pointer -funwind-tables -fstack-protector-all ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      eca0fa28
    • R
      perf trace powerpc: Use generated syscall table · 4281da23
      Ravi Bangoria 提交于
      This should speed up accessing new system calls introduced with the
      kernel rather than waiting for libaudit updates to include them.
      
      It also enables users to specify wildcards, for example, perf trace -e
      'open*', just like was already possible on x86 and s390.
      Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20180129083417.31240-4-ravi.bangoria@linux.vnet.ibm.com
      [ Do it for ppc32 as well ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4281da23
    • R
      perf powerpc: Generate system call table from asm/unistd.h · 8e2ff72a
      Ravi Bangoria 提交于
      This should speed up accessing new system calls introduced with the
      kernel rather than waiting for libaudit updates to include them.
      Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20180129083417.31240-3-ravi.bangoria@linux.vnet.ibm.com
      [ Made it generate syscall_32.c as well to fix the build on 32-bit ppc ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8e2ff72a
    • R
      tools include powerpc: Grab a copy of arch/powerpc/include/uapi/asm/unistd.h · 1350fb7d
      Ravi Bangoria 提交于
      Will be used for generating the syscall id/string translation table.
      
      Committer notes:
      
      Update it already to catch with these csets applied since Ravi first
      submitted this patch:
      
        3350eb2e powerpc: sys_pkey_mprotect() system call
        9499ec1b powerpc: sys_pkey_alloc() and sys_pkey_free() system calls
      
      So now 'perf trace' on ppc now knows about the pkey_ syscals.
      Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20180129083417.31240-2-ravi.bangoria@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1350fb7d
    • J
      perf report: Fix memory corruption in --branch-history mode --branch-history · e3ebaa46
      Jiri Olsa 提交于
      Jin Yao reported memory corrupton in perf report with
      branch info used for stack trace:
      
        > Following command lines will cause perf crash.
      
        > perf record -j call -g -a <application>
        > perf report --branch-history
        >
        > *** Error in `perf': double free or corruption (!prev): 0x00000000104aa040 ***
        > ======= Backtrace: =========
        > /lib/x86_64-linux-gnu/libc.so.6(+0x77725)[0x7f6b37254725]
        > /lib/x86_64-linux-gnu/libc.so.6(+0x7ff4a)[0x7f6b3725cf4a]
        > /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f6b37260abc]
        > perf[0x51b914]
        > perf(hist_entry_iter__add+0x1e5)[0x51f305]
        > perf[0x43cf01]
        > perf[0x4fa3bf]
        > perf[0x4fa923]
        > perf[0x4fd396]
        > perf[0x4f9614]
        > perf(perf_session__process_events+0x89e)[0x4fc38e]
        > perf(cmd_report+0x15d2)[0x43f202]
        > perf[0x4a059f]
        > perf(main+0x631)[0x427b71]
        > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6b371fd830]
        > perf(_start+0x29)[0x427d89]
      
      For the cumulative output, we allocate the he_cache array based on the
      --max-stack option value and populate it with data from 'callchain_cursor'.
      
      The --max-stack option value does not ensure now the limit for number of
      callchain_cursor nodes, so the cumulative iter code will allocate smaller array
      than it's actually needed and cause above corruption.
      
      I think the --max-stack limit does not apply here anyway, because we add
      callchain data as normal hist entries, while the --max-stack control the limit
      of single entry callchain depth.
      
      Using the callchain_cursor.nr as he_cache array count to fix this. Also
      removing struct hist_entry_iter::max_stack, because there's no longer any use
      for it.
      
      We need more fixes to ensure that the branch stack code follows properly the
      logic of --max-stack, which is not the case at the moment.
      Original-patch-by: NJin Yao <yao.jin@linux.intel.com>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Reported-by: NJin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180216123619.GA9945@kravaSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e3ebaa46
    • J
      perf report: Fix wrong jump arrow · b40982e8
      Jin Yao 提交于
      When we use perf report interactive annotate view, we can see
      the position of jump arrow is not correct. For example,
      
      1. perf record -b ...
      2. perf report
      3. In interactive mode, select Annotate 'function'
      
      Percent│ IPC Cycle
             │                                if (flag)
        1.37 │0.4┌──   1      ↓ je     82
             │   │                                    x += x / y + y / x;
        0.00 │0.4│  1310        movsd  (%rsp),%xmm0
        0.00 │0.4│   565        movsd  0x8(%rsp),%xmm4
             │0.4│              movsd  0x8(%rsp),%xmm1
             │0.4│              movsd  (%rsp),%xmm3
             │0.4│              divsd  %xmm4,%xmm0
        0.00 │0.4│   579        divsd  %xmm3,%xmm1
             │0.4│              movsd  (%rsp),%xmm2
             │0.4│              addsd  %xmm1,%xmm0
             │0.4│              addsd  %xmm2,%xmm0
        0.00 │0.4│              movsd  %xmm0,(%rsp)
             │   │                    volatile double x = 1212121212, y = 121212;
             │   │
             │   │                    s_randseed = time(0);
             │   │                    srand(s_randseed);
             │   │
             │   │                    for (i = 0; i < 2000000000; i++) {
        1.37 │0.4└─→      82:   sub    $0x1,%ebx
       28.21 │0.48    17      ↑ jne    38
      
      The jump arrow in above example is not correct. It should add the
      width of IPC and Cycle.
      
      With this patch, the result is:
      
      Percent│ IPC Cycle
             │                                if (flag)
        1.37 │0.48     1     ┌──je     82
             │               │                        x += x / y + y / x;
        0.00 │0.48  1310     │  movsd  (%rsp),%xmm0
        0.00 │0.48   565     │  movsd  0x8(%rsp),%xmm4
             │0.48           │  movsd  0x8(%rsp),%xmm1
             │0.48           │  movsd  (%rsp),%xmm3
             │0.48           │  divsd  %xmm4,%xmm0
        0.00 │0.48   579     │  divsd  %xmm3,%xmm1
             │0.48           │  movsd  (%rsp),%xmm2
             │0.48           │  addsd  %xmm1,%xmm0
             │0.48           │  addsd  %xmm2,%xmm0
        0.00 │0.48           │  movsd  %xmm0,(%rsp)
             │               │        volatile double x = 1212121212, y = 121212;
             │               │
             │               │        s_randseed = time(0);
             │               │        srand(s_randseed);
             │               │
             │               │        for (i = 0; i < 2000000000; i++) {
        1.37 │0.48        82:└─→sub    $0x1,%ebx
       28.21 │0.48    17      ↑ jne    38
      
      Committer notes:
      
      Please note that only from LBRv5 (according to Jiri) onwards, i.e. >=
      Skylake is that we'll have the cycles counts in each branch record
      entry, so to see the Cycles and IPC columns, and be able to test this
      patch, one need a capable hardware.
      
      While applying this I first tested it on a Broadwell class machine and
      couldn't get those columns, will add code to the annotate browser to
      warn the user about that, i.e. you have branch records, but no cycles,
      use a more recent hardware to get the cycles and IPC columns.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1517223473-14750-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b40982e8
    • A
      perf report: Fix description for --mem-mode · fc2f5237
      Andi Kleen 提交于
      The "mem-loads" event only works when PEBS is enabled, so add the "/p"
      ("precise") suffix to the examples.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      LPU-Reference: 20180209163909.9240-1-andi@firstfloor.org
      Link: https://lkml.kernel.org/n/tip-v0gcd4u9tktrvjjsp6y7ouv4@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fc2f5237
    • R
      coresight: Update documentation for perf usage · 6673016f
      Robert Walker 提交于
      Add notes on using perf to collect and analyze CoreSight trace
      Signed-off-by: NRobert Walker <robert.walker@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/1518607481-4059-4-git-send-email-robert.walker@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      6673016f
    • R
      perf inject: Emit instruction records on ETM trace discontinuity · 256e751c
      Robert Walker 提交于
      There may be discontinuities in the ETM trace stream due to overflows or
      ETM configuration for selective trace.  This patch emits an instruction
      sample with the pending branch stack when a TRACE ON packet occurs
      indicating a discontinuity in the trace data.
      
      A new packet type CS_ETM_TRACE_ON is added, which is emitted by the low
      level decoder when a TRACE ON occurs.  The higher level decoder flushes
      the branch stack when this packet is emitted.
      Signed-off-by: NRobert Walker <robert.walker@arm.com>
      Acked-by: NMathieu Poirier <mathieu.poirier@linaro.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/1518607481-4059-3-git-send-email-robert.walker@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      256e751c
    • R
      perf cs-etm: Inject capabilitity for CoreSight traces · e573e978
      Robert Walker 提交于
      Added user space perf functionality to translate CoreSight traces into
      instruction events with branch stack.
      
      To invoke the new functionality, use the perf inject tool with
      --itrace=il. For example, to translate the ETM trace from perf.data into
      last branch records in a new inj.data file:
      
          $ perf inject --itrace=i100000il128 -i perf.data -o perf.data.new
      
      The 'i' parameter to itrace generates periodic instruction events.  The
      period between instruction events can be specified as a number of
      instructions suffixed by i (default 100000).
      
      The parameter to 'l' specifies the number of entries in the branch stack
      attached to instruction events.
      
      The 'b' parameter to itrace generates events on taken branches.
      
      This patch also fixes the contents of the branch events used in perf
      report - previously branch events were generated for each contiguous
      range of instructions executed.  These are fixed to generate branch
      events between the last address of a range ending in an executed branch
      instruction and the start address of the next range.
      
      Based on patches by Sebastian Pop <s.pop@samsung.com> with additional fixes
      and support for specifying the instruction period.
      Originally-by: NSebastian Pop <s.pop@samsung.com>
      Signed-off-by: NRobert Walker <robert.walker@arm.com>
      Acked-by: NMathieu Poirier <mathieu.poirier@linaro.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/1518607481-4059-2-git-send-email-robert.walker@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e573e978
    • S
      perf mem: Document a missing option · 7e99b197
      Sangwon Hong 提交于
      Add the missing --force option on the man page.
      Signed-off-by: NSangwon Hong <qpakzk@gmail.com>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Link: http://lkml.kernel.org/r/1518381517-30766-2-git-send-email-qpakzk@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7e99b197