1. 23 2月, 2019 16 次提交
  2. 22 2月, 2019 2 次提交
    • A
      perf thread-stack: Improve thread_stack__no_call_return() · 1f35cd65
      Adrian Hunter 提交于
      Improve thread_stack__no_call_return() to better handle 'returns' that
      do not match the stack i.e. 'no call'. See code comments for details.
      The example below shows how retpolines are affected:
      
      Example:
      
        $ cat simple-retpoline.c
        __attribute__((noinline)) int bar(void)
        {
                return -1;
        }
      
        int foo(void)
        {
                return bar() + 1;
        }
      
        __attribute__((indirect_branch("thunk"))) int main()
        {
                int (*volatile fn)(void) = foo;
      
                fn();
                return fn();
        }
        $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
        $ objdump -d simple-retpoline
        <SNIP>
        0000000000001040 <main>:
            1040:       48 83 ec 18             sub    $0x18,%rsp
            1044:       48 8d 05 25 01 00 00    lea    0x125(%rip),%rax        # 1170 <foo>
            104b:       48 89 44 24 08          mov    %rax,0x8(%rsp)
            1050:       48 8b 44 24 08          mov    0x8(%rsp),%rax
            1055:       e8 1f 01 00 00          callq  1179 <__x86_indirect_thunk_rax>
            105a:       48 8b 44 24 08          mov    0x8(%rsp),%rax
            105f:       48 83 c4 18             add    $0x18,%rsp
            1063:       e9 11 01 00 00          jmpq   1179 <__x86_indirect_thunk_rax>
        <SNIP>
        0000000000001160 <bar>:
            1160:       b8 ff ff ff ff          mov    $0xffffffff,%eax
            1165:       c3                      retq
        <SNIP>
        0000000000001170 <foo>:
            1170:       e8 eb ff ff ff          callq  1160 <bar>
            1175:       83 c0 01                add    $0x1,%eax
            1178:       c3                      retq
        0000000000001179 <__x86_indirect_thunk_rax>:
            1179:       e8 07 00 00 00          callq  1185 <__x86_indirect_thunk_rax+0xc>
            117e:       f3 90                   pause
            1180:       0f ae e8                lfence
            1183:       eb f9                   jmp    117e <__x86_indirect_thunk_rax+0x5>
            1185:       48 89 04 24             mov    %rax,(%rsp)
            1189:       c3                      retq
        <SNIP>
        $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
        $ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
        2019-01-08 14:03:37.851655 Creating database...
        2019-01-08 14:03:37.863256 Writing records...
        2019-01-08 14:03:38.069750 Adding indexes
        2019-01-08 14:03:38.078799 Done
        $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
      
      Before:
      
          main
              -> __x86_indirect_thunk_rax
                  -> __x86_indirect_thunk_rax
                      -> __x86_indirect_thunk_rax
                          -> bar
      
      After:
      
          main
              -> __x86_indirect_thunk_rax
                  -> __x86_indirect_thunk_rax
                      -> foo
                          -> bar
      
      Committer testing:
      
      Chose "Reports", Then "Context-Sensitive Call Graph" and then go on
      expanding:
      
      Before:
      
      simple-retpolin
         PID:PID
            _start
               _start
                  __libc_start_main
                     main
                         __x86_indirect_thunk_rax
                            __x86_indirect_thunk_rax
                            bar
      
      After:
      
      Remove the "simple.retpoline.db" file, run again the 'perf script' line
      to regenerate the .db file and run the exported-sql-viewer.py again to
      get the same all the way to 'main', then, from there, including 'main':
      
                     main
                         __x86_indirect_thunk_rax
                             __x86_indirect_thunk_rax
                                 foo
                                     bar
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20190109091835.5570-6-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1f35cd65
    • W
      perf annotate: Fix getting source line failure · 11db1ad4
      Wei Li 提交于
      The output of "perf annotate -l --stdio xxx" changed since commit 425859ff
      ("perf annotate: No need to calculate notes->start twice") removed notes->start
      assignment in symbol__calc_lines(). It will get failed in
      find_address_in_section() from symbol__tty_annotate() subroutine as the
      a2l->addr is wrong. So the annotate summary doesn't report the line number of
      source code correctly.
      
      Before fix:
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ cat common_while_1.c
        void hotspot_1(void)
        {
      	volatile int i;
      
      	for (i = 0; i < 0x10000000; i++);
      	for (i = 0; i < 0x10000000; i++);
      	for (i = 0; i < 0x10000000; i++);
        }
      
        int main(void)
        {
      	hotspot_1();
      
      	return 0;
        }
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ gcc common_while_1.c -g -o common_while_1
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.488 MB perf.data (12498 samples) ]
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio
      
        Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
        ----------------------------------------------
      
         19.30 common_while_1[32]
         19.03 common_while_1[4e]
         19.01 common_while_1[16]
          5.04 common_while_1[13]
          4.99 common_while_1[4b]
          4.78 common_while_1[2c]
          4.77 common_while_1[10]
          4.66 common_while_1[2f]
          4.59 common_while_1[51]
          4.59 common_while_1[35]
          4.52 common_while_1[19]
          4.20 common_while_1[56]
          0.51 common_while_1[48]
         Percent |      Source code & Disassembly of common_while_1 for cycles:ppp (12480 samples, percent: local period)
        -----------------------------------------------------------------------------------------------------------------
               :
               :
               :
               :         Disassembly of section .text:
               :
               :         00000000000005fa <hotspot_1>:
               :         hotspot_1():
               :         void hotspot_1(void)
               :         {
          0.00 :   5fa:   push   %rbp
          0.00 :   5fb:   mov    %rsp,%rbp
               :                 volatile int i;
               :
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   5fe:   movl   $0x0,-0x4(%rbp)
          0.00 :   605:   jmp    610 <hotspot_1+0x16>
          0.00 :   607:   mov    -0x4(%rbp),%eax
         common_while_1[10]    4.77 :   60a:   add    $0x1,%eax
         common_while_1[13]    5.04 :   60d:   mov    %eax,-0x4(%rbp)
         common_while_1[16]   19.01 :   610:   mov    -0x4(%rbp),%eax
         common_while_1[19]    4.52 :   613:   cmp    $0xfffffff,%eax
            0.00 :   618:   jle    607 <hotspot_1+0xd>
                 :                 for (i = 0; i < 0x10000000; i++);
        ...
      
      After fix:
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.488 MB perf.data (12500 samples) ]
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio
      
        Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
        ----------------------------------------------
      
         33.34 common_while_1.c:5
         33.34 common_while_1.c:6
         33.32 common_while_1.c:7
         Percent |      Source code & Disassembly of common_while_1 for cycles:ppp (12482 samples, percent: local period)
        -----------------------------------------------------------------------------------------------------------------
               :
               :
               :
               :         Disassembly of section .text:
               :
               :         00000000000005fa <hotspot_1>:
               :         hotspot_1():
               :         void hotspot_1(void)
               :         {
          0.00 :   5fa:   push   %rbp
          0.00 :   5fb:   mov    %rsp,%rbp
               :                 volatile int i;
               :
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   5fe:   movl   $0x0,-0x4(%rbp)
          0.00 :   605:   jmp    610 <hotspot_1+0x16>
          0.00 :   607:   mov    -0x4(%rbp),%eax
         common_while_1.c:5    4.70 :   60a:   add    $0x1,%eax
          4.89 :   60d:   mov    %eax,-0x4(%rbp)
         common_while_1.c:5   19.03 :   610:   mov    -0x4(%rbp),%eax
         common_while_1.c:5    4.72 :   613:   cmp    $0xfffffff,%eax
          0.00 :   618:   jle    607 <hotspot_1+0xd>
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   61a:   movl   $0x0,-0x4(%rbp)
          0.00 :   621:   jmp    62c <hotspot_1+0x32>
          0.00 :   623:   mov    -0x4(%rbp),%eax
         common_while_1.c:6    4.54 :   626:   add    $0x1,%eax
          4.73 :   629:   mov    %eax,-0x4(%rbp)
         common_while_1.c:6   19.54 :   62c:   mov    -0x4(%rbp),%eax
         common_while_1.c:6    4.54 :   62f:   cmp    $0xfffffff,%eax
        ...
      Signed-off-by: NWei Li <liwei391@huawei.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: 425859ff ("perf annotate: No need to calculate notes->start twice")
      Link: http://lkml.kernel.org/r/20190221095716.39529-1-liwei391@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      11db1ad4
  3. 21 2月, 2019 6 次提交
  4. 20 2月, 2019 5 次提交
    • A
      perf trace: Allow dumping a BPF map after setting up BPF events · ff7a4f98
      Arnaldo Carvalho de Melo 提交于
      Initial use case:
      
      Dumping the maps setup by tools/perf/examples/bpf/augmented_raw_syscalls.c,
      which so far are just booleans, showing just non-zeroed entries:
      
        # cat ~/.perfconfig
        [llvm]
      	dump-obj = true
      	clang-opt = -g
        [trace]
      	#add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
      	add_events = /wb/augmented_raw_syscalls.o
        $ date
        Tue Feb 19 16:29:33 -03 2019
        $ ls -la /wb/augmented_raw_syscalls.o
        -rwxr-xr-x. 1 root root 14048 Jan 24 12:09 /wb/augmented_raw_syscalls.o
        $ file /wb/augmented_raw_syscalls.o
        /wb/augmented_raw_syscalls.o: ELF 64-bit LSB relocatable, eBPF, version 1 (SYSV), with debug_info, not stripped
        $
        # trace -e recvmmsg,sendmmsg --map-dump foobar
        ERROR: BPF map "foobar" not found
        # trace -e recvmmsg,sendmmsg --map-dump filtered_pids
        ERROR: BPF map "filtered_pids" not found
        # trace -e recvmmsg,sendmmsg --map-dump pids_filtered
        [2583] = 1,
        [2267] = 1,
        ^Z
        [1]+  Stopped                 trace -e recvmmsg,sendmmsg --map-dump pids_filtered
        # pidof trace
        2267
        # ps ax|grep gnome-terminal|grep -v grep
        2583 ?        Ssl   58:33 /usr/libexec/gnome-terminal-server
        ^C
        # trace -e recvmmsg,sendmmsg --map-dump syscalls
        [299] = 1,
        [307] = 1,
        ^C
        # grep x64_recvmmsg arch/x86/entry/syscalls/syscall_64.tbl
        299	64	recvmmsg		__x64_sys_recvmmsg
        # grep x64_sendmmsg arch/x86/entry/syscalls/syscall_64.tbl
        307	64	sendmmsg		__x64_sys_sendmmsg
        #
      
      Next step probably will be something like 'perf stat's --interval-print and
      --interval-clear.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lkml.kernel.org/n/tip-ztxj25rtx37ixo9cfajt8ocy@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ff7a4f98
    • A
      perf bpf: Add bpf_map dumper · d19f8564
      Arnaldo Carvalho de Melo 提交于
      At some point I'll suggest moving this to libbpf, for now I'll
      experiment with ways to dump BPF maps set by events in 'perf trace',
      starting with a very basic dumper for the current very limited needs
      of the augmented_raw_syscalls code: dumping booleans.
      
      Having functions that apply to the map keys and values and do table
      lookup in things like syscall id to string tables should come next.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lkml.kernel.org/n/tip-lz14w0esqyt1333aon05jpwc@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d19f8564
    • T
      perf test: Fix failure of 'evsel-tp-sched' test on s390 · 03d30971
      Thomas Richter 提交于
      Commit 489338a7 ("perf tests evsel-tp-sched: Fix bitwise operator")
      causes test case 14 "Parse sched tracepoints fields" to fail on s390.
      
      This test succeeds on x86.
      
      In fact this test now fails on all architectures with type char treated
      as type unsigned char.
      
      The root cause is the signed-ness of character arrays in the tracepoints
      sched_switch for structure members prev_comm and next_comm.
      
      On s390 the output of:
      
       [root@m35lp76 perf]# cat /sys/kernel/debug/tracing/events/sched/sched_switch/format
       name: sched_switch
       ID: 287
       format:
         field:unsigned short common_type; offset:0; size:2;	signed:0;
         ...
         field:char prev_comm[16]; offset:8; size:16;	signed:0;
         ...
         field:char next_comm[16]; offset:40; size:16; signed:0;
      
      reveals the character arrays prev_comm and next_comm are per
      default unsigned char and have values in the range of 0..255.
      
      On x86 both fields are signed as this output shows:
       [root@f29]# cat /sys/kernel/debug/tracing/events/sched/sched_switch/format
       name: sched_switch
       ID: 287
       format:
         field:unsigned short common_type; offset:0; size:2;	signed:0;
         ...
         field:char prev_comm[16]; offset:8; size:16;	signed:1;
         ...
         field:char next_comm[16]; offset:40; size:16; signed:1;
      
      and the character arrays prev_comm and next_comm are per default signed
      char and have values in the range of -1..127.  The implementation of
      type char is architecture specific.
      
      Since the character arrays in both tracepoints sched_switch and
      sched_wakeup should contain ascii characters, simply omit the check for
      signedness in the test case.
      
      Output before:
      
        [root@m35lp76 perf]# ./perf test -F 14
        14: Parse sched tracepoints fields                        :
        --- start ---
        sched:sched_switch: "prev_comm" signedness(0) is wrong, should be 1
        sched:sched_switch: "next_comm" signedness(0) is wrong, should be 1
        sched:sched_wakeup: "comm" signedness(0) is wrong, should be 1
        ---- end ----
        14: Parse sched tracepoints fields                        : FAILED!
        [root@m35lp76 perf]#
      
      Output after:
      
        [root@m35lp76 perf]# ./perf test -Fv 14
        14: Parse sched tracepoints fields                        :
        --- start ---
        ---- end ----
        Parse sched tracepoints fields: Ok
        [root@m35lp76 perf]#
      
      Fixes: 489338a7 ("perf tests evsel-tp-sched: Fix bitwise operator")
      Signed-off-by: NThomas Richter <tmricht@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20190219153639.31267-1-tmricht@linux.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      03d30971
    • J
      perf doc: Fix documentation of the Flags section in perf.data · 8c23a522
      Jonas Rabenstein 提交于
      According to the current documentation the flags section is placed after
      the file header itself but the code assumes to find the flags section
      after the data section. This change updates the documentation to that
      assumption.
      Signed-off-by: NJonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: http://lkml.kernel.org/r/20190219154515.3954-2-jonas.rabenstein@studium.uni-erlangen.deSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8c23a522
    • J
      perf doc: Fix HEADER_CMDLINE description in perf.data documentation · 7a663c0f
      Jonas Rabenstein 提交于
      The content of the HEADER_CMDLINE feature header is a perf_header_string_list
      of the argument vector and not a perf_header_string of the commandline.
      Signed-off-by: NJonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: http://lkml.kernel.org/r/20190219154515.3954-1-jonas.rabenstein@studium.uni-erlangen.deSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7a663c0f
  5. 19 2月, 2019 5 次提交
  6. 16 2月, 2019 1 次提交
  7. 15 2月, 2019 5 次提交