1. 04 6月, 2018 9 次提交
  2. 23 5月, 2018 1 次提交
  3. 19 5月, 2018 2 次提交
    • J
      perf annotate: Create hotkey 'c' to show min/max cycles · 3e71fc03
      Jin Yao 提交于
      In the 'perf annotate' view, a new hotkey 'c' is created for showing the
      min/max cycles.
      
      For example, when press 'c', the annotate view is:
      
        Percent│ IPC     Cycle(min/max)
               │
               │
               │                             Disassembly of section .text:
               │
               │                             000000000003aab0 <random@@GLIBC_2.2.5>:
          8.22 │3.92                           sub    $0x18,%rsp
               │3.92                           mov    $0x1,%esi
               │3.92                           xor    %eax,%eax
               │3.92                           cmpl   $0x0,argp_program_version_hook@@G
               │3.92             1(2/1)      ↓ je     20
               │                               lock   cmpxchg %esi,__abort_msg@@GLIBC_P
               │                             ↓ jne    29
               │                             ↓ jmp    43
               │1.10                     20:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+
          8.93 │1.10             1(5/1)      ↓ je     43
      
      When press 'c' again, the annotate view is switched back:
      
        Percent│ IPC Cycle
               │
               │
               │                Disassembly of section .text:
               │
               │                000000000003aab0 <random@@GLIBC_2.2.5>:
          8.22 │3.92              sub    $0x18,%rsp
               │3.92              mov    $0x1,%esi
               │3.92              xor    %eax,%eax
               │3.92              cmpl   $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x
               │3.92     1      ↓ je     20
               │                  lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
               │                ↓ jne    29
               │                ↓ jmp    43
               │1.10        20:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
          8.93 │1.10     1      ↓ je     43
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1526569118-14217-3-git-send-email-yao.jin@linux.intel.com
      [ Rename all maxmin to minmax ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3e71fc03
    • J
      perf annotate: Record the min/max cycles · 48659ebf
      Jin Yao 提交于
      Currently perf has a feature to account cycles for LBRs
      
      For example, on skylake:
      
        perf record -b ...
        perf report or perf annotate
      
      And then browsing the annotate browser gives average cycle counts for
      program blocks.
      
      For some analysis it would be useful if we could know not only the
      average cycles but also the min and max cycles.
      
      This patch records the min and max cycles.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1526569118-14217-2-git-send-email-yao.jin@linux.intel.com
      [ Switch from max/min to min/max ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      48659ebf
  4. 11 5月, 2018 1 次提交
    • J
      perf annotate: Display all available events on --stdio · 04d2600a
      Jin Yao 提交于
      When we perform the following command lines:
      
        $ perf record -e "{cycles,branches}" ./div
        $ perf annotate main --stdio
      
      The output shows only the first event, "cycles" and the displaying
      format is not correct.
      
         Percent         |      Source code & Disassembly of div for cycles (44550 samples)
        -----------------------------------------------------------------------------------
                         :
                         :
                         :
                         :            Disassembly of section .text:
                         :
                         :            00000000004004b0 <main>:
                         :            main():
                         :
                         :                    return i;
                         :            }
                         :
                         :            int main(void)
                         :            {
            0.00 :   4004b0:       push   %rbx
                         :                    int i;
                         :                    int flag;
                         :                    volatile double x = 1212121212, y = 121212;
                         :
                         :                    s_randseed = time(0);
            0.00 :   4004b1:       xor    %edi,%edi
                         :                    srand(s_randseed);
            0.00 :   4004b3:       mov    $0x77359400,%ebx
                         :
                         :                    return i;
                         :            }
      
      The issue is that the value of the 'nr_percent' variable is hardcoded to
      1.  This patch fixes it.
      
      With this patch, the output is:
      
         Percent         |      Source code & Disassembly of div for cycles (44550 samples)
        -----------------------------------------------------------------------------------
                         :
                         :
                         :
                         :            Disassembly of section .text:
                         :
                         :            00000000004004b0 <main>:
                         :            main():
                         :
                         :                    return i;
                         :            }
                         :
                         :            int main(void)
                         :            {
            0.00    0.00 :   4004b0:       push   %rbx
                         :                    int i;
                         :                    int flag;
                         :                    volatile double x = 1212121212, y = 121212;
                         :
                         :                    s_randseed = time(0);
            0.00    0.00 :   4004b1:       xor    %edi,%edi
                         :                    srand(s_randseed);
            0.00    0.00 :   4004b3:       mov    $0x77359400,%ebx
                         :
                         :                    return i;
                         :            }
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: f681d593 ("perf annotate: Remove disasm__calc_percent() from disasm_line__print()")
      Link: http://lkml.kernel.org/r/1525881435-4092-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      04d2600a
  5. 13 4月, 2018 1 次提交
    • A
      perf annotate: Allow setting the offset level in .perfconfig · 43c40231
      Arnaldo Carvalho de Melo 提交于
      The default is 1 (jump_target):
      
        # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
        Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
        _raw_spin_lock_irqsave() /proc/kcore
          0.26        nop
          4.61        push   %rbx
         19.33        pushfq
          7.97        pop    %rax
          0.32        nop
          0.06        mov    %rax,%rbx
         14.63        cli
          0.06        nop
                      xor    %eax,%eax
                      mov    $0x1,%edx
         49.94        lock   cmpxchg %edx,(%rdi)
          0.16        test   %eax,%eax
                    ↓ jne    2b
          2.66        mov    %rbx,%rax
                      pop    %rbx
                    ← retq
                2b:   mov    %eax,%esi
                    → callq  *ffffffffb30eaed0
                      mov    %rbx,%rax
                      pop    %rbx
                    ← retq
        #
      
      But one can ask for showing offsets for call instructions by setting
      this:
      
        # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
        Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
        _raw_spin_lock_irqsave() /proc/kcore
          0.26        nop
          4.61        push   %rbx
         19.33        pushfq
          7.97        pop    %rax
          0.32        nop
          0.06        mov    %rax,%rbx
         14.63        cli
          0.06        nop
                      xor    %eax,%eax
                      mov    $0x1,%edx
         49.94        lock   cmpxchg %edx,(%rdi)
          0.16        test   %eax,%eax
                    ↓ jne    2b
          2.66        mov    %rbx,%rax
                      pop    %rbx
                    ← retq
                2b:   mov    %eax,%esi
                2d: → callq  *ffffffffb30eaed0
                      mov    %rbx,%rax
                      pop    %rbx
                    ← retq
        #
      
      Or using a big value to ask for all offsets to be shown:
      
        # cat ~/.perfconfig
        [annotate]
      
      	offset_level = 100
      
      	hide_src_code = true
        # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
        Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
        _raw_spin_lock_irqsave() /proc/kcore
          0.26   0:   nop
          4.61   5:   push   %rbx
         19.33   6:   pushfq
          7.97   7:   pop    %rax
          0.32   8:   nop
          0.06   d:   mov    %rax,%rbx
         14.63  10:   cli
          0.06  11:   nop
                17:   xor    %eax,%eax
                19:   mov    $0x1,%edx
         49.94  1e:   lock   cmpxchg %edx,(%rdi)
          0.16  22:   test   %eax,%eax
                24: ↓ jne    2b
          2.66  26:   mov    %rbx,%rax
                29:   pop    %rbx
                2a: ← retq
                2b:   mov    %eax,%esi
                2d: → callq  *ffffffffb30eaed0
                32:   mov    %rbx,%rax
                35:   pop    %rbx
                36: ← retq
         #
      
      This also affects the TUI, i.e. the default 'perf annotate' and 'perf
      top/report' -> A hotkey -> annotate interfaces, when slang-devel is present
      in the build, i.e.:
      
        # perf version --build-options | grep slang
                    libslang: [ on  ]  # HAVE_SLANG_SUPPORT
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-venm6x5zrt40eu8hxdsmqxz6@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      43c40231
  6. 12 4月, 2018 1 次提交
    • A
      perf annotate: Allow showing offsets in more than just jump targets · 592c10e2
      Arnaldo Carvalho de Melo 提交于
      Jesper wanted to see offsets at callq sites when doing some performance
      investigation related to retpolines, so save him some time by providing
      an 'struct annotation_options' to control where offsets should appear:
      just on jump targets? That + call instructions? All?
      
      This puts in place the logic to show the offsets, now we need to wire
      this up in the TUI browser (next patch) and on the 'perf annotate --stdio2"
      interface, where we need a more general mechanism to setup the
      'annotation_options' struct from the command line.
      Suggested-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-m3jc9c3swobye9tj08gnh5i7@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      592c10e2
  7. 05 4月, 2018 1 次提交
    • A
      perf annotate: Show group details on the title line · c0459a09
      Arnaldo Carvalho de Melo 提交于
      To match what is shown in the main 'perf report/top' title lines, i.e.
      if a group is being shown, either a real group (recorded with "-e
      '{a,b,c}') or a forced group (using 'perf report --group' for a
      perf.data file recorded without {}) we will show multiple columns,
      one per event, but we were failing to show the group details, so, for:
      
       # perf report --header-only | grep cmdline
       # cmdline : /home/acme/bin/perf record -e {cycles,instructions,cache-misses}
       # perf report --group
      
      The first line was showing just "cycles", now it shows the correct line,
      which is:
      
        Samples: 578  of events 'anon group { cycles, instructions, cache-misses }', 4000 Hz, Event count (approx.): 487421794
        syscall_return_via_sysret  /lib/modules/4.16.0-rc7/build/vmlinux
          0.22   2.97   0.00 │    ↓ jmp    6c
                             │      mov    %cr3,%rdi
          1.33  10.89   4.00 │    ↓ jmp    62
                             │      mov    %rdi,%rax
      <SNIP>
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 6920e285 ("perf annotate browser: Show extra title line with event information")
      Link: https://lkml.kernel.org/n/tip-i41tqh17c2dabnyzjh99r1oz@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c0459a09
  8. 04 4月, 2018 2 次提交
  9. 24 3月, 2018 4 次提交
    • A
      perf annotate: Use absolute addresses to calculate jump target offsets · 980b68ec
      Arnaldo Carvalho de Melo 提交于
      These types of jumps were confusing the annotate browser:
      
      entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
      
      entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
        Percent│ffffffff81a00020:   swapgs
        <SNIP>
               │ffffffff81a00128: ↓ jae    ffffffff81a00139 <syscall_return_via_sysret+0x53>
        <SNIP>
               │ffffffff81a00155: → jmpq   *0x825d2d(%rip)   # ffffffff82225e88 <pv_cpu_ops+0xe8>
      
      I.e. the syscall_return_via_sysret function is actually "inside" the
      entry_SYSCALL_64 function, and the offsets in jumps like these (+0x53)
      are relative to syscall_return_via_sysret, not to syscall_return_via_sysret.
      
      Or this may be some artifact in how the assembler marks the start and
      end of a function and how this ends up in the ELF symtab for vmlinux,
      i.e. syscall_return_via_sysret() isn't "inside" entry_SYSCALL_64, but
      just right after it.
      
      From readelf -sw vmlinux:
      
       80267: ffffffff81a00020   315 NOTYPE  GLOBAL DEFAULT    1 entry_SYSCALL_64
         316: ffffffff81a000e6     0 NOTYPE  LOCAL  DEFAULT    1 syscall_return_via_sysret
      
       0xffffffff81a00020 + 315 > 0xffffffff81a000e6
      
      So instead of looking for offsets after that last '+' sign, calculate
      offsets for jump target addresses that are inside the function being
      disassembled from the absolute address, 0xffffffff81a00139 in this case,
      subtracting from it the objdump address for the start of the function
      being disassembled, entry_SYSCALL_64() in this case.
      
      So, before this patch:
      
      entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
      Percent│       pop    %r10
             │       pop    %r9
             │       pop    %r8
             │       pop    %rax
             │       pop    %rsi
             │       pop    %rdx
             │       pop    %rsi
             │       mov    %rsp,%rdi
             │       mov    %gs:0x5004,%rsp
             │       pushq  0x28(%rdi)
             │       pushq  (%rdi)
             │       push   %rax
             │     ↑ jmp    6c
             │       mov    %cr3,%rdi
             │     ↑ jmp    62
             │       mov    %rdi,%rax
             │       and    $0x7ff,%rdi
             │       bt     %rdi,%gs:0x2219a
             │     ↑ jae    53
             │       btr    %rdi,%gs:0x2219a
             │       mov    %rax,%rdi
             │     ↑ jmp    5b
      
      After:
      
      entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
        0.65 │     → jne    swapgs_restore_regs_and_return_to_usermode
             │       pop    %r10
             │       pop    %r9
             │       pop    %r8
             │       pop    %rax
             │       pop    %rsi
             │       pop    %rdx
             │       pop    %rsi
             │       mov    %rsp,%rdi
             │       mov    %gs:0x5004,%rsp
             │       pushq  0x28(%rdi)
             │       pushq  (%rdi)
             │       push   %rax
             │     ↓ jmp    132
             │       mov    %cr3,%rdi
             │    ┌──jmp    128
             │    │  mov    %rdi,%rax
             │    │  and    $0x7ff,%rdi
             │    │  bt     %rdi,%gs:0x2219a
             │    │↓ jae    119
             │    │  btr    %rdi,%gs:0x2219a
             │    │  mov    %rax,%rdi
             │    │↓ jmp    121
             │119:│  mov    %rax,%rdi
             │    │  bts    $0x3f,%rdi
             │121:│  or     $0x800,%rdi
             │128:└─→or     $0x1000,%rdi
             │       mov    %rdi,%cr3
             │132:   pop    %rax
             │       pop    %rdi
             │       pop    %rsp
             │     → jmpq   *0x825d2d(%rip)        # ffffffff82225e88 <pv_cpu_ops+0xe8>
      
      With those at least navigating to the right destination, an improvement
      for these cases seems to be to be to somehow mark those inner functions,
      which in this case could be:
      
      entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
             │syscall_return_via_sysret:
             │       pop    %r15
             │       pop    %r14
             │       pop    %r13
             │       pop    %r12
             │       pop    %rbp
             │       pop    %rbx
             │       pop    %rsi
             │       pop    %r10
             │       pop    %r9
             │       pop    %r8
             │       pop    %rax
             │       pop    %rsi
             │       pop    %rdx
             │       pop    %rsi
             │       mov    %rsp,%rdi
             │       mov    %gs:0x5004,%rsp
             │       pushq  0x28(%rdi)
             │       pushq  (%rdi)
             │       push   %rax
             │     ↓ jmp    132
             │       mov    %cr3,%rdi
             │    ┌──jmp    128
             │    │  mov    %rdi,%rax
             │    │  and    $0x7ff,%rdi
             │    │  bt     %rdi,%gs:0x2219a
             │    │↓ jae    119
             │    │  btr    %rdi,%gs:0x2219a
             │    │  mov    %rax,%rdi
             │    │↓ jmp    121
             │119:│  mov    %rax,%rdi
             │    │  bts    $0x3f,%rdi
             │121:│  or     $0x800,%rdi
             │128:└─→or     $0x1000,%rdi
             │       mov    %rdi,%cr3
             │132:   pop    %rax
             │       pop    %rdi
             │       pop    %rsp
             │     → jmpq   *0x825d2d(%rip)        # ffffffff82225e88 <pv_cpu_ops+0xe8>
      
      This all gets much better viewed if one uses 'perf report --ignore-vmlinux'
      forcing the usage of /proc/kcore + /proc/kallsyms, when the above
      actually gets down to:
      
        # perf report --ignore-vmlinux
        ## do '/64', will show the function names containing '64',
        ## navigate to /entry_SYSCALL_64_after_hwframe.annotation,
        ## press 'A' to annotate, then 'P' to print that annotation
        ## to a file
        ## From another xterm (or see on screen, this 'P' thing is for
        ## getting rid of those right side scroll bars/spaces):
        # cat /entry_SYSCALL_64_after_hwframe.annotation
        entry_SYSCALL_64_after_hwframe() /proc/kcore
        Event: cycles:ppp
      
        Percent
                    Disassembly of section load0:
      
                    ffffffff9aa00044 <load0>:
         11.97        push   %rax
          4.85        push   %rdi
                      push   %rsi
          2.59        push   %rdx
          2.27        push   %rcx
          0.32        pushq  $0xffffffffffffffda
          1.29        push   %r8
                      xor    %r8d,%r8d
          1.62        push   %r9
          0.65        xor    %r9d,%r9d
          1.62        push   %r10
                      xor    %r10d,%r10d
          5.50        push   %r11
                      xor    %r11d,%r11d
          3.56        push   %rbx
                      xor    %ebx,%ebx
          4.21        push   %rbp
                      xor    %ebp,%ebp
          2.59        push   %r12
          0.97        xor    %r12d,%r12d
          3.24        push   %r13
                      xor    %r13d,%r13d
          2.27        push   %r14
                      xor    %r14d,%r14d
          4.21        push   %r15
                      xor    %r15d,%r15d
          0.97        mov    %rsp,%rdi
          5.50      → callq  do_syscall_64
         14.56        mov    0x58(%rsp),%rcx
          7.44        mov    0x80(%rsp),%r11
          0.32        cmp    %rcx,%r11
                    → jne    swapgs_restore_regs_and_return_to_usermode
          0.32        shl    $0x10,%rcx
          0.32        sar    $0x10,%rcx
          3.24        cmp    %rcx,%r11
                    → jne    swapgs_restore_regs_and_return_to_usermode
          2.27        cmpq   $0x33,0x88(%rsp)
          1.29      → jne    swapgs_restore_regs_and_return_to_usermode
                      mov    0x30(%rsp),%r11
          8.74        cmp    %r11,0x90(%rsp)
                    → jne    swapgs_restore_regs_and_return_to_usermode
          0.32        test   $0x10100,%r11
                    → jne    swapgs_restore_regs_and_return_to_usermode
          0.32        cmpq   $0x2b,0xa0(%rsp)
          0.65      → jne    swapgs_restore_regs_and_return_to_usermode
      
      I.e. using kallsyms makes the function start/end be done differently
      than using what is in the vmlinux ELF symtab and actually the hits
      goes to entry_SYSCALL_64_after_hwframe, which is a GLOBAL() after the
      start of entry_SYSCALL_64:
      
        ENTRY(entry_SYSCALL_64)
                UNWIND_HINT_EMPTY
        <SNIP>
                pushq   $__USER_CS                      /* pt_regs->cs */
                pushq   %rcx                            /* pt_regs->ip */
        GLOBAL(entry_SYSCALL_64_after_hwframe)
                pushq   %rax                            /* pt_regs->orig_ax */
      
                PUSH_AND_CLEAR_REGS rax=$-ENOSYS
      
      And it goes and ends at:
      
                cmpq    $__USER_DS, SS(%rsp)            /* SS must match SYSRET */
                jne     swapgs_restore_regs_and_return_to_usermode
      
                /*
                 * We win! This label is here just for ease of understanding
                 * perf profiles. Nothing jumps here.
                 */
        syscall_return_via_sysret:
                /* rcx and r11 are already restored (see code above) */
                UNWIND_HINT_EMPTY
                POP_REGS pop_rdi=0 skip_r11rcx=1
      
      So perhaps some people should really just play with '--ignore-vmlinux'
      to force /proc/kcore + kallsyms.
      
      One idea is to do both, i.e. have a vmlinux annotation and a
      kcore+kallsyms one, when possible, and even show the patched location,
      etc.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-r11knxv8voesav31xokjiuo6@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      980b68ec
    • A
      perf annotate: Defer searching for comma in raw line till it is needed · c448234c
      Arnaldo Carvalho de Melo 提交于
      That strchr() in jump__scnprintf() needs to be nuked somehow, as it,
      IIRC is already done in jump__parse() and if needed at scnprintf() time,
      should be stashed in the struct filled in parse() time.
      
      For now jus defer it to just before where it is used.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-j0t5hagnphoz9xw07bh3ha3g@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c448234c
    • A
      perf annotate: Support jumping from one function to another · e4cc91b8
      Arnaldo Carvalho de Melo 提交于
      For instance:
      
        entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
          5.50 │     → callq  do_syscall_64
         14.56 │       mov    0x58(%rsp),%rcx
          7.44 │       mov    0x80(%rsp),%r11
          0.32 │       cmp    %rcx,%r11
               │     → jne    swapgs_restore_regs_and_return_to_usermode
          0.32 │       shl    $0x10,%rcx
          0.32 │       sar    $0x10,%rcx
          3.24 │       cmp    %rcx,%r11
               │     → jne    swapgs_restore_regs_and_return_to_usermode
          2.27 │       cmpq   $0x33,0x88(%rsp)
          1.29 │     → jne    swapgs_restore_regs_and_return_to_usermode
               │       mov    0x30(%rsp),%r11
          8.74 │       cmp    %r11,0x90(%rsp)
               │     → jne    swapgs_restore_regs_and_return_to_usermode
          0.32 │       test   $0x10100,%r11
               │     → jne    swapgs_restore_regs_and_return_to_usermode
          0.32 │       cmpq   $0x2b,0xa0(%rsp)
          0.65 │     → jne    swapgs_restore_regs_and_return_to_usermode
      
      It'll behave just like a "call" instruction, i.e. press enter or right
      arrow over one such line and the browser will navigate to the annotated
      disassembly of that function, which when exited, via left arrow or esc,
      will come back to the calling function.
      
      Now to support jump to an offset on a different function...
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-78o508mqvr8inhj63ddtw7mo@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e4cc91b8
    • A
      perf annotate: Add "_local" to jump/offset validation routines · 2eff0611
      Arnaldo Carvalho de Melo 提交于
      Because they all really check if we can access data structures/visual
      constructs where a "jump" instruction targets code in the same function,
      i.e. things like:
      
        __pthread_mutex_lock  /usr/lib64/libpthread-2.26.so
        1.95 │       mov    __pthread_force_elision,%ecx
             │    ┌──test   %ecx,%ecx
        0.07 │    ├──je     60
             │    │  test   $0x300,%esi
             │    │↓ jne    60
             │    │  or     $0x100,%esi
             │    │  mov    %esi,0x10(%rdi)
             │ 42:│  mov    %esi,%edx
             │    │  lea    0x16(%r8),%rsi
             │    │  mov    %r8,%rdi
             │    │  and    $0x80,%edx
             │    │  add    $0x8,%rsp
             │    │→ jmpq   __lll_lock_elision
             │    │  nop
        0.29 │ 60:└─→and    $0x80,%esi
        0.07 │       mov    $0x1,%edi
        0.29 │       xor    %eax,%eax
        2.53 │       lock   cmpxchg %edi,(%r8)
      
      And not things like that "jmpq __lll_lock_elision", that instead should behave
      like a "call" instruction and "jump" to the disassembly of "___lll_lock_elision".
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-3cwx39u3h66dfw9xjrlt7ca2@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2eff0611
  10. 22 3月, 2018 2 次提交
  11. 21 3月, 2018 16 次提交