1. 25 1月, 2019 1 次提交
  2. 18 12月, 2018 1 次提交
    • J
      perf annotate: Compute average IPC and IPC coverage per symbol · ace4f8fa
      Jin Yao 提交于
      Add support to 'perf report' annotate view or 'perf annotate --stdio2'
      to aggregate the IPC derived from timed LBRs per symbol. We compute the
      average IPC and the IPC coverage percentage.
      
      For example:
      
        $ perf annotate --stdio2
      
        Percent  IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%)
      
                                Disassembly of section .text:
      
                                000000000003aac0 <random@@GLIBC_2.2.5>:
          8.32  3.28              sub    $0x18,%rsp
                3.28              mov    $0x1,%esi
                3.28              xor    %eax,%eax
                3.28              cmpl   $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
         11.57  3.28     1      ↓ je     20
                                  lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
                                ↓ jne    29
                                ↓ jmp    43
         11.57  1.10        20:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
          0.00  1.10     1      ↓ je     43
                            29:   lea    __abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
                                  sub    $0x80,%rsp
                                → callq  __lll_lock_wait_private
                                  add    $0x80,%rsp
          0.00  3.00        43:   lea    __ctype_b@GLIBC_2.2.5+0x38,%rdi
                3.00              lea    0xc(%rsp),%rsi
          8.49  3.00     1      → callq  __random_r
          7.91  1.94              cmpl   $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
          0.00  1.94     1      ↓ je     68
                                  lock   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
                                ↓ jne    70
                                ↓ jmp    8a
          0.00  2.00        68:   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
         21.56  2.00     1      ↓ je     8a
                            70:   lea    __abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
                                  sub    $0x80,%rsp
                                → callq  __lll_unlock_wake_private
                                  add    $0x80,%rsp
         21.56  2.90        8a:   movslq 0xc(%rsp),%rax
                2.90              add    $0x18,%rsp
          9.03  2.90     1      ← retq
      
      It shows for this symbol the average IPC is 2.30 and the IPC coverage is
      54.8%.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1543586097-27632-2-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ace4f8fa
  3. 31 8月, 2018 1 次提交
    • K
      perf annotate: Fix parsing aarch64 branch instructions after objdump update · 4e67b2a5
      Kim Phillips 提交于
      Starting with binutils 2.28, aarch64 objdump adds comments to the
      disassembly output to show the alternative names of a condition code
      [1].
      
      It is assumed that commas in objdump comments could occur in other
      arches now or in the future, so this fix is arch-independent.
      
      The fix could have been done with arm64 specific jump__parse and
      jump__scnprintf functions, but the jump__scnprintf instruction would
      have to have its comment character be a literal, since the scnprintf
      functions cannot receive a struct arch easily.
      
      This inconvenience also applies to the generic jump__scnprintf, which is
      why we add a raw_comment pointer to struct ins_operands, so the __parse
      function assigns it to be re-used by its corresponding __scnprintf
      function.
      
      Example differences in 'perf annotate --stdio2' output on an aarch64
      perf.data file:
      
      BEFORE: → b.cs   ffff200008133d1c <unwind_frame+0x18c>  // b.hs, dffff7ecc47b
      AFTER : ↓ b.cs   18c
      
      BEFORE: → b.cc   ffff200008d8d9cc <get_alloc_profile+0x31c>  // b.lo, b.ul, dffff727295b
      AFTER : ↓ b.cc   31c
      
      The branch target labels 18c and 31c also now appear in the output:
      
      BEFORE:        add    x26, x29, #0x80
      AFTER : 18c:   add    x26, x29, #0x80
      
      BEFORE:        add    x21, x21, #0x8
      AFTER : 31c:   add    x21, x21, #0x8
      
      The Fixes: tag below is added so stable branches will get the update; it
      doesn't necessarily mean that commit was broken at the time, rather it
      didn't withstand the aarch64 objdump update.
      
      Tested no difference in output for sample x86_64, power arch perf.data files.
      
      [1] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=bb7eff5206e4795ac79c177a80fe9f4630aaf730Signed-off-by: NKim Phillips <kim.phillips@arm.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Fixes: b13bbeee ("perf annotate: Fix branch instruction with multiple operands")
      Link: http://lkml.kernel.org/r/20180827125340.a2f7e291901d17cea05daba4@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4e67b2a5
  4. 09 8月, 2018 12 次提交
  5. 04 6月, 2018 11 次提交
  6. 19 5月, 2018 2 次提交
    • J
      perf annotate: Create hotkey 'c' to show min/max cycles · 3e71fc03
      Jin Yao 提交于
      In the 'perf annotate' view, a new hotkey 'c' is created for showing the
      min/max cycles.
      
      For example, when press 'c', the annotate view is:
      
        Percent│ IPC     Cycle(min/max)
               │
               │
               │                             Disassembly of section .text:
               │
               │                             000000000003aab0 <random@@GLIBC_2.2.5>:
          8.22 │3.92                           sub    $0x18,%rsp
               │3.92                           mov    $0x1,%esi
               │3.92                           xor    %eax,%eax
               │3.92                           cmpl   $0x0,argp_program_version_hook@@G
               │3.92             1(2/1)      ↓ je     20
               │                               lock   cmpxchg %esi,__abort_msg@@GLIBC_P
               │                             ↓ jne    29
               │                             ↓ jmp    43
               │1.10                     20:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+
          8.93 │1.10             1(5/1)      ↓ je     43
      
      When press 'c' again, the annotate view is switched back:
      
        Percent│ IPC Cycle
               │
               │
               │                Disassembly of section .text:
               │
               │                000000000003aab0 <random@@GLIBC_2.2.5>:
          8.22 │3.92              sub    $0x18,%rsp
               │3.92              mov    $0x1,%esi
               │3.92              xor    %eax,%eax
               │3.92              cmpl   $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x
               │3.92     1      ↓ je     20
               │                  lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
               │                ↓ jne    29
               │                ↓ jmp    43
               │1.10        20:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
          8.93 │1.10     1      ↓ je     43
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1526569118-14217-3-git-send-email-yao.jin@linux.intel.com
      [ Rename all maxmin to minmax ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3e71fc03
    • J
      perf annotate: Record the min/max cycles · 48659ebf
      Jin Yao 提交于
      Currently perf has a feature to account cycles for LBRs
      
      For example, on skylake:
      
        perf record -b ...
        perf report or perf annotate
      
      And then browsing the annotate browser gives average cycle counts for
      program blocks.
      
      For some analysis it would be useful if we could know not only the
      average cycles but also the min and max cycles.
      
      This patch records the min and max cycles.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1526569118-14217-2-git-send-email-yao.jin@linux.intel.com
      [ Switch from max/min to min/max ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      48659ebf
  7. 12 4月, 2018 1 次提交
    • A
      perf annotate: Allow showing offsets in more than just jump targets · 592c10e2
      Arnaldo Carvalho de Melo 提交于
      Jesper wanted to see offsets at callq sites when doing some performance
      investigation related to retpolines, so save him some time by providing
      an 'struct annotation_options' to control where offsets should appear:
      just on jump targets? That + call instructions? All?
      
      This puts in place the logic to show the offsets, now we need to wire
      this up in the TUI browser (next patch) and on the 'perf annotate --stdio2"
      interface, where we need a more general mechanism to setup the
      'annotation_options' struct from the command line.
      Suggested-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-m3jc9c3swobye9tj08gnh5i7@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      592c10e2
  8. 04 4月, 2018 1 次提交
  9. 24 3月, 2018 1 次提交
    • A
      perf annotate: Add "_local" to jump/offset validation routines · 2eff0611
      Arnaldo Carvalho de Melo 提交于
      Because they all really check if we can access data structures/visual
      constructs where a "jump" instruction targets code in the same function,
      i.e. things like:
      
        __pthread_mutex_lock  /usr/lib64/libpthread-2.26.so
        1.95 │       mov    __pthread_force_elision,%ecx
             │    ┌──test   %ecx,%ecx
        0.07 │    ├──je     60
             │    │  test   $0x300,%esi
             │    │↓ jne    60
             │    │  or     $0x100,%esi
             │    │  mov    %esi,0x10(%rdi)
             │ 42:│  mov    %esi,%edx
             │    │  lea    0x16(%r8),%rsi
             │    │  mov    %r8,%rdi
             │    │  and    $0x80,%edx
             │    │  add    $0x8,%rsp
             │    │→ jmpq   __lll_lock_elision
             │    │  nop
        0.29 │ 60:└─→and    $0x80,%esi
        0.07 │       mov    $0x1,%edi
        0.29 │       xor    %eax,%eax
        2.53 │       lock   cmpxchg %edi,(%r8)
      
      And not things like that "jmpq __lll_lock_elision", that instead should behave
      like a "call" instruction and "jump" to the disassembly of "___lll_lock_elision".
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-3cwx39u3h66dfw9xjrlt7ca2@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2eff0611
  10. 22 3月, 2018 2 次提交
  11. 21 3月, 2018 7 次提交