1. 26 Jul, 2017 1 commit
  2. 25 Jul, 2017 1 commit
  3. 21 Jul, 2017 4 commits
  4. 19 Jul, 2017 3 commits
    • perf buildid-cache: Cache debuginfo · d2396999
      Krister Johansen committed
      If a stripped binary is placed in the cache, the user is in a situation
      where there's a cached elf file present, but it doesn't have any symtab
      to use for name resolution.  Grab the debuginfo for binaries that don't
      end in .ko.  This yields a better chance of resolving symbols from older
      traces.
      Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1499305693-1599-7-git-send-email-kjlx@templeofstupid.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Implement visual marker for macro fusion · 7e63a13a
      Jin Yao committed
      To mark fused instructions clearly, this patch draws a line before the
      first instruction of the pair and joins it to the arrow that runs from
      the jump to its target.
      
      For example, when "je" is selected in annotate view, the line before
      cmpl is displayed and joins the arrow of "je".
      
             │   ┌──cmpl   $0x0,argp_program_version_hook
       81.93 │   ├──je     20
             │   │  lock   cmpxchg %esi,0x38a9a4(%rip)
             │   │↓ jne    29
             │   │↓ jmp    43
       11.47 │20:└─→cmpxch %esi,0x38a999(%rip)
      
      That means the cmpl+je is a fused instruction pair and they should be
      considered together.
      
      Changelog:
      
      v3: Use Arnaldo's fix to improve the arrow origin rendering.  To get the
          evsel->evlist->env->cpuid, save the evsel in annotate_browser.
      
      v2: new function "ins__is_fused" to check if the instructions are fused.
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1499403995-19857-3-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Check for fused instructions · 69fb09f6
      Jin Yao committed
      Macro fusion merges two instructions into a single micro-op. Intel Core
      platforms perform this hardware optimization under limited
      circumstances.
      
      For example, CMP + JCC can be "fused" and executed/retired together.
      With sampling, this can result in the sample sometimes landing on the
      JCC and sometimes on the CMP, so the two instructions of a fused pair
      should be considered together.
      
      On Nehalem, fused instruction pairs:
      
        cmp/test + jcc.
      
      On other, newer CPUs:
      
        cmp/test/add/sub/and/inc/dec + jcc.
      
      This patch adds an x86-specific function which checks whether two
      instructions form a "fused" pair. For non-x86 architectures, the
      function is simply NULL.
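
The pairing rule described above can be sketched as follows. This is a minimal illustration in Python, not the C code added by the patch: the names `is_fused_pair` and `is_conditional_jump`, the `nehalem` flag, and the prefix-based mnemonic matching are all assumptions made for the example.

```python
# Illustrative sketch only. The names and the prefix-based mnemonic matching
# are assumptions for this example, not the code added by the patch.

# First-instruction mnemonics that may fuse with a following conditional jump.
NEHALEM_FUSIBLE = ("cmp", "test")
NEWER_FUSIBLE = ("cmp", "test", "add", "sub", "and", "inc", "dec")


def is_conditional_jump(mnemonic: str) -> bool:
    """Treat every 'j*' mnemonic except an unconditional 'jmp*' as a jcc."""
    return mnemonic.startswith("j") and not mnemonic.startswith("jmp")


def is_fused_pair(first: str, second: str, nehalem: bool = False) -> bool:
    """Return True if `first` directly followed by `second` may be macro-fused."""
    if not is_conditional_jump(second):
        return False
    if first.startswith("cmpxchg"):
        # Shares the 'cmp' prefix but is not a compare instruction.
        return False
    fusible = NEHALEM_FUSIBLE if nehalem else NEWER_FUSIBLE
    # Prefix match so that suffixed forms such as 'cmpl' or 'testq' qualify.
    return first.startswith(fusible)
```

Under these assumptions, `is_fused_pair("cmpl", "je")` is true, while `is_fused_pair("addl", "jne", nehalem=True)` is false, because on Nehalem only cmp/test are listed as fusing with a conditional jump.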
      
      Changelog:
      
      v4: Move the CPU model checking to symbol__disassemble and save the CPU
          family/model in the arch structure.
      
          This avoids checking every time a jump arrow is printed.
      
      v3: Add checking for Nehalem (CMP, TEST). For other newer Intel CPUs
          just check it by default (CMP, TEST, ADD, SUB, AND, INC, DEC).
      
      v2: Remove the original weak function. Arnaldo points out that doing it
          as a weak function that will be overridden by the host arch doesn't
          work. So now it's implemented as an arch-specific function.
      
      Committer fix:
      
      Do not access evsel->evlist->env->cpuid directly, as ->env can be NULL.
      Introduce perf_evsel__env_cpuid(), just like perf_evsel__env_arch(),
      which is also used in this function call.
      
      The original patch was segfaulting 'perf top' + annotation.
      
      But this essentially disables the fused instructions augmentation in
      'perf top'. The right thing is to get the cpuid from the running kernel,
      which is left for a later patch.
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1499403995-19857-2-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 20 Jun, 2017 1 commit
  6. 09 Jun, 2017 2 commits
  7. 02 Jun, 2017 1 commit
    • perf annotate: Fix branch instruction with multiple operands · b13bbeee
      Kim Phillips committed
      'perf annotate' is dropping the cr* fields from branch instructions.
      
      Fix it by adding support to display branch instructions having
      multiple operands.
      
      Power Arch objdump of int_sqrt:
      
       20.36 | c0000000004d2694:   subf   r10,r10,r3
             | c0000000004d2698: v bgt    cr6,c0000000004d26a0 <int_sqrt+0x40>
        1.82 | c0000000004d269c:   mr     r3,r10
       29.18 | c0000000004d26a0:   mr     r10,r8
             | c0000000004d26a4: v bgt    cr7,c0000000004d26ac <int_sqrt+0x4c>
             | c0000000004d26a8:   mr     r10,r7
      
      Power Arch Before Patch:
      
       20.36 |       subf   r10,r10,r3
             |     v bgt    40
        1.82 |       mr     r3,r10
       29.18 | 40:   mr     r10,r8
             |     v bgt    4c
             |       mr     r10,r7
      
      Power Arch After patch:
      
       20.36 |       subf   r10,r10,r3
             |     v bgt    cr6,40
        1.82 |       mr     r3,r10
       29.18 | 40:   mr     r10,r8
             |     v bgt    cr7,4c
             |       mr     r10,r7
      
      Also support AArch64 conditional branch instructions, which can
      have up to three operands:
      
      Aarch64 Non-simplified (raw objdump) view:
      
             │ffff0000083cd11c: ↑ cbz    w0, ffff0000083cd100 <security_fil▒
      ...
        4.44 │ffff000│083cd134: ↓ tbnz   w0, #26, ffff0000083cd190 <securit▒
      ...
        1.37 │ffff000│083cd144: ↓ tbnz   w22, #5, ffff0000083cd1a4 <securit▒
             │ffff000│083cd148:   mov    w19, #0x20000                   //▒
        1.02 │ffff000│083cd14c: ↓ tbz    w22, #2, ffff0000083cd1ac <securit▒
      ...
        0.68 │ffff000└──3cd16c: ↑ cbnz   w0, ffff0000083cd120 <security_fil▒
      
      Aarch64 Simplified, before this patch:
      
             │    ↑ cbz    40
      ...
        4.44 │   │↓ tbnz   w0, #26, ffff0000083cd190 <security_file_permiss▒
      ...
        1.37 │   │↓ tbnz   w22, #5, ffff0000083cd1a4 <security_file_permiss▒
             │   │  mov    w19, #0x20000                   // #131072
        1.02 │   │↓ tbz    w22, #2, ffff0000083cd1ac <security_file_permiss▒
      ...
        0.68 │   └──cbnz   60
      
      The cbz operand is missing, and the tbz doesn't get simplified processing
      at all because the parsing function failed to match an address.
      
      Aarch64 Simplified, after this patch is applied:
      
             │    ↑ cbz    w0, 40
      ...
        4.44 │   │↓ tbnz   w0, #26, d0
      ...
        1.37 │   │↓ tbnz   w22, #5, e4
             │   │  mov    w19, #0x20000                   // #131072
        1.02 │   │↓ tbz    w22, #2, ec
      ...
        0.68 │   └──cbnz   w0, 60
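
The simplification shown in the listings above can be sketched roughly as follows: keep every operand except the last, and rewrite the last one (the target address) as an offset from the start of the function. This Python sketch is not perf's parser; `simplify_branch`, `TARGET_RE` and the comma-splitting approach are hypothetical and assume the symbol name contains no commas.

```python
import re

# Matches an objdump-style target such as "c0000000004d26a0 <int_sqrt+0x40>".
TARGET_RE = re.compile(r"^(?:0x)?([0-9a-f]+)(?:\s+<[^>]*>)?$")


def simplify_branch(mnemonic: str, operands: str, func_start: int) -> str:
    """Rewrite a branch so that its target is shown as a function offset.

    Operands before the last one (e.g. 'cr6' or 'w0, #26') are kept as-is;
    only the last operand is treated as the target address.
    """
    raw = f"{mnemonic} {operands}".strip()
    parts = [p.strip() for p in operands.split(",")]
    match = TARGET_RE.match(parts[-1])
    if match is None:
        # No recognisable address: fall back to the raw form.
        return raw
    offset = int(match.group(1), 16) - func_start
    if offset < 0:
        # Target lies before the function: keep the raw form as well.
        return raw
    return f"{mnemonic} " + ", ".join(parts[:-1] + [format(offset, "x")])
```

For the Power example, assuming int_sqrt starts at c0000000004d2660 (derived from the `<int_sqrt+0x40>` annotation), `simplify_branch("bgt", "cr6,c0000000004d26a0 <int_sqrt+0x40>", 0xc0000000004d2660)` keeps the `cr6` operand and reduces the target to `40`.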
      Originally-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Reported-by: Anton Blanchard <anton@samba.org>
      Reported-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Kim Phillips <kim.phillips@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Link: http://lkml.kernel.org/r/20170601092959.f60d98912e8a1b66fd1e4c0e@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  8. 27 May, 2017 1 commit
  9. 20 Apr, 2017 5 commits
  10. 12 Apr, 2017 4 commits
  11. 07 Apr, 2017 1 commit
  12. 05 Apr, 2017 1 commit
    • perf annotate: Fix missing number of samples for source_line_samples · 99094a5e
      Taeung Song committed
      The '--show-total-period' option works fine without the '-l' option.
      But when running 'perf annotate --stdio -l --show-total-period', only
      zero ('0') is shown for the number of samples.
      
      Before:
          $ perf annotate --stdio -l --show-total-period
      ...
             0 :        400816:       push   %rbp
             0 :        400817:       mov    %rsp,%rbp
             0 :        40081a:       mov    %edi,-0x24(%rbp)
             0 :        40081d:       mov    %rsi,-0x30(%rbp)
             0 :        400821:       mov    -0x24(%rbp),%eax
             0 :        400824:       mov    -0x30(%rbp),%rdx
             0 :        400828:       mov    (%rdx),%esi
             0 :        40082a:       mov    $0x0,%edx
      ...
      
      The reason is that the number of samples in source_line_samples was
      never set, so set it properly.
      
      After:
          $ perf annotate --stdio -l --show-total-period
      ...
             3 :        400816:       push   %rbp
             4 :        400817:       mov    %rsp,%rbp
             0 :        40081a:       mov    %edi,-0x24(%rbp)
             0 :        40081d:       mov    %rsi,-0x30(%rbp)
             1 :        400821:       mov    -0x24(%rbp),%eax
             2 :        400824:       mov    -0x30(%rbp),%rdx
             0 :        400828:       mov    (%rdx),%esi
             1 :        40082a:       mov    $0x0,%edx
      ...
      Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Martin Liska <mliska@suse.cz>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 0c4a5bce ("perf annotate: Display total number of samples with --show-total-period")
      Link: http://lkml.kernel.org/r/1490703125-13643-1-git-send-email-treeze.taeung@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  13. 28 Mar, 2017 2 commits
    • perf annotate: Fix a bug of division by zero when calculating percent · 2e933b12
      Taeung Song committed
      Currently perf-annotate with --print-line can print
      -nan(0x8000000000000) because of division by zero when calculating
      percent. The division by zero happens when a sum of samples is zero in
      symbol__get_source_line(), so fix it.
      
      For example:
      
      After running 'perf record' like below,
      
          $ perf record -e "{cycles,page-faults,branch-misses}" ./a.out
      
      Before:
      
          $ perf annotate --stdio -l
      
        Sorted summary for file /home/taeung/workspace/a.out
        ----------------------------------------------
      
         32.89    -nan    7.04 a.c:38
         25.14    -nan    0.00 a.c:34
         16.26    -nan   56.34 a.c:31
         15.88    -nan    1.41 a.c:37
          5.67    -nan    0.00 a.c:39
          1.13    -nan   35.21 a.c:26
          0.95    -nan    0.00 a.c:44
          0.57    -nan    0.00 a.c:32
         Percent                 |      Source code & Disassembly of a.out for cycles (529 samples)
        -----------------------------------------------------------------------------------------
                               :
        ...
      
         a.c:26    0.57    -nan    4.23 :         40081a:       mov    %edi,-0x24(%rbp)
         a.c:26    0.00    -nan    9.86 :         40081d:       mov    %rsi,-0x30(%rbp)
      
        ...
      
      With this patch, if the sum of samples is zero (e.g. for 'page-faults'),
      the percent calculation is skipped.
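
The guard can be sketched as below. This is a minimal Python illustration rather than the actual change to symbol__get_source_line(); the function name `percent` and its arguments are assumptions for the example.

```python
def percent(samples: int, total: int) -> float:
    """Share of `samples` in `total`, or 0.0 when the event has no samples."""
    if total == 0:
        # In C, a floating-point 0/0 yields NaN, which is what was printed
        # as "-nan"; skipping the division leaves the column at 0.00.
        return 0.0
    return 100.0 * samples / total
```

An event such as 'page-faults' with no samples at all then reports 0.00 for every line instead of -nan, while events with samples are unaffected.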
      
      After:
      
          $ perf annotate --stdio -l
      
        Sorted summary for file /home/taeung/workspace/a.out
        ----------------------------------------------
      
         32.89    0.00    7.04 a.c:38
         25.14    0.00    0.00 a.c:34
         16.26    0.00   56.34 a.c:31
         15.88    0.00    1.41 a.c:37
          5.67    0.00    0.00 a.c:39
          1.13    0.00   35.21 a.c:26
          0.95    0.00    0.00 a.c:44
          0.57    0.00    0.00 a.c:32
         Percent                 |      Source code & Disassembly of old for cycles (529 samples)
        -----------------------------------------------------------------------------------------
                               :
        ...
      
        a.c:26    0.57    0.00    4.23 :         40081a:       mov    %edi,-0x24(%rbp)
        a.c:26    0.00    0.00    9.86 :         40081d:       mov    %rsi,-0x30(%rbp)
      
        ...
      Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1490598638-13947-3-git-send-email-treeze.taeung@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Fix a bug following symbolic link of a build-id file · 6ebd2547
      Taeung Song committed
      Reading the link name from a build-id file is the wrong approach,
      because the build-id file is no longer a symbolic link; its build-id
      directory is the symbolic link instead. Fix it.
      
      For example, if the build-id file name obtained from
      dso__build_id_filename() is as below,
      
        /root/.debug/.build-id/4f/75c7d197c951659d1c1b8b5fd49bcdf8f3f8b1/elf
      
      then, to correctly read the link name of the build-id, use the build-id
      directory path, which is a symbolic link, instead of the above build-id
      file name:
      
        /root/.debug/.build-id/4f/75c7d197c951659d1c1b8b5fd49bcdf8f3f8b1
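
A minimal sketch of the idea in Python, assuming the cache layout described above (a regular 'elf' file inside a build-id directory that is itself a symbolic link). The function name `build_id_link_target` is hypothetical and this is not the perf code.

```python
import os


def build_id_link_target(build_id_file: str) -> str:
    """Read the symlink target of the directory that holds `build_id_file`.

    `build_id_file` is expected to look like '<cache>/.build-id/4f/75c7.../elf'.
    The 'elf' entry is a regular file, so readlink() on it would fail; its
    parent directory is the symbolic link.
    """
    build_id_dir = os.path.dirname(build_id_file)
    return os.readlink(build_id_dir)
```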
      Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1490598638-13947-2-git-send-email-treeze.taeung@gmail.com
      Fixes: 01412261 ("perf buildid-cache: Use path/to/bin/buildid/elf instead of path/to/bin/buildid")
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  14. 27 Mar, 2017 1 commit
    • perf report: Enable sorting by srcline as key · 5dfa210e
      Milian Wolff committed
      Often it is interesting to know how costly a given source line is in
      total. Previously, one had to build these sums manually based on all
      addresses that pointed to the same source line. This patch introduces
      srcline as a sort key, which will do the aggregation for us.
      
      Paired with the recent addition of showing inline frames, this makes
      perf report much more useful for many C++ workloads.
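
The aggregation that previously had to be done by hand amounts to something like the following sketch. It is written in Python for illustration only; `aggregate_by_srcline` and both of its inputs are hypothetical and do not reflect perf's internal data structures.

```python
from collections import defaultdict


def aggregate_by_srcline(samples, addr_to_srcline):
    """Sum sample periods per source line instead of per address.

    `samples` is an iterable of (address, period) pairs and `addr_to_srcline`
    maps an address to a 'file:line' string; both are hypothetical inputs.
    """
    totals = defaultdict(int)
    for addr, period in samples:
        # Addresses without line information are grouped under one key.
        totals[addr_to_srcline.get(addr, "??:0")] += period
    # Most expensive source lines first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Several addresses that map to the same source line collapse into a single entry, which is the effect of sorting by srcline rather than by address.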
      
      The following shows the new feature in action. First, let's show the
      status quo output when we sort by address. The result contains many hist
      entries that generate the same output:
      
        ~~~~~~~~~~~~~~~~
        $ perf report --stdio --inline -g address
        # Children      Self  Command       Shared Object        Symbol
        # ........  ........  ............  ...................  .........................................
        #
            99.89%    35.34%  cpp-inlining  cpp-inlining         [.] main
                  |
                  |--64.55%--main complex:655
                  |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                  |          /usr/include/c++/6.3.1/complex:664 (inline)
                  |          |
                  |          |--60.31%--hypot +20
                  |          |          |
                  |          |          |--8.52%--__hypot_finite +273
                  |          |          |
                  |          |          |--7.32%--__hypot_finite +411
      ...
                   --35.34%--_start +4194346
                             __libc_start_main +241
                             |
                             |--6.65%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                             |
                             |--2.70%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                             |
                             |--1.69%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
        ...
        ~~~~~~~~~~~~~~~~
      
      With this patch and `-g srcline` we instead get the following output:
      
        ~~~~~~~~~~~~~~~~
        $ perf report --stdio --inline -g srcline
        # Children      Self  Command       Shared Object        Symbol
        # ........  ........  ............  ...................  .........................................
        #
            99.89%    35.34%  cpp-inlining  cpp-inlining         [.] main
                  |
                  |--64.55%--main complex:655
                  |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                  |          /usr/include/c++/6.3.1/complex:664 (inline)
                  |          |
                  |          |--64.02%--hypot
                  |          |          |
                  |          |           --59.81%--__hypot_finite
                  |          |
                  |           --0.53%--cabs
                  |
                   --35.34%--_start
                             __libc_start_main
                             |
                             |--12.48%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
        ...
        ~~~~~~~~~~~~~~~~
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170318214928.9047-1-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  15. 22 Mar, 2017 2 commits
    • perf annotate: Add comment clarifying how the source code line is parsed · ed7b339f
      Arnaldo Carvalho de Melo committed
      The source code line number (lineno) needs to be kept across calls to
      symbol__parse_objdump_line() when parsing the output of 'objdump -l
      -dS', so that it can be associated with the instructions that follow,
      until the next line number appears.
      
      See disasm_line__new() and struct disasm_line::line_nr.
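
As a rough illustration of why the line number has to persist between parser calls, the following Python sketch pairs each instruction with the most recent 'file:line' marker. The regular expressions and the name `attach_line_numbers` are assumptions for this example, not a rendering of symbol__parse_objdump_line().

```python
import re

# A '/path/to/file.c:42' marker as emitted by 'objdump -l' (simplified).
LINE_MARKER_RE = re.compile(r"^(/.+):(\d+)(?: \(discriminator \d+\))?$")
# An instruction line such as '  400526:  55   push %rbp' (simplified).
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\s+(.*)$")


def attach_line_numbers(objdump_lines):
    """Return (address, line_nr, text) for every instruction line."""
    line_nr = 0  # kept across iterations, like lineno across parser calls
    result = []
    for text in objdump_lines:
        marker = LINE_MARKER_RE.match(text.strip())
        if marker:
            # Remember the line number for the instructions that follow.
            line_nr = int(marker.group(2))
            continue
        insn = INSN_RE.match(text)
        if insn:
            result.append((int(insn.group(1), 16), line_nr, insn.group(2).strip()))
    return result
```

Interleaved source text is ignored here; only the marker lines update the remembered number, and every instruction inherits whichever number was seen last.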
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-7hpx8f8ybdpiujceysaj229w@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: More exactly grep -v of the objdump command · e7cb9de2
      Taeung Song committed
      The 'grep -v "filename"' applied to the objdump command output has the
      side effect of also eliminating the filename:linenr lines of the
      'objdump -l' output when the object file name and the source file name
      are the same apart from the extension. Fix it.
      
      E.g. the output of the following objdump command in symbol__disassemble():
      
          $ objdump -l -d -S -C /home/taeung/hello --start-address=...
      
          /home/taeung/hello:     file format elf64-x86-64
      
          Disassembly of section .text:
      
          0000000000400526 <main>:
          main():
          /home/taeung/hello.c:4
      
          void main()
          {
            400526:	55                   	push   %rbp
            400527:	48 89 e5             	mov    %rsp,%rbp
          /home/taeung/hello.c:5
          ...
      
      perf uses grep -v "filename", e.g. "/home/taeung/hello", on the objdump
      output to remove the first line, which contains the file name and file
      format ("/home/taeung/hello:     file format elf64-x86-64"):
      
      Before:
      
          $ objdump -l -d -S -C /home/taeung/hello | grep -v /home/taeung/hello
      
      But this has a side effect: it removes the filename:linenr lines too,
      because the object file and the source file have similar names, e.g.
      "/home/taeung/hello" and "/home/taeung/hello.c".
      
      So do a more exact match with grep -v, as below, to remove only that
      first line:
      
          "/home/taeung/hello:     file format elf64-x86-64"
      
      After:
      
          $ objdump -l -d -S -C /home/taeung/hello | grep -v /home/taeung/hello:
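
The difference between the two filters can be reproduced with a small Python sketch. It approximates 'grep -v' with a plain substring test (grep would treat the pattern as a regular expression), and the function names are hypothetical.

```python
def filter_before(lines, object_path):
    """Old behaviour, roughly 'grep -v <object_path>'."""
    return [line for line in lines if object_path not in line]


def filter_after(lines, object_path):
    """New behaviour, roughly 'grep -v <object_path>:'."""
    return [line for line in lines if f"{object_path}:" not in line]
```

With the object path "/home/taeung/hello", the old filter also drops "/home/taeung/hello.c:4", because the object path is a prefix of the source path; the new filter drops only the "file format" banner line.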
      Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1489978617-31396-5-git-send-email-treeze.taeung@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  16. 20 Feb, 2017 1 commit
  17. 20 Dec, 2016 1 commit
  18. 16 Dec, 2016 2 commits
    • perf annotate: Fix jump target outside of function address range · e216874c
      Ravi Bangoria committed
      If a jump target is outside the function's address range, perf does not
      handle it correctly. In particular, when the target address is lower
      than the function start address, the target offset is negative. But
      because the target address is declared as unsigned, the negative number
      wraps around to its two's complement representation. See the example
      below. Here the target of the 'jmpq' instruction at 34cf8 is 34ac0,
      which is lower than the function start address (34cf0).
      
              34ac0 - 34cf0 = -0x230 = 0xfffffffffffffdd0
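
The wraparound can be reproduced with a short Python sketch that emulates 64-bit unsigned arithmetic by masking. The helper names, and the idea of checking the range before computing an offset, are assumptions for illustration and not necessarily how the patch is written.

```python
MASK64 = (1 << 64) - 1


def unsigned_offset(target: int, func_start: int) -> int:
    """Offset as it ends up in an unsigned 64-bit field (wraps if negative)."""
    return (target - func_start) & MASK64


def target_in_function(target: int, func_start: int, func_end: int) -> bool:
    """True if `target` lies inside the half-open range [func_start, func_end)."""
    return func_start <= target < func_end
```

Here `hex(unsigned_offset(0x34ac0, 0x34cf0))` gives `0xfffffffffffffdd0`, the bogus value from the calculation above, while a range check against the function bounds identifies the target as external before any offset is computed.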
      
      Objdump output:
      
        0000000000034cf0 <__sigaction>:
        __GI___sigaction():
          34cf0: lea    -0x20(%rdi),%eax
          34cf3: cmp    $0x1,%eax
          34cf6: jbe    34d00 <__sigaction+0x10>
          34cf8: jmpq   34ac0 <__GI___libc_sigaction>
          34cfd: nopl   (%rax)
          34d00: mov    0x386161(%rip),%rax        # 3bae68 <_DYNAMIC+0x2e8>
          34d07: movl   $0x16,%fs:(%rax)
          34d0e: mov    $0xffffffff,%eax
          34d13: retq
      
      perf annotate before applying patch:
      
        __GI___sigaction  /usr/lib64/libc-2.22.so
                 lea    -0x20(%rdi),%eax
                 cmp    $0x1,%eax
              v  jbe    10
              v  jmpq   fffffffffffffdd0
                 nop
          10:    mov    _DYNAMIC+0x2e8,%rax
                 movl   $0x16,%fs:(%rax)
                 mov    $0xffffffff,%eax
                 retq
      
      perf annotate after applying patch:
      
        __GI___sigaction  /usr/lib64/libc-2.22.so
                 lea    -0x20(%rdi),%eax
                 cmp    $0x1,%eax
              v  jbe    10
              ^  jmpq   34ac0 <__GI___libc_sigaction>
                 nop
          10:    mov    _DYNAMIC+0x2e8,%rax
                 movl   $0x16,%fs:(%rax)
                 mov    $0xffffffff,%eax
                 retq
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Riyder <chris.ryder@arm.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/1480953407-7605-3-git-send-email-ravi.bangoria@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Support jump instruction with target as second operand · 3ee2eb6d
      Ravi Bangoria committed
      Architectures like PowerPC have jump instructions that include a target
      address as a second operand. For example, 'bne cr7,0xc0000000000f6154'.
      Add support for such instructions in perf annotate.
      
      objdump o/p:
        c0000000000f6140:   ld     r9,1032(r31)
        c0000000000f6144:   cmpdi  cr7,r9,0
        c0000000000f6148:   bne    cr7,0xc0000000000f6154
        c0000000000f614c:   ld     r9,2312(r30)
        c0000000000f6150:   std    r9,1032(r31)
        c0000000000f6154:   ld     r9,88(r31)
      
      Corresponding perf annotate o/p:
      
      Before patch:
               ld     r9,1032(r31)
               cmpdi  cr7,r9,0
            v  bne    3ffffffffff09f2c
               ld     r9,2312(r30)
               std    r9,1032(r31)
        74:    ld     r9,88(r31)
      
      After patch:
               ld     r9,1032(r31)
               cmpdi  cr7,r9,0
            v  bne    74
               ld     r9,2312(r30)
               std    r9,1032(r31)
        74:    ld     r9,88(r31)
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Riyder <chris.ryder@arm.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/1480953407-7605-2-git-send-email-ravi.bangoria@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  19. 06 Dec, 2016 1 commit
    • perf annotate: Show raw form for jump instruction with indirect target · bec60e50
      Ravi Bangoria committed
      For jump instructions that do not include the target address as a direct
      operand, show the original disassembled line. This is needed for certain
      powerpc jump instructions that take the target address from a register
      (such as bctr, btar, ...).
      
      Before:
           ld     r12,32088(r12)
           mtctr  r12
        v  bctr   ffffffffffffca2c
           std    r2,24(r1)
           addis  r12,r2,-1
      
      After:
           ld     r12,32088(r12)
           mtctr  r12
        v  bctr
           std    r2,24(r1)
           addis  r12,r2,-1
      
      Committer notes:
      
      Testing it using a perf.data file and vmlinux for powerpc64,
      cross-annotating it on a x86_64 workstation:
      
      Before:
      
        .__bpf_prog_run  vmlinux.powerpc
               │        std    r10,512(r9)                      ▒
               │        lbz    r9,0(r31)                        ▒
               │        rldicr r9,r9,3,60                       ▒
               │        ldx    r9,r30,r9                        ▒
               │        mtctr  r9                               ▒
        100.00 │      ↓ bctr   3fffffffffe01510                 ▒
               │        lwa    r10,4(r31)                       ▒
               │        lwz    r9,0(r31)                        ▒
        <SNIP>
        Invalid jump offset: 3fffffffffe01510
      
      After:
      
        .__bpf_prog_run  vmlinux.powerpc
               │        std    r10,512(r9)                      ▒
               │        lbz    r9,0(r31)                        ▒
               │        rldicr r9,r9,3,60                       ▒
               │        ldx    r9,r30,r9                        ▒
               │        mtctr  r9                               ▒
        100.00 │      ↓ bctr                                    ▒
               │        lwa    r10,4(r31)                       ▒
               │        lwz    r9,0(r31)                        ▒
        <SNIP>
        Invalid jump offset: 3fffffffffe01510
      
      This, in turn, uncovers another problem with jumps without operands: the
      ENTER/-> operation, which jumps to the target, still uses the bogus
      target :-)
      
      BTW, this was the file used for the above tests:
      
        [acme@jouet ravi_bangoria]$ perf report --header-only -i perf.data.f22vm.powerdev
        # ========
        # captured on: Thu Nov 24 12:40:38 2016
        # hostname : pdev-f22-qemu
        # os release : 4.4.10-200.fc22.ppc64
        # perf version : 4.9.rc1.g6298ce
        # arch : ppc64
        # nrcpus online : 48
        # nrcpus avail : 48
        # cpudesc : POWER7 (architected), altivec supported
        # cpuid : 74,513
        # total memory : 4158976 kB
        # cmdline : /home/ravi/Workspace/linux/tools/perf/perf record -a
        # event : name = cycles:ppp, , size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|CPU|PERIOD, disabled = 1, inherit = 1, mmap = 1, c
        # HEADER_CPU_TOPOLOGY info available, use -I to display
        # HEADER_NUMA_TOPOLOGY info available, use -I to display
        # pmu mappings: cpu = 4, software = 1, tracepoint = 2, breakpoint = 5
        # missing features: HEADER_TRACING_DATA HEADER_BRANCH_STACK HEADER_GROUP_DESC HEADER_AUXTRACE HEADER_STAT HEADER_CACHE
        # ========
        #
        [acme@jouet ravi_bangoria]$
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Riyder <chris.ryder@arm.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/1480953407-7605-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bec60e50
  20. 02 Dec, 2016 2 commits
    • K
      perf annotate: AArch64 support · 0fcb1da4
      Kim Phillips authored
      This is a regex converted version from the original:
      
      	https://lkml.org/lkml/2016/5/19/461
      
      Add basic support to recognise AArch64 assembly. This allows perf to
      identify AArch64 instructions that branch to other parts within the
      same function, thereby properly annotating them.
      
      Rebased onto new cross-arch annotation bits:
      
      	https://lkml.org/lkml/2016/11/25/546
      
      Sample output:
      
      security_file_permission  vmlinux
        5.80 │    ← ret                                                  ▒
             │70:   ldr    w0, [x21,#68]                                 ▒
        4.44 │    ↓ tbnz   d0                                            ▒
             │      mov    w0, #0x24                       // #36        ▒
        1.37 │      ands   w0, w22, w0                                   ▒
             │    ↑ b.eq   60                                            ▒
        1.37 │    ↓ tbnz   e4                                            ▒
             │      mov    w19, #0x20000                   // #131072    ▒
        1.02 │    ↓ tbz    ec                                            ▒
             │90:┌─→ldr    x3, [x21,#24]                                 ▒
        1.37 │   │  add    x21, x21, #0x10                               ▒
             │   │  mov    w2, w19                                       ▒
        1.02 │   │  mov    x0, x21                                       ▒
             │   │  mov    x1, x3                                        ▒
        1.71 │   │  ldr    x20, [x3,#48]                                 ▒
             │   │→ bl     __fsnotify_parent                             ▒
        0.68 │   │↑ cbnz   60                                            ▒
             │   │  mov    x2, x21                                       ▒
        1.37 │   │  mov    w1, w19                                       ▒
             │   │  mov    x0, x20                                       ▒
        0.68 │   │  mov    w5, #0x0                        // #0         ▒
             │   │  mov    x4, #0x0                        // #0         ▒
        1.71 │   │  mov    w3, #0x1                        // #1         ▒
             │   │→ bl     fsnotify                                      ▒
        1.37 │   │↑ b      60                                            ▒
             │d0:│  mov    w0, #0x0                        // #0         ▒
             │   │  ldp    x19, x20, [sp,#16]                            ▒
             │   │  ldp    x21, x22, [sp,#32]                            ▒
             │   │  ldp    x29, x30, [sp],#48                            ▒
             │   │← ret                                                  ▒
             │e4:│  mov    w19, #0x10000                   // #65536     ▒
             │   └──b      90                                            ◆
             │ec:   brk    #0x800                                        ▒
      Press 'h' for help on key bindings
      Signed-off-by: Kim Phillips <kim.phillips@arm.com>
      Signed-off-by: Chris Ryder <chris.ryder@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/20161130092344.012e18e3e623bea395162f95@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      0fcb1da4
    • K
      perf annotate: Use arch->objdump.comment_char in dec__parse() · 859afa6c
      Kim Phillips authored
      Presumably neglected in commit 786c1b51 ("perf annotate: Start supporting
      cross arch annotation").  This doesn't fix a bug, since none of the
      affected arches support parsing dec/inc instructions yet.
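To illustrate why a parser should consult a per-arch comment character rather than a hard-coded one, here is a minimal sketch. `struct arch_sketch` and `split_comment()` are hypothetical stand-ins, not perf's implementation, and the '#' and ';' characters used in the usage note are only illustrative values:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the per-arch objdump settings. */
struct arch_sketch {
	const char *name;
	char comment_char;	/* character that starts a trailing comment */
};

/*
 * Split "operands <comment_char> comment" in place.
 * Returns a pointer to the comment text (after the marker and any blanks),
 * or NULL when the line carries no comment.  Blanks preceding the marker
 * are trimmed from the operand part.
 */
static char *split_comment(const struct arch_sketch *arch, char *operands)
{
	char *marker = strchr(operands, arch->comment_char);
	char *end;

	if (marker == NULL)
		return NULL;

	/* Terminate the operands before the blanks that precede the marker. */
	end = marker;
	while (end > operands && (end[-1] == ' ' || end[-1] == '\t'))
		end--;
	*end = '\0';

	/* Skip the marker itself and the blanks that follow it. */
	marker++;
	while (*marker == ' ' || *marker == '\t')
		marker++;
	return marker;
}
```

With `comment_char` set to ';', an operand string such as `r3, [pc, #8]  ; 0x8450` splits at the semicolon and the `#8` immediate is left alone; a hard-coded '#' would cut the operands at the immediate instead.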
      Signed-off-by: Kim Phillips <kim.phillips@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Ryder <chris.ryder@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/20161130092333.1cca5dd2c77e1790d61c1e9c@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      859afa6c
  21. 25 Nov, 2016 3 commits
    • R
      perf annotate: Initial PowerPC support · dbdebdc5
      Ravi Bangoria authored
      Support the PowerPC architecture using the ins_ops association
      method.
      
      Committer notes:
      
      Testing it with a perf.data file collected on a PowerPC machine and
      cross-annotated on a x86_64 workstation, using the associated vmlinux
      file:
      
        $ perf report -i perf.data.f22vm.powerdev --vmlinux vmlinux.powerpc
        .ktime_get  vmlinux.powerpc
              │      clrldi r9,r28,63
         8.57 │   ┌──bne    e0                   <- TUI cursor positioned here
              │54:│  lwsync
         2.86 │   │  std    r2,40(r1)
              │   │  ld     r9,144(r31)
              │   │  ld     r3,136(r31)
              │   │  ld     r30,184(r31)
              │   │  ld     r10,0(r9)
              │   │  mtctr  r10
              │   │  ld     r2,8(r9)
         8.57 │   │→ bctrl
              │   │  ld     r2,40(r1)
              │   │  ld     r10,160(r31)
              │   │  ld     r5,152(r31)
              │   │  lwz    r7,168(r31)
              │   │  ld     r9,176(r31)
         8.57 │   │  lwz    r6,172(r31)
              │   │  lwsync
         2.86 │   │  lwz    r8,128(r31)
              │   │  cmpw   cr7,r8,r28
         2.86 │   │↑ bne    48
              │   │  subf   r10,r10,r3
              │   │  mr     r3,r29
              │   │  and    r10,r10,r5
         2.86 │   │  mulld  r10,r10,r7
              │   │  add    r9,r10,r9
              │   │  srd    r9,r9,r6
              │   │  add    r9,r9,r30
              │   │  std    r9,0(r29)
              │   │  addi   r1,r1,144
              │   │  ld     r0,16(r1)
              │   │  ld     r28,-32(r1)
              │   │  ld     r29,-24(r1)
              │   │  ld     r30,-16(r1)
              │   │  mtlr   r0
              │   │  ld     r31,-8(r1)
              │   │← blr
         5.71 │e0:└─→mr     r1,r1
        11.43 │      mr     r2,r2
        11.43 │      lwz    r28,128(r31)
        Press 'h' for help on key bindings
      
        $ perf report -i perf.data.f22vm.powerdev --header-only
        # ========
        # captured on: Thu Nov 24 12:40:38 2016
        # hostname : pdev-f22-qemu
        # os release : 4.4.10-200.fc22.ppc64
        # perf version : 4.9.rc1.g6298ce
        # arch : ppc64
        # nrcpus online : 48
        # nrcpus avail : 48
        # cpudesc : POWER7 (architected), altivec supported
        # cpuid : 74,513
        # total memory : 4158976 kB
        # cmdline : /home/ravi/Workspace/linux/tools/perf/perf record -a
        # event : name = cycles:ppp, , size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|CPU|PERIOD, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1
        # HEADER_CPU_TOPOLOGY info available, use -I to display
        # HEADER_NUMA_TOPOLOGY info available, use -I to display
        # pmu mappings: cpu = 4, software = 1, tracepoint = 2, breakpoint = 5
        # missing features: HEADER_TRACING_DATA HEADER_BRANCH_STACK HEADER_GROUP_DESC HEADER_AUXTRACE HEADER_STAT HEADER_CACHE
        # ========
        #
        $
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Link: http://lkml.kernel.org/n/tip-tbjnp40ddoxxl474uvhwi6g4@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      dbdebdc5
    • A
      perf annotate: Improve support for ARM · acc9bfb5
      Arnaldo Carvalho de Melo authored
      By using arch->init() to set up some regular expressions to associate
      ins_ops to ARM instructions, ditching that old table that has
      instructions not present on ARM.
      
      Take advantage of having an arch->init() to hide more arm specific stuff
      from the common code, like the objdump details.
      
      The regular expressions come from a patch written by Kim Phillips.
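As a rough illustration of associating instruction classes with mnemonics through regular expressions compiled once in an init routine, here is a minimal sketch using POSIX `regcomp()`/`regexec()`. The names (`struct arm_priv_sketch`, `arm_sketch_init()`, `arm_sketch_classify()`) and the patterns themselves are assumptions made for this example; they are not the expressions from Kim Phillips' patch:

```c
#include <regex.h>
#include <stddef.h>

enum ins_kind { INS_OTHER, INS_CALL, INS_JUMP };

/* Hypothetical private area holding the compiled patterns. */
struct arm_priv_sketch {
	regex_t call_insn;
	regex_t jump_insn;
};

/* Illustrative condition-code suffixes; not taken from perf. */
#define COND "(eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al)?"

/* Compile both patterns once; returns 0 on success, -1 on failure. */
static int arm_sketch_init(struct arm_priv_sketch *priv)
{
	const int flags = REG_EXTENDED | REG_NOSUB;

	if (regcomp(&priv->call_insn, "^blx?" COND "$", flags) != 0)
		return -1;
	if (regcomp(&priv->jump_insn, "^b" COND "$", flags) != 0) {
		regfree(&priv->call_insn);
		return -1;
	}
	return 0;
}

/* Map a mnemonic to an instruction class using the compiled patterns. */
static enum ins_kind arm_sketch_classify(const struct arm_priv_sketch *priv,
					 const char *name)
{
	if (regexec(&priv->call_insn, name, 0, NULL, 0) == 0)
		return INS_CALL;
	if (regexec(&priv->jump_insn, name, 0, NULL, 0) == 0)
		return INS_JUMP;
	return INS_OTHER;
}

/* Release the compiled patterns. */
static void arm_sketch_exit(struct arm_priv_sketch *priv)
{
	regfree(&priv->call_insn);
	regfree(&priv->jump_insn);
}
```

Compared with a static table of mnemonics, two anchored patterns cover every conditional variant without listing each one, which is the motivation the commit message gives for dropping the old table.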
      Reviewed-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Ryder <chris.ryder@arm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-77m7lufz9ajjimkrebtg5ead@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      acc9bfb5
    • A
      perf annotate: Allow arches to have a init routine and a priv area · 0781ea92
      Arnaldo Carvalho de Melo authored
      Arches like ARM will want to use regular expressions when deciding which
      instructions to associate with which ins_ops; provide infrastructure for
      that.
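One way to picture the infrastructure is an arch descriptor carrying an optional init callback and an opaque private pointer, with the init run lazily on first lookup. The sketch below is written under that assumption; `struct arch_desc`, `arch_find()` and `arm_demo_init()` are hypothetical names, and where perf actually invokes the init routine is not shown in this commit message:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical arch descriptor with an optional init routine. */
struct arch_desc {
	const char *name;
	bool initialized;
	void *priv;				/* set up by init() */
	int (*init)(struct arch_desc *arch);	/* may be NULL */
};

static int arm_demo_init_calls;	/* counts init invocations, demo only */

/* Demo init: a real arch would compile its regular expressions here. */
static int arm_demo_init(struct arch_desc *arch)
{
	static int demo_state = 42;	/* stands in for compiled state */

	arm_demo_init_calls++;
	arch->priv = &demo_state;
	return 0;
}

static struct arch_desc arch_table[] = {
	{ .name = "arm", .init = arm_demo_init },
	{ .name = "x86" },		/* needs no init and no priv area */
};

/* Look up an arch by name, running its init routine at most once. */
static struct arch_desc *arch_find(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(arch_table) / sizeof(arch_table[0]); i++) {
		struct arch_desc *arch = &arch_table[i];

		if (strcmp(arch->name, name) != 0)
			continue;
		if (arch->init != NULL && !arch->initialized) {
			if (arch->init(arch) != 0)
				return NULL;	/* init failed */
			arch->initialized = true;
		}
		return arch;
	}
	return NULL;
}
```

Keeping the private state behind a `void *` lets each arch store whatever it needs without the common code knowing its layout, and arches without an init routine pay no cost.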
      Reviewed-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Ryder <chris.ryder@arm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-7dmnk9el2ipu3nxog092k9z5@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      0781ea92