1. 29 Jul, 2017 1 commit
  2. 28 Jul, 2017 4 commits
  3. 21 Jul, 2017 2 commits
  4. 19 Jul, 2017 3 commits
    • J
      perf report: Enable finding kernel inline functions · 8b8ef2d7
      Jin Yao committed
      Currently perf supports a mode to query the inline stack. It works well
      for finding user space inline functions but it doesn't work for kernel
      ones, due to some unnecessary checks.
      
      This patch removes these unnecessary checks. Now kernel inline functions
      can be reported.
      
      For example:
      
        perf report --inline -g func --stdio
      
        |--46.19%--do_huge_pmd_anonymous_page
        |          do_huge_pmd_anonymous_page (inline)
        |          __do_huge_pmd_anonymous_page (inline)
        |          __SetPageUptodate (inline)
        |          __set_bit (inline)
      
        The result is compared with the output of addr2line. They match.
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1500409892-15904-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf annotate: Implement visual marker for macro fusion · 7e63a13a
      Jin Yao committed
      To mark fused instructions clearly, this patch adds a line before the
      first instruction of the pair and joins it with the arrow of the jump
      to its target.
      
      For example, when "je" is selected in annotate view, the line before
      cmpl is displayed and joins the arrow of "je".
      
             │   ┌──cmpl   $0x0,argp_program_version_hook
       81.93 │   ├──je     20
             │   │  lock   cmpxchg %esi,0x38a9a4(%rip)
             │   │↓ jne    29
             │   │↓ jmp    43
       11.47 │20:└─→cmpxch %esi,0x38a999(%rip)
      
      That means the cmpl+je is a fused instruction pair and they should be
      considered together.
      
      Changelog:
      
      v3: Use Arnaldo's fix to improve the arrow origin rendering.  To get the
          evsel->evlist->env->cpuid, save the evsel in annotate_browser.
      
      v2: new function "ins__is_fused" to check if the instructions are fused.
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1499403995-19857-3-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf annotate: Check for fused instructions · 69fb09f6
      Jin Yao committed
      Macro fusion merges two instructions into a single micro-op. Intel Core
      platforms perform this hardware optimization under limited
      circumstances.
      
      For example, CMP + JCC can be "fused" and executed/retired together.
      With sampling, this can result in the sample sometimes landing on the
      JCC and sometimes on the CMP, so the two instructions of a fused pair
      should be considered together.
      
      On Nehalem, the fused instruction pairs are:
      
        cmp/test + jcc.
      
      On other, newer CPUs:
      
        cmp/test/add/sub/and/inc/dec + jcc.
      
      This patch adds an x86-specific function which checks if 2 instructions
      are in a "fused" pair. For non-x86 arch, the function is just NULL.
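
The pairing rule described above can be sketched as follows. This is a minimal illustration in Python, not the perf implementation: the function names (`base_mnemonic`, `is_jcc`, `is_fused_pair`) and the `nehalem` flag are hypothetical, and the sketch only encodes the two mnemonic lists given in this commit message.

```python
# Hypothetical sketch of the fused-pair rule; names are illustrative only
# and do not correspond to functions in the perf source tree.

NEHALEM_FIRST = {"cmp", "test"}
NEWER_FIRST = {"cmp", "test", "add", "sub", "and", "inc", "dec"}


def base_mnemonic(name):
    """Drop an AT&T size suffix (b/w/l/q) when that yields a listed mnemonic."""
    if name in NEWER_FIRST:
        return name
    if len(name) > 1 and name[-1] in "bwlq" and name[:-1] in NEWER_FIRST:
        return name[:-1]
    return name


def is_jcc(name):
    """Treat any 'j*' mnemonic other than jmp/jmpq as a conditional jump."""
    return name.startswith("j") and not name.startswith("jmp")


def is_fused_pair(first, second, nehalem=False):
    """Return True if (first, second) matches the fusion lists above."""
    allowed = NEHALEM_FIRST if nehalem else NEWER_FIRST
    return base_mnemonic(first) in allowed and is_jcc(second)
```

With this sketch, `is_fused_pair("cmpl", "je")` is true on either list, while `is_fused_pair("addl", "jne", nehalem=True)` is false because ADD is only on the newer-CPU list.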
      
      Changelog:
      
      v4: Move the CPU model checking to symbol__disassemble and save the CPU
          family/model in arch structure.
      
          This avoids checking every time a jump arrow is printed.
      
      v3: Add checking for Nehalem (CMP, TEST). For other newer Intel CPUs
          just check it by default (CMP, TEST, ADD, SUB, AND, INC, DEC).
      
      v2: Remove the original weak function. Arnaldo points out that doing it
          as a weak function that will be overridden by the host arch doesn't
          work. So now it's implemented as an arch-specific function.
      
      Committer fix:
      
      Do not access evsel->evlist->env->cpuid directly, since ->env can be
      NULL. Introduce perf_evsel__env_cpuid(), just like
      perf_evsel__env_arch(), which is also used in this function call.
      
      The original patch was segfaulting 'perf top' + annotation.
      
      However, this essentially disables the fused-instruction augmentation
      in 'perf top'. The right thing is to get the cpuid from the running
      kernel, which is left for a later patch.
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1499403995-19857-2-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 20 Jun, 2017 3 commits
  6. 25 Apr, 2017 1 commit
  7. 21 Apr, 2017 1 commit
  8. 20 Apr, 2017 8 commits
  9. 27 Mar, 2017 2 commits
    • M
      perf report: Enable sorting by srcline as key · 5dfa210e
      Milian Wolff committed
      Often it is interesting to know how costly a given source line is in
      total. Previously, one had to build these sums manually based on all
      addresses that pointed to the same source line. This patch introduces
      srcline as a sort key, which will do the aggregation for us.
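
The aggregation that the new sort key performs can be sketched as follows. This is a minimal illustration in Python, assuming each sample has already been resolved to a source line; the function name `aggregate_by_srcline`, the tuple layout, and the addresses and counts in the usage note are hypothetical and are not taken from perf.

```python
from collections import defaultdict


def aggregate_by_srcline(samples):
    """Sum per-address costs into per-source-line totals.

    `samples` is an iterable of (address, srcline, cost) tuples.
    Returns (srcline, total) pairs, most expensive first.
    """
    totals = defaultdict(int)
    for _address, srcline, cost in samples:
        # Different addresses that map to the same line share one bucket.
        totals[srcline] += cost
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

For example, three made-up samples at addresses 0x400a10, 0x400a1c and 0x400a30 with costs 6, 2 and 1, all resolving to `random.tcc:3326`, collapse into a single entry with a total of 9, which is the manual summing the commit message says was previously required.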
      
      Paired with the recent addition of showing inline frames, this makes
      perf report much more useful for many C++ workloads.
      
      The following shows the new feature in action. First, let's show the
      status quo output when we sort by address. The result contains many hist
      entries that generate the same output:
      
        ~~~~~~~~~~~~~~~~
        $ perf report --stdio --inline -g address
        # Children      Self  Command       Shared Object        Symbol
        # ........  ........  ............  ...................  .........................................
        #
            99.89%    35.34%  cpp-inlining  cpp-inlining         [.] main
                  |
                  |--64.55%--main complex:655
                  |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                  |          /usr/include/c++/6.3.1/complex:664 (inline)
                  |          |
                  |          |--60.31%--hypot +20
                  |          |          |
                  |          |          |--8.52%--__hypot_finite +273
                  |          |          |
                  |          |          |--7.32%--__hypot_finite +411
      ...
                   --35.34%--_start +4194346
                             __libc_start_main +241
                             |
                             |--6.65%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                             |
                             |--2.70%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                             |
                             |--1.69%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
        ...
        ~~~~~~~~~~~~~~~~
      
      With this patch and `-g srcline` we instead get the following output:
      
        ~~~~~~~~~~~~~~~~
        $ perf report --stdio --inline -g srcline
        # Children      Self  Command       Shared Object        Symbol
        # ........  ........  ............  ...................  .........................................
        #
            99.89%    35.34%  cpp-inlining  cpp-inlining         [.] main
                  |
                  |--64.55%--main complex:655
                  |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                  |          /usr/include/c++/6.3.1/complex:664 (inline)
                  |          |
                  |          |--64.02%--hypot
                  |          |          |
                  |          |           --59.81%--__hypot_finite
                  |          |
                  |           --0.53%--cabs
                  |
                   --35.34%--_start
                             __libc_start_main
                             |
                             |--12.48%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
        ...
        ~~~~~~~~~~~~~~~~
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170318214928.9047-1-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf report: Show inline stack for browser mode · 0d3eb0b7
      Jin Yao committed
      If the address belongs to an inlined function, the source information
      back to the first non-inlined function will be printed.
      
      For example:
      
      1. Show inlined function name
         perf report -g function --inline
      
      -    0.69%     0.00%  inline   ld-2.23.so           [.] dl_main
         - dl_main
              0.56% _dl_relocate_object
               _dl_relocate_object (inline)
               elf_dynamic_do_Rela (inline)
      
      2. Show the file/line information
         perf report -g address --inline
      
      -    0.69%     0.00%  inline   ld-2.23.so           [.] _dl_start
           _dl_start rtld.c:307
            /build/glibc-GKVZIf/glibc-2.23/elf/rtld.c:413 (inline)
         + _dl_sysdep_start dl-sysdep.c:250
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Milian Wolff <milian.wolff@kdab.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/1490474069-15823-6-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  10. 13 Mar, 2017 1 commit
  11. 20 Feb, 2017 1 commit
  12. 21 Jan, 2017 1 commit
  13. 20 Jan, 2017 1 commit
  14. 16 Dec, 2016 1 commit
    • R
      perf annotate: Fix jump target outside of function address range · e216874c
      Ravi Bangoria committed
      If a jump target is outside the function's address range, perf does
      not handle it correctly. In particular, when the target address is
      lower than the function start address, the target offset is negative.
      Because the target address is declared unsigned, the negative value
      wraps to its two's complement representation. In the example below,
      the target of the 'jmpq' instruction at 34cf8 is 34ac0, which is lower
      than the function start address (34cf0).
      
              34ac0 - 34cf0 = -0x230 = 0xfffffffffffffdd0
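
The wrap-around in the arithmetic above can be reproduced with a short sketch. This is an illustration in Python under stated assumptions, not the perf code: the helper names are hypothetical, 64-bit unsigned arithmetic is emulated with a mask, and the range check is only one possible guard rather than the fix this patch applies.

```python
MASK64 = (1 << 64) - 1


def offset_u64(target, start):
    """Offset of target from start, truncated to an unsigned 64-bit value."""
    return (target - start) & MASK64


def to_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed one."""
    return value - (1 << 64) if value & (1 << 63) else value


def target_in_function(target, start, end):
    """True if target lies in the half-open range [start, end)."""
    return start <= target < end
```

Here `offset_u64(0x34ac0, 0x34cf0)` gives 0xfffffffffffffdd0, the value shown in the bad annotate output, and `to_signed64` recovers -0x230. A range test such as `target_in_function` distinguishes jumps that leave the function from local ones before any offset is computed.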
      
      Objdump output:
      
        0000000000034cf0 <__sigaction>:
        __GI___sigaction():
          34cf0: lea    -0x20(%rdi),%eax
          34cf3: cmp    $0x1,%eax
          34cf6: jbe    34d00 <__sigaction+0x10>
          34cf8: jmpq   34ac0 <__GI___libc_sigaction>
          34cfd: nopl   (%rax)
          34d00: mov    0x386161(%rip),%rax        # 3bae68 <_DYNAMIC+0x2e8>
          34d07: movl   $0x16,%fs:(%rax)
          34d0e: mov    $0xffffffff,%eax
          34d13: retq
      
      perf annotate before applying patch:
      
        __GI___sigaction  /usr/lib64/libc-2.22.so
                 lea    -0x20(%rdi),%eax
                 cmp    $0x1,%eax
              v  jbe    10
              v  jmpq   fffffffffffffdd0
                 nop
          10:    mov    _DYNAMIC+0x2e8,%rax
                 movl   $0x16,%fs:(%rax)
                 mov    $0xffffffff,%eax
                 retq
      
      perf annotate after applying patch:
      
        __GI___sigaction  /usr/lib64/libc-2.22.so
                 lea    -0x20(%rdi),%eax
                 cmp    $0x1,%eax
              v  jbe    10
              ^  jmpq   34ac0 <__GI___libc_sigaction>
                 nop
          10:    mov    _DYNAMIC+0x2e8,%rax
                 movl   $0x16,%fs:(%rax)
                 mov    $0xffffffff,%eax
                 retq
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Riyder <chris.ryder@arm.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/1480953407-7605-3-git-send-email-ravi.bangoria@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  15. 26 Nov, 2016 1 commit
  16. 25 Nov, 2016 1 commit
    • A
      perf annotate: Remove duplicate 'name' field from disasm_line · 75b49202
      Arnaldo Carvalho de Melo committed
      The disasm_line::name field is always equal to ins::name, being used
      just to locate the instruction's ins_ops from the per-arch instructions
      table.
      
      Eliminate this duplication, nuking that field and instead make
      ins__find() return an ins_ops, store it in disasm_line::ins.ops, and
      keep just in disasm_line::ins.name what was in disasm_line::name, this
      way we end up not keeping a reference to entries in the per-arch
      instructions table.
      
      This in turn will help support multiple ways of managing the per-arch
      instructions table, for instance allowing that array to be re-sorted,
      in which case entries move after references to their addresses were
      taken. The same problem is avoided when one grows the array with
      realloc.
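
Why holding a reference into the table is fragile can be sketched as follows. This is a loose Python analogy, not the perf data structures: a list index stands in for a pointer to a table entry, and the `InsTable` class, its methods, and the string values used as ops are hypothetical.

```python
import bisect


class InsTable:
    """Sorted instruction table mapping a mnemonic to its ops handler."""

    def __init__(self, entries):
        pairs = sorted(entries)
        self.names = [name for name, _ops in pairs]
        self.ops = [ops for _name, ops in pairs]

    def find_index(self, name):
        """Position of name in the table (the fragile kind of reference)."""
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            return i
        return None

    def find_ops(self, name):
        """Return the ops for name, which stays valid if the table changes."""
        i = self.find_index(name)
        return None if i is None else self.ops[i]

    def add(self, name, ops):
        """Insert a new entry, shifting later entries to keep sort order."""
        i = bisect.bisect_left(self.names, name)
        self.names.insert(i, name)
        self.ops.insert(i, ops)
```

If a caller saves `find_index("jmp")` and an entry that sorts earlier is added afterwards, the saved position now names a different instruction. A caller that saved the result of `find_ops("jmp")` together with its own copy of the name is unaffected, which mirrors the motivation given in this commit message.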
      
      So architectures simply keeping a constant array will work as well as
      architectures building the table using regular expressions or other
      logic that involves resorting the table.
      Reviewed-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Riyder <chris.ryder@arm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-vr899azvabnw9gtuepuqfd9t@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  17. 18 Nov, 2016 1 commit
    • A
      perf annotate: Start supporting cross arch annotation · 786c1b51
      Arnaldo Carvalho de Melo committed
      Introduce a 'struct arch', where arch-specific stuff will live, starting
      with objdump's choice of comment delimiter character, which is '#' on
      x86 and ';' on arm.
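
The idea of a per-architecture descriptor carrying the comment delimiter can be sketched as follows. This is a minimal Python illustration, not the layout of perf's 'struct arch': the `Arch` class, the `ARCHES` table, and `split_comment` are hypothetical names, and only the two delimiters named in this commit message are encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arch:
    """Per-architecture settings; only the comment delimiter is modelled."""
    name: str
    comment_char: str


# Delimiters as stated in the commit message: '#' on x86, ';' on arm.
ARCHES = {
    "x86": Arch("x86", "#"),
    "arm": Arch("arm", ";"),
}


def split_comment(arch, line):
    """Split one disassembly line into (instruction text, comment text)."""
    code, sep, comment = line.partition(arch.comment_char)
    if not sep:
        return line.rstrip(), ""
    return code.rstrip(), comment.strip()
```

Selecting the delimiter per architecture matters because '#' can appear inside arm operands as an immediate prefix, so splitting an arm line on '#' would cut the instruction itself.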
      
      This has some bits and pieces from a patch submitted by Ravi.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Riyder <chris.ryder@arm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-f337tzjjcl8vtapgvjxmhrbx@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  18. 15 Nov, 2016 1 commit
  19. 09 Nov, 2016 4 commits
  20. 25 Oct, 2016 1 commit
  21. 24 Oct, 2016 1 commit