1. 18 8月, 2017 2 次提交
  2. 29 7月, 2017 1 次提交
  3. 28 7月, 2017 4 次提交
  4. 21 7月, 2017 3 次提交
  5. 19 7月, 2017 3 次提交
    • J
      perf report: Enable finding kernel inline functions · 8b8ef2d7
      Jin Yao 提交于
      Currently perf supports a mode to query inline stack. It works well for
      finding user space inline functions but it doesn't work for kernel ones,
      due to some unnecessary check.
      
      This patch removes these unnecessary checks. Now kernel inline functions
      can be reported.
      
      For example:
      
        perf report --inline -g func --stdio
      
        |--46.19%--do_huge_pmd_anonymous_page
        |          do_huge_pmd_anonymous_page (inline)
        |          __do_huge_pmd_anonymous_page (inline)
        |          __SetPageUptodate (inline)
        |          __set_bit (inline)
      
        The result is compared with the output of addr2line. They match.
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1500409892-15904-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8b8ef2d7
    • J
      perf annotate: Implement visual marker for macro fusion · 7e63a13a
      Jin Yao 提交于
      For marking fused instructions clearly this patch adds a line before the
      first instruction of pair and joins it with the arrow of the jump to its
      target.
      
      For example, when "je" is selected in annotate view, the line before
      cmpl is displayed and joins the arrow of "je".
      
             │   ┌──cmpl   $0x0,argp_program_version_hook
       81.93 │   ├──je     20
             │   │  lock   cmpxchg %esi,0x38a9a4(%rip)
             │   │↓ jne    29
             │   │↓ jmp    43
       11.47 │20:└─→cmpxch %esi,0x38a999(%rip)
      
      That means the cmpl+je is a fused instruction pair and they should be
      considered together.
      
      Changelog:
      
      v3: Use Arnaldo's fix to improve the arrow origin rendering.  To get the
          evsel->evlist->env->cpuid, save the evsel in annotate_browser.
      
      v2: new function "ins__is_fused" to check if the instructions are fused.
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1499403995-19857-3-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7e63a13a
    • J
      perf annotate: Check for fused instructions · 69fb09f6
      Jin Yao 提交于
      Macro fusion merges two instructions to a single micro-op. Intel core
      platform performs this hardware optimization under limited
      circumstances.
      
      For example, CMP + JCC can be "fused" and executed /retired together.
      While with sampling this can result in the sample sometimes being on the
      JCC and sometimes on the CMP.  So for the fused instruction pair, they
      could be considered together.
      
      On Nehalem, fused instruction pairs:
      
        cmp/test + jcc.
      
      On other new CPU:
      
        cmp/test/add/sub/and/inc/dec + jcc.
      
      This patch adds an x86-specific function which checks if 2 instructions
      are in a "fused" pair. For non-x86 arch, the function is just NULL.
      
      Changelog:
      
      v4: Move the CPU model checking to symbol__disassemble and save the CPU
          family/model in arch structure.
      
          It avoids checking every time when jump arrow printed.
      
      v3: Add checking for Nehalem (CMP, TEST). For other newer Intel CPUs
          just check it by default (CMP, TEST, ADD, SUB, AND, INC, DEC).
      
      v2: Remove the original weak function. Arnaldo points out that doing it
          as a weak function that will be overridden by the host arch doesn't
          work. So now it's implemented as an arch-specific function.
      
      Committer fix:
      
      Do not access evsel->evlist->env->cpuid, ->env can be null, introduce
      perf_evsel__env_cpuid(), just like perf_evsel__env_arch(), also used in
      this function call.
      
      The original patch was segfaulting 'perf top' + annotation.
      
      But this essentially disables this fused instructions augmentation in
      'perf top', the right thing is to get the cpuid from the running kernel,
      left for a later patch tho.
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1499403995-19857-2-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      69fb09f6
  6. 11 7月, 2017 1 次提交
  7. 20 6月, 2017 3 次提交
  8. 24 5月, 2017 1 次提交
    • N
      perf tools: Put caller above callee in --children mode · 7111ffff
      Namhyung Kim 提交于
      The __hpp__sort_acc() sorts entries using callchain depth in order to
      put callers above in children mode.  But it assumed the callchain order
      was callee-first.  Now default (for children) is caller-first so the
      order of entries is reverted.
      
      For example, consider following case:
      
        $ perf report --no-children
        ..l
        # Overhead  Command  Shared Object        Symbol
        # ........  .......  ...................  ..........................
        #
            99.44%  a.out    a.out                [.] main
                    |
                    ---main
                       __libc_start_main
                       _start
      
      Then children mode should show 'start' above '__libc_start_main' since
      it's the caller (parent) of the __libc_start_main.  But it's reversed:
      
        # Children      Self  Command  Shared Object    Symbol
        # ........  ........  .......  ...............  .....................
        #
            99.61%     0.00%  a.out    libc-2.25.so     [.] __libc_start_main
            99.61%     0.00%  a.out    a.out            [.] _start
            99.54%    99.44%  a.out    a.out            [.] main
      
      This patch fixes it.
      
        # Children      Self  Command  Shared Object    Symbol
        # ........  ........  .......  ...............  .....................
        #
            99.61%     0.00%  a.out    a.out            [.] _start
            99.61%     0.00%  a.out    libc-2.25.so     [.] __libc_start_main
            99.54%    99.44%  a.out    a.out            [.] main
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170524062129.32529-8-namhyung@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7111ffff
  9. 27 4月, 2017 1 次提交
  10. 25 4月, 2017 1 次提交
  11. 21 4月, 2017 2 次提交
  12. 20 4月, 2017 9 次提交
  13. 11 4月, 2017 1 次提交
  14. 27 3月, 2017 3 次提交
    • M
      perf report: Enable sorting by srcline as key · 5dfa210e
      Milian Wolff 提交于
      Often it is interesting to know how costly a given source line is in
      total. Previously, one had to build these sums manually based on all
      addresses that pointed to the same source line. This patch introduces
      srcline as a sort key, which will do the aggregation for us.
      
      Paired with the recent addition of showing inline frames, this makes
      perf report much more useful for many C++ work loads.
      
      The following shows the new feature in action. First, let's show the
      status quo output when we sort by address. The result contains many hist
      entries that generate the same output:
      
        ~~~~~~~~~~~~~~~~
        $ perf report --stdio --inline -g address
        # Children      Self  Command       Shared Object        Symbol
        # ........  ........  ............  ...................  .........................................
        #
            99.89%    35.34%  cpp-inlining  cpp-inlining         [.] main
                  |
                  |--64.55%--main complex:655
                  |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                  |          /usr/include/c++/6.3.1/complex:664 (inline)
                  |          |
                  |          |--60.31%--hypot +20
                  |          |          |
                  |          |          |--8.52%--__hypot_finite +273
                  |          |          |
                  |          |          |--7.32%--__hypot_finite +411
      ...
                   --35.34%--_start +4194346
                             __libc_start_main +241
                             |
                             |--6.65%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                             |
                             |--2.70%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                             |
                             |--1.69%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
        ...
        ~~~~~~~~~~~~~~~~
      
      With this patch and `-g srcline` we instead get the following output:
      
        ~~~~~~~~~~~~~~~~
        $ perf report --stdio --inline -g srcline
        # Children      Self  Command       Shared Object        Symbol
        # ........  ........  ............  ...................  .........................................
        #
            99.89%    35.34%  cpp-inlining  cpp-inlining         [.] main
                  |
                  |--64.55%--main complex:655
                  |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                  |          /usr/include/c++/6.3.1/complex:664 (inline)
                  |          |
                  |          |--64.02%--hypot
                  |          |          |
                  |          |           --59.81%--__hypot_finite
                  |          |
                  |           --0.53%--cabs
                  |
                   --35.34%--_start
                             __libc_start_main
                             |
                             |--12.48%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
        ...
        ~~~~~~~~~~~~~~~~
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170318214928.9047-1-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5dfa210e
    • J
      perf report: Show inline stack for browser mode · 0d3eb0b7
      Jin Yao 提交于
      If the address belongs to an inlined function, the source information
      back to the first non-inlined function will be printed.
      
      For example:
      
      1. Show inlined function name
         perf report -g function --inline
      
      -    0.69%     0.00%  inline   ld-2.23.so           [.] dl_main
         - dl_main
              0.56% _dl_relocate_object
               _dl_relocate_object (inline)
               elf_dynamic_do_Rela (inline)
      
      2. Show the file/line information
         perf report -g address --inline
      
      -    0.69%     0.00%  inline   ld-2.23.so           [.] _dl_start
           _dl_start rtld.c:307
            /build/glibc-GKVZIf/glibc-2.23/elf/rtld.c:413 (inline)
         + _dl_sysdep_start dl-sysdep.c:250
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: NMilian Wolff <milian.wolff@kdab.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/1490474069-15823-6-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0d3eb0b7
    • J
      perf report: Show inline stack for stdio mode · 0db64dd0
      Jin Yao 提交于
      If the address belongs to an inlined function, the source information
      back to the first non-inlined function will be printed.
      
      For example:
      
      1. Show inlined function name
         perf report --stdio -g function --inline
      
           0.69%     0.00%  inline   ld-2.23.so           [.] dl_main
                  |
                  ---dl_main
                     |
                      --0.56%--_dl_relocate_object
                                _dl_relocate_object (inline)
                                elf_dynamic_do_Rela (inline)
      
      2. Show the file/line information
         perf report --stdio -g address --inline
      
           0.69%     0.00%  inline   ld-2.23.so           [.] _dl_start_user
                  |
                  ---_dl_start_user .:0
                     _dl_start rtld.c:307
                     /build/glibc-GKVZIf/glibc-2.23/elf/rtld.c:413 (inline)
                     _dl_sysdep_start dl-sysdep.c:250
                     |
                      --0.56%--dl_main rtld.c:2076
      
      Committer tests:
      
        # perf record --call-graph dwarf ~/bin/perf stat usleep 1
      
       Performance counter stats for 'usleep 1':
      
                0.443020      task-clock (msec)         #    0.449 CPUs utilized
                       1      context-switches          #    0.002 M/sec
                       0      cpu-migrations            #    0.000 K/sec
                      52      page-faults               #    0.117 M/sec
               1,049,423      cycles                    #    2.369 GHz
                 801,456      instructions              #    0.76  insn per cycle
                 155,609      branches                  #  351.246 M/sec
                   7,026      branch-misses             #    4.52% of all branches
      
             0.000987570 seconds time elapsed
      
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.553 MB perf.data (66 samples) ]
        # perf report --stdio --inline fs__get_mountpoint
        <SNIP>
           1.73%     0.00%  perf     perf           [.] fs__get_mountpoint
                  |
                  ---fs__get_mountpoint
                     fs__get_mountpoint (inline)
                     fs__check_mounts (inline)
                     __statfs
                     entry_SYSCALL_64
                     sys_statfs
                     SYSC_statfs
                     user_statfs
                     user_path_at_empty
                     filename_lookup
                     path_lookupat
                     link_path_walk
                     inode_permission
                     __inode_permission
                     kernfs_iop_permission
                     kernfs_refresh_inode
                     security_inode_notifysecctx
                     selinux_inode_notifysecctx
                     selinux_inode_setsecurity
                     security_context_to_sid
                     security_context_to_sid_core
                     string_to_context_struct
                     symcmp
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: NMilian Wolff <milian.wolff@kdab.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/1490474069-15823-5-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0db64dd0
  15. 13 3月, 2017 1 次提交
  16. 20 2月, 2017 1 次提交
  17. 02 2月, 2017 2 次提交
  18. 21 1月, 2017 1 次提交