1. 31 10月, 2017 2 次提交
  2. 27 10月, 2017 7 次提交
  3. 25 10月, 2017 5 次提交
    • M
      perf util: Enable handling of inlined frames by default · d8a88dd2
      Milian Wolff 提交于
      Now that we have caches in place to speed up the process of finding
      inlined frames and srcline information repeatedly, we can enable this
      useful option by default.
      Suggested-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171019113836.5548-6-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d8a88dd2
    • M
      perf report: Use srcline from callchain for hist entries · 1fb7d06a
      Milian Wolff 提交于
      This also removes the symbol name from the srcline column, more on this
      below.
      
      This ensures we use the correct srcline, which could originate from a
      potentially inlined function. The hist entries used to query for the
      srcline based purely on the IP, which leads to wrong results for inlined
      entries.
      
      Before:
      
      ~~~~~
        perf report --inline -s srcline -g none --stdio
        ...
        # Children      Self  Source:Line
        # ........  ........  ..................................................................................................................................
        #
            94.23%     0.00%  __libc_start_main+18446603487898210537
            94.23%     0.00%  _start+41
            44.58%     0.00%  main+100
            44.58%     0.00%  std::_Norm_helper<true>::_S_do_it<double>+100
            44.58%     0.00%  std::__complex_abs+100
            44.58%     0.00%  std::abs<double>+100
            44.58%     0.00%  std::norm<double>+100
            36.01%     0.00%  hypot+18446603487892193300
            25.81%     0.00%  main+41
            25.81%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
            25.81%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
            25.75%    25.75%  random.h:143
            18.39%     0.00%  main+57
            18.39%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
            18.39%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
            13.80%    13.80%  random.tcc:3330
             5.64%     0.00%  ??:0
             4.13%     4.13%  __hypot_finite+163
             4.13%     0.00%  __hypot_finite+18446603487892193443
      ...
      ~~~~~
      
      After:
      
      ~~~~~
        perf report --inline -s srcline -g none --stdio
        ...
        # Children      Self  Source:Line
        # ........  ........  ...........................................
        #
            94.30%     1.19%  main.cpp:39
            94.23%     0.00%  __libc_start_main+18446603487898210537
            94.23%     0.00%  _start+41
            48.44%     1.70%  random.h:1823
            48.44%     0.00%  random.h:1814
            46.74%     2.53%  random.h:185
            44.68%     0.10%  complex:589
            44.68%     0.00%  complex:597
            44.68%     0.00%  complex:654
            44.68%     0.00%  complex:664
            40.61%    13.80%  random.tcc:3330
            36.01%     0.00%  hypot+18446603487892193300
            26.81%     0.00%  random.h:151
            26.81%     0.00%  random.h:332
            25.75%    25.75%  random.h:143
             5.64%     0.00%  ??:0
             4.13%     4.13%  __hypot_finite+163
             4.13%     0.00%  __hypot_finite+18446603487892193443
      ...
      ~~~~~
      
      Note that this change removes the symbol from the source:line hist
      column. If this information is desired, users should explicitly query
      for it if needed. I.e. run this command instead:
      
      ~~~~~
        perf report --inline -s sym,srcline -g none --stdio
        ...
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'cycles:uppp'
        # Event count (approx.): 1381229476
        #
        # Children      Self  Symbol                                                                                                                               Source:Line
        # ........  ........  ...................................................................................................................................  ...........................................
        #
            94.30%     1.19%  [.] main                                                                                                                             main.cpp:39
            94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
            94.23%     0.00%  [.] _start                                                                                                                           _start+41
            48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1814
            48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1823
            46.74%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  random.h:185
            44.68%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              complex:654
            44.68%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     complex:589
            44.68%     0.00%  [.] std::abs<double> (inlined)                                                                                                       complex:597
            44.68%     0.00%  [.] std::norm<double> (inlined)                                                                                                      complex:664
            39.80%    13.59%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
            36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
            26.81%     0.00%  [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)                                                        random.h:151
            26.81%     0.00%  [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)                                 random.h:332
            25.75%     0.00%  [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)                                     random.h:143
            25.19%    25.19%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
             4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
             4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
      ...
      ~~~~~
      
      Compared to the old behavior, this reduces duplication in the output.
      Before we used to print the symbol name in the srcline column even
      when the sym column was explicitly requested. I.e. the output was:
      
      ~~~~~
        perf report --inline -s sym,srcline -g none --stdio
        ...
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'cycles:uppp'
        # Event count (approx.): 1381229476
        #
        # Children      Self  Symbol                                                                                                                               Source:Line
        # ........  ........  ...................................................................................................................................  ..................................................................................................................................
        #
            94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
            94.23%     0.00%  [.] _start                                                                                                                           _start+41
            44.58%     0.00%  [.] main                                                                                                                             main+100
            44.58%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              std::_Norm_helper<true>::_S_do_it<double>+100
            44.58%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     std::__complex_abs+100
            44.58%     0.00%  [.] std::abs<double> (inlined)                                                                                                       std::abs<double>+100
            44.58%     0.00%  [.] std::norm<double> (inlined)                                                                                                      std::norm<double>+100
            36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
            25.81%     0.00%  [.] main                                                                                                                             main+41
            25.81%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
            25.81%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
            25.69%    25.69%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
            18.39%     0.00%  [.] main                                                                                                                             main+57
            18.39%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
            18.39%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
            13.80%    13.80%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
             4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
             4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
      ...
      ~~~~~
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171019113836.5548-5-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1fb7d06a
    • M
      perf report: Cache srclines for callchain nodes · 21ac9d54
      Milian Wolff 提交于
      On one hand this ensures that the memory is properly freed when the DSO
      gets freed. On the other hand this significantly speeds up the
      processing of the callchain nodes when lots of srclines are requested.
      For one of my data files e.g.:
      
      Before:
      
       Performance counter stats for 'perf report -s srcline -g srcline --stdio':
      
            52496.495043      task-clock (msec)         #    0.999 CPUs utilized
                     634      context-switches          #    0.012 K/sec
                       2      cpu-migrations            #    0.000 K/sec
                 191,561      page-faults               #    0.004 M/sec
         165,074,498,235      cycles                    #    3.144 GHz
         334,170,832,408      instructions              #    2.02  insn per cycle
          90,220,029,745      branches                  # 1718.591 M/sec
             654,525,177      branch-misses             #    0.73% of all branches
      
            52.533273822 seconds time elapsedProcessed 236605 events and lost 40 chunks!
      
      After:
      
       Performance counter stats for 'perf report -s srcline -g srcline --stdio':
      
            22606.323706      task-clock (msec)         #    1.000 CPUs utilized
                      31      context-switches          #    0.001 K/sec
                       0      cpu-migrations            #    0.000 K/sec
                 185,471      page-faults               #    0.008 M/sec
          71,188,113,681      cycles                    #    3.149 GHz
         133,204,943,083      instructions              #    1.87  insn per cycle
          34,886,384,979      branches                  # 1543.214 M/sec
             278,214,495      branch-misses             #    0.80% of all branches
      
            22.609857253 seconds time elapsed
      
      Note that the difference is only this large when `--inline` is not
      passed. In such situations, we would use the inliner cache and thus do
      not run this code path that often.
      
      I think that this cache should actually be used in other places, too.
      When looking at the valgrind leak report for perf report, we see tons of
      srclines being leaked, most notably from calls to
      hist_entry__get_srcline. The problem is that get_srcline has many
      different formatting options (show_sym, show_addr, potentially even
      unwind_inlines when calling __get_srcline directly). As such, the
      srcline cannot easily be cached for all calls, or we'd have to add
      caches for all formatting combinations (6 so far). An alternative would
      be to remove the formatting options and handle that on a different level
      - i.e. print the sym/addr on demand wherever we actually output
      something. And the unwind_inlines could be moved into a separate
      function that does not return the srcline.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171019113836.5548-4-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      21ac9d54
    • M
      perf report: Cache failed lookups of inlined frames · b38775cf
      Milian Wolff 提交于
      When no inlined frames could be found for a given address, we did not
      store this information anywhere. That means we potentially do the costly
      inliner lookup repeatedly for cases where we know it can never succeed.
      
      This patch makes dso__parse_addr_inlines always return a valid
      inline_node. It will be empty when no inliners are found. This enables
      us to cache the empty list in the DSO, thereby improving the performance
      when many addresses fail to find the inliners.
      
      For my trivial example, the performance impact is already quite
      significant:
      
      Before:
      
      ~~~~~
       Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):
      
              594.804032      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.07% )
                      53      context-switches          #    0.089 K/sec                    ( +-  4.09% )
                       0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
                   5,687      page-faults               #    0.010 M/sec                    ( +-  0.02% )
           2,300,918,213      cycles                    #    3.868 GHz                      ( +-  0.09% )
           4,395,839,080      instructions              #    1.91  insn per cycle           ( +-  0.00% )
             939,177,205      branches                  # 1578.969 M/sec                    ( +-  0.00% )
              11,824,633      branch-misses             #    1.26% of all branches          ( +-  0.10% )
      
             0.596246531 seconds time elapsed                                          ( +-  0.07% )
      ~~~~~
      
      After:
      
      ~~~~~
       Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):
      
              113.111405      task-clock (msec)         #    0.990 CPUs utilized            ( +-  0.89% )
                      29      context-switches          #    0.255 K/sec                    ( +- 54.25% )
                       0      cpu-migrations            #    0.000 K/sec
                   5,380      page-faults               #    0.048 M/sec                    ( +-  0.01% )
             432,378,779      cycles                    #    3.823 GHz                      ( +-  0.75% )
             670,057,633      instructions              #    1.55  insn per cycle           ( +-  0.01% )
             141,001,247      branches                  # 1246.570 M/sec                    ( +-  0.01% )
               2,346,845      branch-misses             #    1.66% of all branches          ( +-  0.19% )
      
             0.114222393 seconds time elapsed                                          ( +-  1.19% )
      ~~~~~
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171019113836.5548-3-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b38775cf
    • M
      perf report: Properly handle branch count in match_chain() · bf36eb5c
      Milian Wolff 提交于
      Some of the code paths I introduced before returned too early without
      running the code to handle a node's branch count.  By refactoring
      match_chain to only have one exit point, this can be remedied.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1707691.qaJ269GSZW@agathebauer
      Link: http://lkml.kernel.org/r/20171018185350.14893-2-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      bf36eb5c
  4. 24 10月, 2017 18 次提交
  5. 23 10月, 2017 8 次提交