1. 24 10月, 2017 4 次提交
    • M
      perf callchain: Compare symbol name for inlined frames when matching · 9856240a
      Milian Wolff 提交于
      The fake symbols we create for inlined frames will represent different
      functions but can use the symbol start address. This leads to issues
      when different inline branches all lead to the same function.
      
      Before:
      ~~~~~
      $ perf report -s sym -i perf.inlining.data --inline --stdio -g function
      ...
                   --38.86%--_start
                             __libc_start_main
                             main
                             |
                              --37.57%--std::norm<double> (inlined)
                                        std::_Norm_helper<true>::_S_do_it<double> (inlined)
                                        |
                                         --36.36%--std::abs<double> (inlined)
                                                   std::__complex_abs (inlined)
                                                   |
                                                    --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
                                                              std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
                                                              std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
      ~~~~~
      
      Note that this backtrace representation is completely bogus.
      Complex abs does not call the linear congruential engine! It
      is just a side-effect of a longer inlined stack being appended
      to a shorter, different inlined stack, both of which originate
      in the same function (main).
      
      This patch fixes the issue:
      
      ~~~~~
      $ perf report -s sym -i perf.inlining.data --inline --stdio -g function
      ...
                   --38.86%--_start
                             __libc_start_main
                             main
                             |
                             |--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                             |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                             |          |
                             |           --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
                             |                     std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                             |                     |
                             |                      --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
                             |                                std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
                             |                                std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
                             |
                              --1.99%--std::norm<double> (inlined)
                                        std::_Norm_helper<true>::_S_do_it<double> (inlined)
                                        std::abs<double> (inlined)
                                        std::__complex_abs (inlined)
      ~~~~~
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-10-milian.wolff@kdab.com
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      [ Fix up conflict with c1fbc0cf ("perf callchain: Compare dsos (as well) for CCKEY_FUNCTION"), remove unneeded hunk ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9856240a
    • M
      perf callchain: Mark inlined frames in output by " (inlined)" suffix · 8932f807
      Milian Wolff 提交于
      The original patch that introduced inline frame output in the various
      browsers used this suffix already. The new centralized approach that
      uses fake symbols for inlined frames was missing this approach so far.
      
      Instead of changing the symbol name itself, we only print the suffix
      where needed. This allows us to efficiently lookup the symbol for a
      given name without first having to append the suffix before the lookup.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-8-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8932f807
    • M
      perf report: Fall-back to function name comparison for -g srcline · cbe50f61
      Milian Wolff 提交于
      When a callchain entry has no srcline available, we ended up comparing
      the instruction pointer. I consider this to be not too useful. Rather, I
      think we should group the entries by function name, which this patch
      adds. For people who want to split the data on the IP boundary, using
      `-g address` is the correct choice.
      
      Before:
      
      ~~~~~
         100.00%    38.86%  [.] main
                  |
                  |--61.14%--main inlining.cpp:14
                  |          std::norm<double> complex:664
                  |          std::_Norm_helper<true>::_S_do_it<double> complex:654
                  |          std::abs<double> complex:597
                  |          std::__complex_abs complex:589
                  |          |
                  |          |--56.03%--hypot
                  |          |          |
                  |          |          |--8.45%--__hypot_finite
                  |          |          |
                  |          |          |--7.62%--__hypot_finite
                  |          |          |
                  |          |          |--2.29%--__hypot_finite
                  |          |          |
                  |          |          |--2.24%--__hypot_finite
                  |          |          |
                  |          |          |--2.06%--__hypot_finite
                  |          |          |
                  |          |          |--1.81%--__hypot_finite
      ...
      ~~~~~
      
      After:
      
      ~~~~~
         100.00%    38.86%  [.] main
                  |
                  |--61.14%--main inlining.cpp:14
                  |          std::norm<double> complex:664
                  |          std::_Norm_helper<true>::_S_do_it<double> complex:654
                  |          std::abs<double> complex:597
                  |          std::__complex_abs complex:589
                  |          |
                  |          |--60.29%--hypot
                  |          |          |
                  |          |           --56.03%--__hypot_finite
                  |          |
                  |           --0.85%--cabs
      ~~~~~
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-7-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      cbe50f61
    • M
      perf callchain: Store srcline in callchain_cursor_node · 40a342cd
      Milian Wolff 提交于
      This is mostly a preparation to enable the creation of full callchain
      nodes for inline frames. Such frames will reference the IP of the
      non-inlined frame, but hold the symbol and srcline for an inlined
      location. As such, we won't be able to query the srcline on-demand based
      on the IP alone. Instead, we will leverage the functionality provided by
      this patch here, and store the srcline for the inlined nodes in the new
      srcline member of callchain_cursor_node.
      
      Note that this patch on its own leaks the srcline, as there is no
      free_callchain_cursor_node or similar. A future patch will add caching
      of the srcline and handle deletion properly.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-3-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      40a342cd
  2. 05 10月, 2017 1 次提交
  3. 25 9月, 2017 1 次提交
    • M
      perf report: Fix debug messages with --call-graph option · 9789e7e9
      Mengting Zhang 提交于
      With --call-graph option, perf report can display call chains using
      type, min percent threshold, optional print limit and order. And the
      default call-graph parameter is 'graph,0.5,caller,function,percent'.
      
      Before this patch, 'perf report --call-graph' shows incorrect debug
      messages as below:
      
        # perf report --call-graph
        Invalid callchain mode: 0.5
        Invalid callchain order: 0.5
        Invalid callchain sort key: 0.5
        Invalid callchain config key: 0.5
        Invalid callchain mode: caller
        Invalid callchain mode: function
        Invalid callchain order: function
        Invalid callchain mode: percent
        Invalid callchain order: percent
        Invalid callchain sort key: percent
      
      That is because in function __parse_callchain_report_opt(),each field of
      the call-graph parameter is passed to parse_callchain_{mode,order,
      sort_key,value} in turn until it meets the matching value.
      
      For example, the order field "caller" is passed to
      parse_callchain_mode() firstly and obviously it doesn't match any mode
      field. Therefore parse_callchain_mode() will shows the debug message
      "Invalid callchain mode: caller", which could confuse users.
      
      The patch fixes this issue by moving the warning out of the function
      parse_callchain_{mode,order,sort_key,value}.
      Signed-off-by: NMengting Zhang <zhangmengting@huawei.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Krister Johansen <kjlx@templeofstupid.com>
      Cc: Li Bin <huawei.libin@huawei.com>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/1506154694-39691-1-git-send-email-zhangmengting@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9789e7e9
  4. 30 8月, 2017 1 次提交
    • J
      perf report: Calculate the average cycles of iterations · c4ee0625
      Jin Yao 提交于
      The branch history code has a loop detection function. With this, we can
      get the number of iterations by calculating the removed loops.
      
      While it would be nice for knowing the average cycles of iterations.
      This patch adds up the cycles in branch entries of removed loops and
      save the result to the next branch entry (e.g. branch entry A).
      
      Finally it will display the iteration number and average cycles at the
      "from" of branch entry A.
      
      For example:
      perf record -g -j any,save_type ./div
      perf report --branch-history --no-children --stdio
      
      --22.63%--main div.c:42 (RET CROSS_2M)
                compute_flag div.c:28 (cycles:2 iter:173115 avg_cycles:2)
                |
                 --10.73%--compute_flag div.c:27 (RET CROSS_2M)
                           rand rand.c:28 (cycles:1)
                           rand rand.c:28 (RET CROSS_2M)
                           __random random.c:298 (cycles:1)
                           __random random.c:297 (COND_BWD CROSS_2M)
                           __random random.c:295 (cycles:1)
                           __random random.c:295 (COND_BWD CROSS_2M)
                           __random random.c:295 (cycles:1)
                           __random random.c:295 (RET CROSS_2M)
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1502111115-18305-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c4ee0625
  5. 26 7月, 2017 2 次提交
    • J
      perf report: Tag branch type/flag on "to" and tag cycles on "from" · a1a8bed3
      Jin Yao 提交于
      Current --branch-history LBR annotation displays confused data. For
      example, each cycles report is duplicated on both "from" and "to"
      entries.
      
      For example:
      
        perf report --branch-history --no-children --stdio
      
        --2.32%--main div.c:39 (COND_BWD CROSS_2M predicted:49.7% cycles:1)
                  main div.c:44 (predicted:49.7% cycles:1)
                  main div.c:42 (RET CROSS_2M cycles:2)
                  compute_flag div.c:28 (cycles:2)
                  compute_flag div.c:27 (RET CROSS_2M cycles:1)
                  rand rand.c:28 (cycles:1)
                  rand rand.c:28 (RET CROSS_2M cycles:1)
                  __random random.c:298 (cycles:1)
                  __random random.c:297 (COND_BWD CROSS_2M cycles:1)
                  __random random.c:295 (cycles:1)
                  __random random.c:295 (COND_BWD CROSS_2M cycles:1)
                  __random random.c:295 (cycles:1)
                  __random random.c:295 (RET CROSS_2M cycles:9)
      
      The cycles should be tagged only on the "from". It's for the code block
      that ends with "from", not for "to".
      
      Another issue is the "predicted:49.7%" is duplicated too (tag on both
      "from" and "to").
      
      This patch tags the branch type/flag on "to" and tag the cycles on
      "from".
      
      For example:
      
        --2.32%--main div.c:39 (COND_BWD CROSS_2M predicted:49.7%)
                  main div.c:44 (cycles:1)
                  main div.c:42 (RET CROSS_2M)
                  compute_flag div.c:28 (cycles:2)
                  compute_flag div.c:27 (RET CROSS_2M)
                  rand rand.c:28 (cycles:1)
                  rand rand.c:28 (RET CROSS_2M)
                  __random random.c:298 (cycles:1)
                  __random random.c:297 (COND_BWD CROSS_2M)
                  __random random.c:295 (cycles:1)
                  __random random.c:295 (COND_BWD CROSS_2M)
                  __random random.c:295 (cycles:1)
                  __random random.c:295 (RET CROSS_2M)
                  |
                   --2.23%--__random_r random_r.c:392 (cycles:9)
      
      In this example, The "main div.c:39 (COND_BWD CROSS_2M predicted:49.7%)"
      is "to" of branch and "main div.c:44 (cycles:1)" is "from" of branch.
      It should be easier for understanding than before.
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1500894547-18411-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a1a8bed3
    • J
      perf report: Make --branch-history work without callgraphs(-g) option in perf record · b49a821e
      Jin Yao 提交于
        perf record -b -g <command>
        perf report --branch-history
      
      This merges the LBRs with the callgraphs.
      
      However it would be nice if it also works without callgraphs (-g) set in
      perf record, so that only the LBRs are displayed.  But currently perf
      report errors in this case. For example,
      
        perf record -b <command>
        perf report --branch-history
      
        Error:
        Selected -g or --branch-history but no callchain data. Did
        you call 'perf record' without -g?
      
      This patch displays the LBRs only even if callgraphs(-g) is not enabled
      in perf record.
      
      Change log:
      
      v2: According to Milian Wolff's comment, change the obsolete error
      message. Now the error message is:
      
                       ┌─Error:─────────────────────────────────────┐
                       │Selected -g or --branch-history.            │
                       │But no callchain or branch data.            │
                       │Did you call 'perf record' without -g or -b?│
                       │                                            │
                       │                                            │
                       │Press any key...                            │
                       └────────────────────────────────────────────┘
      
      When passing the last parameter to hists__fprintf,
      changes "|" to "||".
      
        hists__fprintf(hists, !quiet, 0, 0, rep->min_percent, stdout,
                       symbol_conf.use_callchain || symbol_conf.show_branchflag_count);
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1494240182-28899-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b49a821e
  6. 21 7月, 2017 1 次提交
  7. 19 7月, 2017 2 次提交
    • J
      perf report: Show branch type in callchain entry · b851dd49
      Jin Yao 提交于
      Show branch type in callchain entry. The branch type is printed
      with other LBR information (such as cycles/abort/...).
      
      For example:
      
        perf record -g -j any,save_type
        perf report --branch-history --stdio --no-children
      
        38.50%  div.c:45                [.] main                    div
                |
                ---main div.c:42 (RET CROSS_2M cycles:2)
                   compute_flag div.c:28 (cycles:2)
                   compute_flag div.c:27 (RET CROSS_2M cycles:1)
                   rand rand.c:28 (cycles:1)
                   rand rand.c:28 (RET CROSS_2M cycles:1)
                   __random random.c:298 (cycles:1)
                   __random random.c:297 (COND_BWD CROSS_2M cycles:1)
                   __random random.c:295 (cycles:1)
                   __random random.c:295 (COND_BWD CROSS_2M cycles:1)
                   __random random.c:295 (cycles:1)
                   __random random.c:295 (RET CROSS_2M cycles:9)
      
      Change log
      
      v6: Remove the branch_type_str() since it's moved to branch.c.
      
      v5: Rewrite the branch info print code in util/callchain.c.
      
      v4: Comparing to previous version, the major changes are:
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1500379995-6449-8-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b851dd49
    • J
      perf report: Refactor the branch info printing code · 8d51735f
      Jin Yao 提交于
      The branch info such as predicted/cycles/... are printed at the
      callchain entries.
      
      For example: perf report --branch-history --no-children --stdio
      
          --1.07%--main div.c:39 (predicted:52.4% cycles:1 iterations:17)
                    main div.c:44 (predicted:52.4% cycles:1)
                    main div.c:42 (cycles:2)
                    compute_flag div.c:28 (cycles:2)
                    compute_flag div.c:27 (cycles:1)
                    rand rand.c:28 (cycles:1)
                    rand rand.c:28 (cycles:1)
                    __random random.c:298 (cycles:1)
                    __random random.c:297 (cycles:1)
                    __random random.c:295 (cycles:1)
                    __random random.c:295 (cycles:1)
                    __random random.c:295 (cycles:1)
      
      But the current code is difficult to maintain and extend. This patch
      refactors the code for easy maintenance.
      
      Change log:
      
      v6: 1. Put the multiline condition code into {} brackets in
             counts_str_build()
      
          2. Keep the original display order, that is:
             predicted, abort, cycles, iterations
      
      v5: It's a new patch in v5 patch series.
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1500379995-6449-5-git-send-email-yao.jin@linux.intel.com
      [ Don't use 'index' as a name for a variable, it shadows a globa decl in older distros ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8d51735f
  8. 24 5月, 2017 1 次提交
    • M
      perf report: Don't crash on invalid maps in `-g srcline` mode · 7d4df089
      Milian Wolff 提交于
      I just hit a segfault when doing `perf report -g srcline`.
      Valgrind pointed me at this code as the culprit:
      
        ==8359== Invalid read of size 8
        ==8359==    at 0x3096D9: map__rip_2objdump (map.c:430)
        ==8359==    by 0x2FC1A3: match_chain_srcline (callchain.c:645)
        ==8359==    by 0x2FC1A3: match_chain (callchain.c:700)
        ==8359==    by 0x2FC1A3: append_chain (callchain.c:895)
        ==8359==    by 0x2FC1A3: append_chain_children (callchain.c:846)
        ==8359==    by 0x2FF719: callchain_append (callchain.c:944)
        ==8359==    by 0x2FF719: hist_entry__append_callchain (callchain.c:1058)
        ==8359==    by 0x32FA06: iter_add_single_cumulative_entry (hist.c:908)
        ==8359==    by 0x33195C: hist_entry_iter__add (hist.c:1050)
        ==8359==    by 0x258F65: process_sample_event (builtin-report.c:204)
        ==8359==    by 0x30D60C: perf_session__deliver_event (session.c:1310)
        ==8359==    by 0x30D60C: ordered_events__deliver_event (session.c:119)
        ==8359==    by 0x310D12: __ordered_events__flush (ordered-events.c:210)
        ==8359==    by 0x310D12: ordered_events__flush.part.3 (ordered-events.c:277)
        ==8359==    by 0x30DD3C: perf_session__process_user_event (session.c:1349)
        ==8359==    by 0x30DD3C: perf_session__process_event (session.c:1475)
        ==8359==    by 0x30FC3C: __perf_session__process_events (session.c:1867)
        ==8359==    by 0x30FC3C: perf_session__process_events (session.c:1921)
        ==8359==    by 0x25A985: __cmd_report (builtin-report.c:575)
        ==8359==    by 0x25A985: cmd_report (builtin-report.c:1054)
        ==8359==    by 0x2B9A80: run_builtin (perf.c:296)
        ==8359==  Address 0x70 is not stack'd, malloc'd or (recently) free'd
      
      This patch fixes the issue.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      [ Remove dependency from another change ]
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170524062129.32529-2-namhyung@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7d4df089
  9. 25 4月, 2017 1 次提交
  10. 20 4月, 2017 1 次提交
  11. 11 4月, 2017 1 次提交
  12. 29 3月, 2017 1 次提交
    • J
      perf report: Drop cycles 0 for LBR print · c1dfcfad
      Jin Yao 提交于
      For some platforms, for example Broadwell, it doesn't support cycles
      for LBR. But the perf always prints cycles:0, it's not necessary.
      
      The patch refactors the LBR info print code and drops the cycles:0.
      
      For example: perf report --branch-history --no-children --stdio
      
      On Broadwell:
      --0.91%--__random_r random_r.c:394 (iterations:2)
                __random_r random_r.c:360 (predicted:0.0%)
                __random_r random_r.c:380 (predicted:0.0%)
                __random_r random_r.c:357
      
      On Skylake:
      --1.07%--main div.c:39 (predicted:52.4% cycles:1 iterations:17)
                main div.c:44 (predicted:52.4% cycles:1)
                main div.c:42 (cycles:2)
                compute_flag div.c:28 (cycles:2)
                compute_flag div.c:27 (cycles:1)
                rand rand.c:28 (cycles:1)
                rand rand.c:28 (cycles:1)
                __random random.c:298 (cycles:1)
                __random random.c:297 (cycles:1)
                __random random.c:295 (cycles:1)
                __random random.c:295 (cycles:1)
                __random random.c:295 (cycles:1)
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/1489046786-10061-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c1dfcfad
  13. 27 3月, 2017 1 次提交
    • M
      perf report: Enable sorting by srcline as key · 5dfa210e
      Milian Wolff 提交于
      Often it is interesting to know how costly a given source line is in
      total. Previously, one had to build these sums manually based on all
      addresses that pointed to the same source line. This patch introduces
      srcline as a sort key, which will do the aggregation for us.
      
      Paired with the recent addition of showing inline frames, this makes
      perf report much more useful for many C++ work loads.
      
      The following shows the new feature in action. First, let's show the
      status quo output when we sort by address. The result contains many hist
      entries that generate the same output:
      
        ~~~~~~~~~~~~~~~~
        $ perf report --stdio --inline -g address
        # Children      Self  Command       Shared Object        Symbol
        # ........  ........  ............  ...................  .........................................
        #
            99.89%    35.34%  cpp-inlining  cpp-inlining         [.] main
                  |
                  |--64.55%--main complex:655
                  |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                  |          /usr/include/c++/6.3.1/complex:664 (inline)
                  |          |
                  |          |--60.31%--hypot +20
                  |          |          |
                  |          |          |--8.52%--__hypot_finite +273
                  |          |          |
                  |          |          |--7.32%--__hypot_finite +411
      ...
                   --35.34%--_start +4194346
                             __libc_start_main +241
                             |
                             |--6.65%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                             |
                             |--2.70%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                             |
                             |--1.69%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
        ...
        ~~~~~~~~~~~~~~~~
      
      With this patch and `-g srcline` we instead get the following output:
      
        ~~~~~~~~~~~~~~~~
        $ perf report --stdio --inline -g srcline
        # Children      Self  Command       Shared Object        Symbol
        # ........  ........  ............  ...................  .........................................
        #
            99.89%    35.34%  cpp-inlining  cpp-inlining         [.] main
                  |
                  |--64.55%--main complex:655
                  |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                  |          /usr/include/c++/6.3.1/complex:664 (inline)
                  |          |
                  |          |--64.02%--hypot
                  |          |          |
                  |          |           --59.81%--__hypot_finite
                  |          |
                  |           --0.53%--cabs
                  |
                   --35.34%--_start
                             __libc_start_main
                             |
                             |--12.48%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
        ...
        ~~~~~~~~~~~~~~~~
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170318214928.9047-1-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5dfa210e
  14. 02 2月, 2017 1 次提交
  15. 01 2月, 2017 1 次提交
  16. 27 1月, 2017 1 次提交
    • A
      perf tools: Propagate perf_config() errors · ecc4c561
      Arnaldo Carvalho de Melo 提交于
      Previously these were being ignored, sometimes silently.
      
      Stop doing that, emitting debug messages and handling the errors.
      
      Testing it:
      
        $ cat ~/.perfconfig
        cat: /home/acme/.perfconfig: No such file or directory
        $ perf stat -e cycles usleep 1
      
         Performance counter stats for 'usleep 1':
      
                 938,996      cycles:u
      
             0.003813731 seconds time elapsed
      
        $ perf top --stdio
        Error:
        You may not have permission to collect system-wide stats.
      
        Consider tweaking /proc/sys/kernel/perf_event_paranoid,
        <SNIP>
        [ perf record: Captured and wrote 0.019 MB perf.data (7 samples) ]
        [acme@jouet linux]$ perf report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        # Overhead  Command  Shared Object      Symbol
        # ........  .......  .................  .........................
          71.77%  usleep   libc-2.24.so       [.] _dl_addr
          27.07%  usleep   ld-2.24.so         [.] _dl_next_ld_env_entry
           1.13%  usleep   [kernel.kallsyms]  [k] page_fault
        $
        $ touch ~/.perfconfig
        $ ls -la ~/.perfconfig
        -rw-rw-r--. 1 acme acme 0 Jan 27 12:14 /home/acme/.perfconfig
        $
        $ perf stat -e instructions usleep 1
      
         Performance counter stats for 'usleep 1':
      
                 244,610      instructions:u
      
             0.000805383 seconds time elapsed
      
        $
        [root@jouet ~]# chown acme.acme ~/.perfconfig
        [root@jouet ~]# perf stat -e cycles usleep 1
          Warning: File /root/.perfconfig not owned by current user or root, ignoring it.
      
         Performance counter stats for 'usleep 1':
      
                 937,615      cycles
      
             0.000836931 seconds time elapsed
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-j2rq96so6xdqlr8p8rd6a3jx@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ecc4c561
  17. 07 12月, 2016 1 次提交
  18. 15 11月, 2016 2 次提交
    • J
      perf report: Calculate and return the branch flag counting · 3dd029ef
      Jin Yao 提交于
      Create some branch counters in per callchain list entry. Each counter
      is for a branch flag. For example, predicted_count counts all the
      *predicted* branches. The counters get updated by processing the
      callchain cursor nodes.
      
      It also provides functions to retrieve or print the values of counters
      in callchain list.
      
      Besides the counting for branch flags, it also counts and returns the
      average number of iterations.
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Acked-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linux-kernel@vger.kernel.org
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/1477876794-30749-4-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3dd029ef
    • J
      perf report: Add branch flag to callchain cursor node · 410024db
      Jin Yao 提交于
      Since the branch ip has been added to call stack for easier browsing,
      this patch adds more branch information. For example, add a flag to
      indicate if this ip is a branch, and also add with the branch flag.
      
      Then we can know if the cursor node represents a branch and know what
      the branch flag it has.
      
      The branch history code has a loop detection pass that removes loops. It
      would be nice for knowing how many loops were removed then in next
      steps, we can compute out the average number of iterations.
      
      For example:
      
      Before remove_loops(),
      entry0: from = 0x100, to = 0x200
      entry1: from = 0x300, to = 0x250
      entry2: from = 0x300, to = 0x250
      entry3: from = 0x300, to = 0x250
      entry4: from = 0x700, to = 0x800
      
      After remove_loops()
      entry0: from = 0x100, to = 0x200
      entry1: from = 0x300, to = 0x250
      entry2: from = 0x700, to = 0x800
      
      The original entry2 and entry3 are removed. So the number of iterations
      (from = 0x300, to = 0x250) is equal to removed number + 1 (2 + 1).
      
      iterations = removed number + 1;
      average iteractions = Sum(iteractions) / number of samples
      
      This formula ignores other cases, for example, iterations cross multiple
      buffers and one buffer contains 2+ loops. Because in practice, it's good
      enough.
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Acked-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linux-kernel@vger.kernel.org
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/n/1477876794-30749-2-git-send-email-yao.jin@linux.intel.com
      [ Renamed 'iter' to 'nr_loop_iter' for clarity ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      410024db
  19. 08 11月, 2016 1 次提交
  20. 06 5月, 2016 1 次提交
  21. 18 4月, 2016 1 次提交
  22. 15 4月, 2016 1 次提交
    • A
      perf callchain: Start moving away from global per thread cursors · 91d7b2de
      Arnaldo Carvalho de Melo 提交于
      The recent perf_evsel__fprintf_callchain() move to evsel.c added several
      new symbol requirements to the python binding, for instance:
      
        # perf test -v python
        16: Try 'import perf' in python, checking link problems      :
        --- start ---
        test child forked, pid 18030
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        ImportError: /tmp/build/perf/python/perf.so: undefined symbol:
        callchain_cursor
        test child finished with -1
        ---- end ----
        Try 'import perf' in python, checking link problems: FAILED!
        #
      
      This would require linking against callchain.c to access to the global
      callchain_cursor variables.
      
      Since lots of functions already receive as a parameter a
      callchain_cursor struct pointer, make that be the case for some more
      function so that we can start phasing out usage of yet another global
      variable.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-djko3097eyg2rn66v2qcqfvn@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      91d7b2de
  23. 20 2月, 2016 5 次提交
  24. 08 1月, 2016 1 次提交
  25. 27 11月, 2015 1 次提交
  26. 20 11月, 2015 5 次提交