1. 29 11月, 2017 6 次提交
    • A
      perf evlist: Add helper to check if attr.exclude_kernel is set in all evsels · 5b0d1cb4
      Arnaldo Carvalho de Melo 提交于
      The warning about kptr_restrict needs to be emitted only when it is set
      and we ask for kernel space samples, so add a helper to help with that.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-fh7drty6yljei9gxxzer6eup@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5b0d1cb4
    • R
      perf annotate: Do not truncate instruction names at 6 chars · 05d0e62d
      Ravi Bangoria 提交于
      There are many instructions, esp on PowerPC, whose mnemonics are longer
      than 6 characters. Using precision limit causes truncation of such
      mnemonics.
      
      Fix this by removing precision limit. Note that, 'width' is still 6, so
      alignment won't get affected for length <= 6.
      
      Before:
      
         li     r11,-1
         xscvdp vs1,vs1
         add.   r10,r10,r11
      
      After:
      
        li     r11,-1
        xscvdpsxds vs1,vs1
        add.   r10,r10,r11
      Reported-by: NDonald Stence <dstence@us.ibm.com>
      Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Link: http://lkml.kernel.org/r/20171114032540.4564-1-ravi.bangoria@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      05d0e62d
    • A
      perf machine: Guard against NULL in machine__exit() · 4a2233b1
      Arnaldo Carvalho de Melo 提交于
      A recent fix for 'perf trace' introduced a bug where
      machine__exit(trace->host) could be called while trace->host was still
      NULL, so make this more robust by guarding against NULL, just like
      free() does.
      
      The problem happens, for instance, when !root users try to run 'perf
      trace':
      
        [acme@jouet linux]$ trace
        Error:	No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit)
        Hint:	Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing'
      
        perf: Segmentation fault
        Obtained 7 stack frames.
        [0x4f1b2e]
        /lib64/libc.so.6(+0x3671f) [0x7f43a1dd971f]
        [0x4f3fec]
        [0x47468b]
        [0x42a2db]
        /lib64/libc.so.6(__libc_start_main+0xe9) [0x7f43a1dc3509]
        [0x42a6c9]
        Segmentation fault (core dumped)
        [acme@jouet linux]$
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrei Vagin <avagin@openvz.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Vasily Averin <vvs@virtuozzo.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 33974a41 ("perf trace: Call machine__exit() at exit")
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4a2233b1
    • A
      perf evsel: Fix up leftover perf_evsel_stat usage via evsel->priv · 8e2d8e20
      Arnaldo Carvalho de Melo 提交于
      I forgot one conversion, which got noticed by Thomas when running:
      
        $ perf stat  -e '{cpu-clock,instructions}' kill
        kill: not enough arguments
        Segmentation fault (core dumped)
        $
      
      Fix it, those stats are in evsel->stats, not anymore in evsel->priv.
      Reported-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Tested-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: e669e833 ("perf evsel: Restore evsel->priv as a tool private area")
      Link: http://lkml.kernel.org/r/20171109150046.GN4333@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8e2d8e20
    • A
      perf record: Fix -c/-F options for cpu event aliases · 59622fd4
      Andi Kleen 提交于
      The Intel PMU event aliases have a implicit period= specifier to set the
      default period.
      
      Unfortunately this breaks overriding these periods with -c or -F,
      because the alias terms look like they are user specified to the
      internal parser, and user specified event qualifiers override the
      command line options.
      
      Track that they are coming from aliases by adding a "weak" state to the
      term. Any weak terms don't override command line options.
      
      I only did it for -c/-F for now, I think that's the only case that's
      broken currently.
      
      Before:
      
      $ perf record -c 1000 -vv -e uops_issued.any
      ...
        { sample_period, sample_freq }   2000003
      
      After:
      
      $ perf record -c 1000 -vv -e uops_issued.any
      ...
        { sample_period, sample_freq }   1000
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20171020202755.21410-2-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      59622fd4
    • A
      perf evlist: Set the correct idx when adding dummy events · 555b4ec4
      Arnaldo Carvalho de Melo 提交于
      The evsel->idx field is used mainly to access the right bucket in
      per-event arrays such as the annotation ones, but also to set
      evsel->tracking, that in turn will decide what of the events will ask
      for PERF_RECORD_{MMAP,COMM,EXEC} to be generated, i.e. which
      perf_event_attr will have its mmap, etc fields set.
      
      When we were adding the "dummy" event using perf_evlist__add_dummy() we
      were not setting it correctly, which could result in multiple tracking
      events.
      
      Now that I'll try using a dummy event to be the tracking one when using
      'perf record --delay', i.e. when we process the --delay
      setting we may already have the evlist set up, like with:
      
        perf record -e cycles,instructions --delay 1000 ./workload
      
      We will need to add a "dummy" event, then reset evsel->tracking for the
      first event, "cycles", and set it instead to the dummy one, and also
      setting its attr.enable_on_exec, so that we get the PERF_RECORD_MMAP,
      etc metadata events while waiting to enable the explicitely requested
      events, so lets get this straight and set the right evsel->idx.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Bram Stolk <b.stolk@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-nrdfchshqxf7diszhxcecqb9@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      555b4ec4
  2. 09 11月, 2017 2 次提交
    • J
      perf tools: Fix eBPF event specification parsing · a271bfaf
      Jiri Olsa 提交于
      Looks like I've reached the new level of stupidity, adding missing braces.
      
      Committer testing:
      
      Given the following eBPF C filter, that will add a record when it
      returns true, i.e. when the tv_nsec variable is > 2000ns, should be
      built and installed via sys_bpf(), but fails to do so before this patch:
      
        # cat filter.c
        #include <uapi/linux/bpf.h>
        #define SEC(NAME) __attribute__((section(NAME), used))
      
        SEC("func=hrtimer_nanosleep rqtp->tv_nsec")
        int func(void *ctx, int err, long nsec)
        {
      	  return nsec > 1000;
        }
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        #
      
        # perf trace -e nanosleep,filter.c usleep 1
        invalid or unsupported event: 'filter.c'
        Run 'perf list' for a list of valid events
      
         Usage: perf trace [<options>] [<command>]
            or: perf trace [<options>] -- <command> [<options>]
            or: perf trace record [<options>] [<command>]
            or: perf trace record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event/syscall selector. use 'perf list' to list available events
        #
      
      And works again after it is applied, the nothing is inserted when the co
      
        # perf trace -e *sleep,filter.c usleep 1
           0.000 ( 0.066 ms): usleep/23994 nanosleep(rqtp: 0x7ffead94a0d0) = 0
        # perf trace -e *sleep,filter.c usleep 2
           0.000 ( 0.008 ms): usleep/24378 nanosleep(rqtp: 0x7fffa021ba50) ...
           0.008 (         ): perf_bpf_probe:func:(ffffffffb410cb30) tv_nsec=2000)
           0.000 ( 0.066 ms): usleep/24378  ... [continued]: nanosleep()) = 0
        #
      
      The intent of 9445464b is kept:
      
        # perf stat -e 'cpu/uops_executed.core,krava/'  true
        event syntax error: '..cuted.core,krava/'
                                          \___ unknown term
      
        valid terms: cmask,pc,event,edge,in_tx,any,ldlat,inv,umask,in_tx_cp,offcore_rsp,config,config1,config2,name,period
        Run 'perf list' for a list of valid events
      
         Usage: perf stat [<options>] [<command>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
        #
        # perf stat -e 'cpu/uops_executed.core,period=1/'  true
      
         Performance counter stats for 'true':
      
                 808,332      cpu/uops_executed.core,period=1/
      
             0.002997237 seconds time elapsed
      
        #
      Reported-by: NArnaldo Carvalho de Melo <acme@kernel.org>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Fixes: 9445464b ("perf tools: Unwind properly location after REJECT")
      Link: http://lkml.kernel.org/n/tip-diea0ihbwpxfw6938huv3whj@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a271bfaf
    • J
      perf tools: Add "reject" option for parse-events.l · b6af53b7
      Jiri Olsa 提交于
      Arnaldo reported broken builds in some distros using a newer flex
      release, 2.6.4, found in Alpine Linux 3.6 and Edge, with flex not
      spotting the REJECT macro:
      
        CC       /tmp/build/perf/util/parse-events-flex.o
        util/parse-events.l: In function 'parse_events_lex':
        /tmp/build/perf/util/parse-events-flex.c:4734:16: error: \
        'reject_used_but_not_detected' undeclared (first use in this function)
      
      It's happening because we put the REJECT under another USER_REJECT macro
      in following commit:
      
        9445464b perf tools: Unwind properly location after REJECT
      
      Fortunately flex provides option for force it to use REJECT, adding it
      to parse-events.l.
      Reported-by: NArnaldo Carvalho de Melo <acme@kernel.org>
      Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Reviewed-by: NAndi Kleen <andi@firstfloor.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Fixes: 9445464b ("perf tools: Unwind properly location after REJECT")
      Link: http://lkml.kernel.org/n/tip-7kdont984mw12ijk7rji6b8p@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b6af53b7
  3. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  4. 01 11月, 2017 3 次提交
  5. 31 10月, 2017 5 次提交
  6. 27 10月, 2017 2 次提交
  7. 26 10月, 2017 1 次提交
    • R
      perf symbols: Fix memory corruption because of zero length symbols · 331c7cb3
      Ravi Bangoria 提交于
      Perf top is often crashing at very random locations on powerpc.  After
      investigating, I found the crash only happens when sample is of zero
      length symbol. Powerpc kernel has many such symbols which does not
      contain length details in vmlinux binary and thus start and end
      addresses of such symbols are same.
      
      Structure
      
        struct sym_hist {
              u64                   nr_samples;
              u64                   period;
              struct sym_hist_entry addr[0];
        };
      
      has last member 'addr[]' of size zero. 'addr[]' is an array of addresses
      that belongs to one symbol (function). If function consist of 100
      instructions, 'addr' points to an array of 100 'struct sym_hist_entry'
      elements. For zero length symbol, it points to the *empty* array, i.e.
      no members in the array and thus offset 0 is also invalid for such
      array.
      
        static int __symbol__inc_addr_samples(...)
        {
              ...
              offset = addr - sym->start;
              h = annotation__histogram(notes, evidx);
              h->nr_samples++;
              h->addr[offset].nr_samples++;
              h->period += sample->period;
              h->addr[offset].period += sample->period;
              ...
        }
      
      Here, when 'addr' is same as 'sym->start', 'offset' becomes 0, which is
      valid for normal symbols but *invalid* for zero length symbols and thus
      updating h->addr[offset] causes memory corruption.
      
      Fix this by adding one dummy element for zero length symbols.
      
      Link: https://lkml.org/lkml/2016/10/10/148
      Fixes: edee44be ("perf annotate: Don't throw error for zero length symbols")
      Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Link: http://lkml.kernel.org/r/1508854806-10542-1-git-send-email-ravi.bangoria@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      331c7cb3
  8. 25 10月, 2017 6 次提交
    • M
      perf util: Enable handling of inlined frames by default · d8a88dd2
      Milian Wolff 提交于
      Now that we have caches in place to speed up the process of finding
      inlined frames and srcline information repeatedly, we can enable this
      useful option by default.
      Suggested-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171019113836.5548-6-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d8a88dd2
    • M
      perf report: Use srcline from callchain for hist entries · 1fb7d06a
      Milian Wolff 提交于
      This also removes the symbol name from the srcline column, more on this
      below.
      
      This ensures we use the correct srcline, which could originate from a
      potentially inlined function. The hist entries used to query for the
      srcline based purely on the IP, which leads to wrong results for inlined
      entries.
      
      Before:
      
      ~~~~~
        perf report --inline -s srcline -g none --stdio
        ...
        # Children      Self  Source:Line
        # ........  ........  ..................................................................................................................................
        #
            94.23%     0.00%  __libc_start_main+18446603487898210537
            94.23%     0.00%  _start+41
            44.58%     0.00%  main+100
            44.58%     0.00%  std::_Norm_helper<true>::_S_do_it<double>+100
            44.58%     0.00%  std::__complex_abs+100
            44.58%     0.00%  std::abs<double>+100
            44.58%     0.00%  std::norm<double>+100
            36.01%     0.00%  hypot+18446603487892193300
            25.81%     0.00%  main+41
            25.81%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
            25.81%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
            25.75%    25.75%  random.h:143
            18.39%     0.00%  main+57
            18.39%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
            18.39%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
            13.80%    13.80%  random.tcc:3330
             5.64%     0.00%  ??:0
             4.13%     4.13%  __hypot_finite+163
             4.13%     0.00%  __hypot_finite+18446603487892193443
      ...
      ~~~~~
      
      After:
      
      ~~~~~
        perf report --inline -s srcline -g none --stdio
        ...
        # Children      Self  Source:Line
        # ........  ........  ...........................................
        #
            94.30%     1.19%  main.cpp:39
            94.23%     0.00%  __libc_start_main+18446603487898210537
            94.23%     0.00%  _start+41
            48.44%     1.70%  random.h:1823
            48.44%     0.00%  random.h:1814
            46.74%     2.53%  random.h:185
            44.68%     0.10%  complex:589
            44.68%     0.00%  complex:597
            44.68%     0.00%  complex:654
            44.68%     0.00%  complex:664
            40.61%    13.80%  random.tcc:3330
            36.01%     0.00%  hypot+18446603487892193300
            26.81%     0.00%  random.h:151
            26.81%     0.00%  random.h:332
            25.75%    25.75%  random.h:143
             5.64%     0.00%  ??:0
             4.13%     4.13%  __hypot_finite+163
             4.13%     0.00%  __hypot_finite+18446603487892193443
      ...
      ~~~~~
      
      Note that this change removes the symbol from the source:line hist
      column. If this information is desired, users should explicitly query
      for it if needed. I.e. run this command instead:
      
      ~~~~~
        perf report --inline -s sym,srcline -g none --stdio
        ...
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'cycles:uppp'
        # Event count (approx.): 1381229476
        #
        # Children      Self  Symbol                                                                                                                               Source:Line
        # ........  ........  ...................................................................................................................................  ...........................................
        #
            94.30%     1.19%  [.] main                                                                                                                             main.cpp:39
            94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
            94.23%     0.00%  [.] _start                                                                                                                           _start+41
            48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1814
            48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1823
            46.74%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  random.h:185
            44.68%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              complex:654
            44.68%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     complex:589
            44.68%     0.00%  [.] std::abs<double> (inlined)                                                                                                       complex:597
            44.68%     0.00%  [.] std::norm<double> (inlined)                                                                                                      complex:664
            39.80%    13.59%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
            36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
            26.81%     0.00%  [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)                                                        random.h:151
            26.81%     0.00%  [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)                                 random.h:332
            25.75%     0.00%  [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)                                     random.h:143
            25.19%    25.19%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
             4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
             4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
      ...
      ~~~~~
      
      Compared to the old behavior, this reduces duplication in the output.
      Before we used to print the symbol name in the srcline column even
      when the sym column was explicitly requested. I.e. the output was:
      
      ~~~~~
        perf report --inline -s sym,srcline -g none --stdio
        ...
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'cycles:uppp'
        # Event count (approx.): 1381229476
        #
        # Children      Self  Symbol                                                                                                                               Source:Line
        # ........  ........  ...................................................................................................................................  ..................................................................................................................................
        #
            94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
            94.23%     0.00%  [.] _start                                                                                                                           _start+41
            44.58%     0.00%  [.] main                                                                                                                             main+100
            44.58%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              std::_Norm_helper<true>::_S_do_it<double>+100
            44.58%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     std::__complex_abs+100
            44.58%     0.00%  [.] std::abs<double> (inlined)                                                                                                       std::abs<double>+100
            44.58%     0.00%  [.] std::norm<double> (inlined)                                                                                                      std::norm<double>+100
            36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
            25.81%     0.00%  [.] main                                                                                                                             main+41
            25.81%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
            25.81%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
            25.69%    25.69%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
            18.39%     0.00%  [.] main                                                                                                                             main+57
            18.39%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
            18.39%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
            13.80%    13.80%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
             4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
             4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
      ...
      ~~~~~
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171019113836.5548-5-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1fb7d06a
    • M
      perf report: Cache srclines for callchain nodes · 21ac9d54
      Milian Wolff 提交于
      On one hand this ensures that the memory is properly freed when the DSO
      gets freed. On the other hand this significantly speeds up the
      processing of the callchain nodes when lots of srclines are requested.
      For one of my data files e.g.:
      
      Before:
      
       Performance counter stats for 'perf report -s srcline -g srcline --stdio':
      
            52496.495043      task-clock (msec)         #    0.999 CPUs utilized
                     634      context-switches          #    0.012 K/sec
                       2      cpu-migrations            #    0.000 K/sec
                 191,561      page-faults               #    0.004 M/sec
         165,074,498,235      cycles                    #    3.144 GHz
         334,170,832,408      instructions              #    2.02  insn per cycle
          90,220,029,745      branches                  # 1718.591 M/sec
             654,525,177      branch-misses             #    0.73% of all branches
      
            52.533273822 seconds time elapsedProcessed 236605 events and lost 40 chunks!
      
      After:
      
       Performance counter stats for 'perf report -s srcline -g srcline --stdio':
      
            22606.323706      task-clock (msec)         #    1.000 CPUs utilized
                      31      context-switches          #    0.001 K/sec
                       0      cpu-migrations            #    0.000 K/sec
                 185,471      page-faults               #    0.008 M/sec
          71,188,113,681      cycles                    #    3.149 GHz
         133,204,943,083      instructions              #    1.87  insn per cycle
          34,886,384,979      branches                  # 1543.214 M/sec
             278,214,495      branch-misses             #    0.80% of all branches
      
            22.609857253 seconds time elapsed
      
      Note that the difference is only this large when `--inline` is not
      passed. In such situations, we would use the inliner cache and thus do
      not run this code path that often.
      
      I think that this cache should actually be used in other places, too.
      When looking at the valgrind leak report for perf report, we see tons of
      srclines being leaked, most notably from calls to
      hist_entry__get_srcline. The problem is that get_srcline has many
      different formatting options (show_sym, show_addr, potentially even
      unwind_inlines when calling __get_srcline directly). As such, the
      srcline cannot easily be cached for all calls, or we'd have to add
      caches for all formatting combinations (6 so far). An alternative would
      be to remove the formatting options and handle that on a different level
      - i.e. print the sym/addr on demand wherever we actually output
      something. And the unwind_inlines could be moved into a separate
      function that does not return the srcline.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171019113836.5548-4-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      21ac9d54
    • M
      perf report: Cache failed lookups of inlined frames · b38775cf
      Milian Wolff 提交于
      When no inlined frames could be found for a given address, we did not
      store this information anywhere. That means we potentially do the costly
      inliner lookup repeatedly for cases where we know it can never succeed.
      
      This patch makes dso__parse_addr_inlines always return a valid
      inline_node. It will be empty when no inliners are found. This enables
      us to cache the empty list in the DSO, thereby improving the performance
      when many addresses fail to find the inliners.
      
      For my trivial example, the performance impact is already quite
      significant:
      
      Before:
      
      ~~~~~
       Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):
      
              594.804032      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.07% )
                      53      context-switches          #    0.089 K/sec                    ( +-  4.09% )
                       0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
                   5,687      page-faults               #    0.010 M/sec                    ( +-  0.02% )
           2,300,918,213      cycles                    #    3.868 GHz                      ( +-  0.09% )
           4,395,839,080      instructions              #    1.91  insn per cycle           ( +-  0.00% )
             939,177,205      branches                  # 1578.969 M/sec                    ( +-  0.00% )
              11,824,633      branch-misses             #    1.26% of all branches          ( +-  0.10% )
      
             0.596246531 seconds time elapsed                                          ( +-  0.07% )
      ~~~~~
      
      After:
      
      ~~~~~
       Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):
      
              113.111405      task-clock (msec)         #    0.990 CPUs utilized            ( +-  0.89% )
                      29      context-switches          #    0.255 K/sec                    ( +- 54.25% )
                       0      cpu-migrations            #    0.000 K/sec
                   5,380      page-faults               #    0.048 M/sec                    ( +-  0.01% )
             432,378,779      cycles                    #    3.823 GHz                      ( +-  0.75% )
             670,057,633      instructions              #    1.55  insn per cycle           ( +-  0.01% )
             141,001,247      branches                  # 1246.570 M/sec                    ( +-  0.01% )
               2,346,845      branch-misses             #    1.66% of all branches          ( +-  0.19% )
      
             0.114222393 seconds time elapsed                                          ( +-  1.19% )
      ~~~~~
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171019113836.5548-3-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b38775cf
    • M
      perf report: Properly handle branch count in match_chain() · bf36eb5c
      Milian Wolff 提交于
      Some of the code paths I introduced before returned too early without
      running the code to handle a node's branch count.  By refactoring
      match_chain to only have one exit point, this can be remedied.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1707691.qaJ269GSZW@agathebauer
      Link: http://lkml.kernel.org/r/20171018185350.14893-2-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      bf36eb5c
    • M
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05
      Mark Rutland 提交于
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6aa7de05
  9. 24 10月, 2017 12 次提交
    • M
      perf report: Compare symbol name for inlined frames when sorting · aa441895
      Milian Wolff 提交于
      Similar to the callstack frame matching, we also have to compare the
      symbol name when sorting hist entries. The reason is twofold: On one
      hand, multiple inlined functions will use the same symbol start/end
      values of the parent, non-inlined symbol.
      
      As such, all of these symbols often end up missing from top-level
      report, as they get merged with the non-inlined frame. On the other
      hand, multiple different functions may end up inlining the same
      function, and we need to aggregate these values properly.
      
      Before:
      
      ~~~~~
        perf report --stdio --inline -g none
        # Children     Self  Command       Shared Object Symbol
        # ........ ........  ............  ............. ...................................
        #
           100.00%   39.69%  cpp-inlining  cpp-inlining  [.] main
           100.00%    0.00%  cpp-inlining  cpp-inlining  [.] _start
           100.00%    0.00%  cpp-inlining  libc-2.25.so  [.] __libc_start_main
            97.03%    0.00%  cpp-inlining  cpp-inlining  [.] std::norm<double> (inlined)
            59.53%    4.26%  cpp-inlining  libm-2.25.so  [.] hypot
            55.21%   55.08%  cpp-inlining  libm-2.25.so  [.] __hypot_finite
             0.52%    0.52%  cpp-inlining  libm-2.25.so  [.] cabs
      ~~~~~
      
      After:
      
      ~~~~~
        perf report --stdio --inline -g none
        # Children     Self  Command       Shared Object Symbol
        # ........ ........  ............  ............. ...................................................................................................................................
        #
           100.00%   39.69%  cpp-inlining  cpp-inlining  [.] main
           100.00%    0.00%  cpp-inlining  cpp-inlining  [.] _start
           100.00%    0.00%  cpp-inlining  libc-2.25.so  [.] __libc_start_main
            62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)
            62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::__complex_abs (inlined)
            62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::abs<double> (inlined)
            62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::norm<double> (inlined)
            59.53%    4.26%  cpp-inlining  libm-2.25.so  [.] hypot
            55.21%   55.08%  cpp-inlining  libm-2.25.so  [.] __hypot_finite
            34.46%    0.00%  cpp-inlining  cpp-inlining  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
            32.39%    0.00%  cpp-inlining  cpp-inlining  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
            32.39%    0.00%  cpp-inlining  cpp-inlining  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
            12.29%    0.00%  cpp-inlining  cpp-inlining  [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
            12.29%    0.00%  cpp-inlining  cpp-inlining  [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
            12.29%    0.00%  cpp-inlining  cpp-inlining  [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
             0.52%    0.52%  cpp-inlining  libm-2.25.so  [.] cabs
      ~~~~~
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-11-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      aa441895
    • M
      perf callchain: Compare symbol name for inlined frames when matching · 9856240a
      Milian Wolff 提交于
      The fake symbols we create for inlined frames will represent different
      functions but can use the symbol start address. This leads to issues
      when different inline branches all lead to the same function.
      
      Before:
      ~~~~~
      $ perf report -s sym -i perf.inlining.data --inline --stdio -g function
      ...
                   --38.86%--_start
                             __libc_start_main
                             main
                             |
                              --37.57%--std::norm<double> (inlined)
                                        std::_Norm_helper<true>::_S_do_it<double> (inlined)
                                        |
                                         --36.36%--std::abs<double> (inlined)
                                                   std::__complex_abs (inlined)
                                                   |
                                                    --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
                                                              std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
                                                              std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
      ~~~~~
      
      Note that this backtrace representation is completely bogus.
      Complex abs does not call the linear congruential engine! It
      is just a side-effect of a longer inlined stack being appended
      to a shorter, different inlined stack, both of which originate
      in the same function (main).
      
      This patch fixes the issue:
      
      ~~~~~
      $ perf report -s sym -i perf.inlining.data --inline --stdio -g function
      ...
                   --38.86%--_start
                             __libc_start_main
                             main
                             |
                             |--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                             |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                             |          |
                             |           --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
                             |                     std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                             |                     |
                             |                      --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
                             |                                std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
                             |                                std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
                             |
                              --1.99%--std::norm<double> (inlined)
                                        std::_Norm_helper<true>::_S_do_it<double> (inlined)
                                        std::abs<double> (inlined)
                                        std::__complex_abs (inlined)
      ~~~~~
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-10-milian.wolff@kdab.com
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      [ Fix up conflict with c1fbc0cf ("perf callchain: Compare dsos (as well) for CCKEY_FUNCTION"), remove unneeded hunk ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9856240a
    • M
      perf script: Mark inlined frames and do not print DSO for them · 9628b56d
      Milian Wolff 提交于
      Instead of showing the (repeated) DSO name of the non-inlined frame, we
      now show the "(inlined)" suffix instead.
      
      Before:
                         214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                          ace3 hypot (/usr/lib/libm-2.25.so)
                           a4a std::__complex_abs (/home/milian/projects/src/perf-tests/inlining)
                           a4a std::abs<double> (/home/milian/projects/src/perf-tests/inlining)
                           a4a std::_Norm_helper<true>::_S_do_it<double> (/home/milian/projects/src/perf-tests/inlining)
                           a4a std::norm<double> (/home/milian/projects/src/perf-tests/inlining)
                           a4a main (/home/milian/projects/src/perf-tests/inlining)
                         20510 __libc_start_main (/usr/lib/libc-2.25.so)
                           bd9 _start (/home/milian/projects/src/perf-tests/inlining)
      
      After:
                         214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                          ace3 hypot (/usr/lib/libm-2.25.so)
                           a4a std::__complex_abs (inlined)
                           a4a std::abs<double> (inlined)
                           a4a std::_Norm_helper<true>::_S_do_it<double> (inlined)
                           a4a std::norm<double> (inlined)
                           a4a main (/home/milian/projects/src/perf-tests/inlining)
                         20510 __libc_start_main (/usr/lib/libc-2.25.so)
                           bd9 _start (/home/milian/projects/src/perf-tests/inlining)
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-9-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9628b56d
    • M
      perf callchain: Mark inlined frames in output by " (inlined)" suffix · 8932f807
      Milian Wolff 提交于
      The original patch that introduced inline frame output in the various
      browsers used this suffix already. The new centralized approach that
      uses fake symbols for inlined frames was missing this approach so far.
      
      Instead of changing the symbol name itself, we only print the suffix
      where needed. This allows us to efficiently lookup the symbol for a
      given name without first having to append the suffix before the lookup.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-8-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8932f807
    • M
      perf report: Fall-back to function name comparison for -g srcline · cbe50f61
      Milian Wolff 提交于
      When a callchain entry has no srcline available, we ended up comparing
      the instruction pointer. I consider this to be not too useful. Rather, I
      think we should group the entries by function name, which this patch
      adds. For people who want to split the data on the IP boundary, using
      `-g address` is the correct choice.
      
      Before:
      
      ~~~~~
         100.00%    38.86%  [.] main
                  |
                  |--61.14%--main inlining.cpp:14
                  |          std::norm<double> complex:664
                  |          std::_Norm_helper<true>::_S_do_it<double> complex:654
                  |          std::abs<double> complex:597
                  |          std::__complex_abs complex:589
                  |          |
                  |          |--56.03%--hypot
                  |          |          |
                  |          |          |--8.45%--__hypot_finite
                  |          |          |
                  |          |          |--7.62%--__hypot_finite
                  |          |          |
                  |          |          |--2.29%--__hypot_finite
                  |          |          |
                  |          |          |--2.24%--__hypot_finite
                  |          |          |
                  |          |          |--2.06%--__hypot_finite
                  |          |          |
                  |          |          |--1.81%--__hypot_finite
      ...
      ~~~~~
      
      After:
      
      ~~~~~
         100.00%    38.86%  [.] main
                  |
                  |--61.14%--main inlining.cpp:14
                  |          std::norm<double> complex:664
                  |          std::_Norm_helper<true>::_S_do_it<double> complex:654
                  |          std::abs<double> complex:597
                  |          std::__complex_abs complex:589
                  |          |
                  |          |--60.29%--hypot
                  |          |          |
                  |          |           --56.03%--__hypot_finite
                  |          |
                  |           --0.85%--cabs
      ~~~~~
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-7-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      cbe50f61
    • M
      perf callchain: Create real callchain entries for inlined frames · 11ea2515
      Milian Wolff 提交于
      The inline_node structs are maintained by the new dso->inlines tree.
      This in turn keeps ownership of the fake symbols and srcline string
      representing an inline frame.
      
      This tree is sorted by address to allow quick lookups. All other entries
      of the symbol beside the function name are unused for inline frames. The
      advantage of this approach is that all existing users of the callchain
      API can now transparently display inlined frames without having to patch
      their code.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-6-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      11ea2515
    • M
      perf callchain: Refactor inline_list to store srcline string directly · 2be8832f
      Milian Wolff 提交于
      This is a preparation for the creation of real callchain entries for
      inlined frames. The rest of the perf code uses the srcline string. As
      such, using that also for the srcline API allows us to simplify some of
      the upcoming code. Most notably, it will allow us to cache the srcline
      for a given inline node and reuse it for different callchain entries.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-5-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2be8832f
    • M
      perf callchain: Refactor inline_list to operate on symbols · fea0cf84
      Milian Wolff 提交于
      This is a requirement to create real callchain entries for inlined
      frames.
      
      Since the list of inlines usually contains the target symbol too, i.e.
      the location where the frames get inlined to, we alias that symbol and
      reuse it as-is is. This ensures that other dependent functionality keeps
      working, most notably annotation of the target frames.
      
      For all other entries in the inline_list, a fake symbol is created.
      These are marked by new 'inlined' member which is set to true. Only
      those symbols are managed by the inline_list and get freed when the
      inline_list is deleted from within inline_node__delete.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-4-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fea0cf84
    • M
      perf callchain: Store srcline in callchain_cursor_node · 40a342cd
      Milian Wolff 提交于
      This is mostly a preparation to enable the creation of full callchain
      nodes for inline frames. Such frames will reference the IP of the
      non-inlined frame, but hold the symbol and srcline for an inlined
      location. As such, we won't be able to query the srcline on-demand based
      on the IP alone. Instead, we will leverage the functionality provided by
      this patch here, and store the srcline for the inlined nodes in the new
      srcline member of callchain_cursor_node.
      
      Note that this patch on its own leaks the srcline, as there is no
      free_callchain_cursor_node or similar. A future patch will add caching
      of the srcline and handle deletion properly.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-3-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      40a342cd
    • M
      perf report: Remove code to handle inline frames from browsers · 2a704fc8
      Milian Wolff 提交于
      The follow-up commits will make inline frames first-class citizens in
      the callchain, thereby obsoleting all of this special code.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-2-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2a704fc8
    • A
      perf namespaces: Add more appropriate set of headers · b1f03ca4
      Arnaldo Carvalho de Melo 提交于
      We don't need perf.h, that is a kitchen sink, all we need is
      perf_events.h for perf_ns_link_info, sys/types.h for pid_t and
      linux/types.h for u64, list_head.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Li Zhijian <lizhijian@cn.fujitsu.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-f2uxyaj4s2hmntkrezpa6dqz@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b1f03ca4
    • A
      perf tools: Introduce binary__fprintf() · 923d0c9a
      Arnaldo Carvalho de Melo 提交于
      Out of print_binary() but receiving a fp pointer and expecting that the
      printer be a fprintf like function, i.e. receive a FILE pointer and
      return the number of characters printed.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: yuzhoujian <yuzhoujian@didichuxing.com>
      Link: http://lkml.kernel.org/n/tip-6oqnxr6lmgqe6q6p3iugnscx@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      923d0c9a
  10. 23 10月, 2017 2 次提交