1. 13 11月, 2017 3 次提交
    • J
      perf annotate: Add annotation_line struct · a17c4ca0
      Jiri Olsa 提交于
      In order to make the annotation support generic, addadding 'struct
      annotation_line', which will hold generic data common to annotation
      sources (such as the one for python scripts, coming on upcoming
      patches).
      
      Having this, we can add different annotation line support other than
      objdump disasm.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171011150158.11895-3-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a17c4ca0
    • A
      perf record: Generate PERF_RECORD_{MMAP,COMM,EXEC} with --delay · d3dbf43c
      Arnaldo Carvalho de Melo 提交于
      When we use an initial delay, e.g.: 'perf record --delay 1000', we do not
      enable the events until that delay has passed after we started the workload,
      including the tracking event, i.e. the one for which we have attr.mmap, etc,
      enabled to ask the kernel to generate the PERF_RECORD_{MMAP,COMM,EXEC} metadata
      events that will then allow us to resolve addresses in samples to the map, dso
      and symbol. There will be a shadow that even synthesizing samples won't cover,
      i.e. the workload that we start and other processes forking while we
      wait for the initial delay to expire.
      
      So use a dummy event to be the tracking one and make it be enabled on exec.
      
      Before:
      
        # perf record --delay 1000 stress --cpu 1 --timeout 5
        stress: info: [9029] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
        stress: info: [9029] successful run completed in 5s
        [ perf record: Woken up 3 times to write data ]
        [ perf record: Captured and wrote 0.624 MB perf.data (15908 samples) ]
        # perf script | head
            :9031 9031 32001.826888:       1 cycles:ppp: ffffffff831aa30d event_function (/lib/modules/4.14.0-rc6+/build/vmlinux)
            :9031 9031 32001.826893:       1 cycles:ppp: ffffffff8300d1a0 intel_bts_enable_local (/lib/modules/4.14.0-rc6+/build/vmlinux)
            :9031 9031 32001.826895:       7 cycles:ppp: ffffffff83023870 sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
            :9031 9031 32001.826897:     103 cycles:ppp: ffffffff8300c331 intel_pmu_handle_irq (/lib/modules/4.14.0-rc6+/build/vmlinux)
            :9031 9031 32001.826899:    1615 cycles:ppp: ffffffff830231f8 native_sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
            :9031 9031 32001.826902:   26724 cycles:ppp: ffffffff8384c6a7 native_irq_return_iret (/lib/modules/4.14.0-rc6+/build/vmlinux)
            :9031 9031 32001.826913:  329739 cycles:ppp:     7fb2a5410932 [unknown] ([unknown])
            :9031 9031 32001.827033: 1225451 cycles:ppp:     7fb2a5410930 [unknown] ([unknown])
            :9031 9031 32001.827474: 1391725 cycles:ppp:     7fb2a5410930 [unknown] ([unknown])
            :9031 9031 32001.827978: 1233697 cycles:ppp:     7fb2a5410928 [unknown] ([unknown])
        #
      
      After:
      
        # perf record --delay 1000 stress --cpu 1 --timeout 5
        stress: info: [9741] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
        stress: info: [9741] successful run completed in 5s
        [ perf record: Woken up 3 times to write data ]
        [ perf record: Captured and wrote 0.751 MB perf.data (15976 samples) ]
        # perf script | head
           stress  9742 32110.959106:          1 cycles:ppp:  ffffffff831b26f6 __perf_event_task_sched_in (/lib/modules/4.14.0-rc6+/build/vmlinux)
           stress 9742 32110.959110:       1 cycles:ppp: ffffffff8300c2e9 intel_pmu_handle_irq (/lib/modules/4.14.0-rc6+/build/vmlinux)
           stress 9742 32110.959112:       7 cycles:ppp: ffffffff830231e0 native_sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
           stress 9742 32110.959115:     101 cycles:ppp: ffffffff83023870 sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
           stress 9742 32110.959117:    1533 cycles:ppp: ffffffff830231f8 native_sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
           stress 9742 32110.959119:   23992 cycles:ppp: ffffffff831b0900 ctx_sched_in (/lib/modules/4.14.0-rc6+/build/vmlinux)
           stress 9742 32110.959129:  329406 cycles:ppp:     7f4b1b661930 __random_r (/usr/lib64/libc-2.25.so)
           stress 9742 32110.959249: 1288322 cycles:ppp:     5566e1e7cbc9 hogcpu (/usr/bin/stress)
           stress 9742 32110.959712: 1464046 cycles:ppp:     7f4b1b66179e __random (/usr/lib64/libc-2.25.so)
           stress 9742 32110.960241: 1266918 cycles:ppp:     7f4b1b66195b __random_r (/usr/lib64/libc-2.25.so)
        #
      Reported-by: NBram Stolk <b.stolk@gmail.com>
      Tested-by: NBram Stolk <b.stolk@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 6619a53e ("perf record: Add --initial-delay option")
      Link: http://lkml.kernel.org/n/tip-nrdfchshqxf7diszhxcecqb9@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d3dbf43c
    • A
      perf evlist: Set the correct idx when adding dummy events · 640d5175
      Arnaldo Carvalho de Melo 提交于
      The evsel->idx field is used mainly to access the right bucket in
      per-event arrays such as the annotation ones, but also to set
      evsel->tracking, that in turn will decide what of the events will ask
      for PERF_RECORD_{MMAP,COMM,EXEC} to be generated, i.e. which
      perf_event_attr will have its mmap, etc fields set.
      
      When we were adding the "dummy" event using perf_evlist__add_dummy() we
      were not setting it correctly, which could result in multiple tracking
      events.
      
      Now that I'll try using a dummy event to be the tracking one when using
      'perf record --delay', i.e. when we process the --delay
      setting we may already have the evlist set up, like with:
      
        perf record -e cycles,instructions --delay 1000 ./workload
      
      We will need to add a "dummy" event, then reset evsel->tracking for the
      first event, "cycles", and set it instead to the dummy one, and also
      setting its attr.enable_on_exec, so that we get the PERF_RECORD_MMAP,
      etc metadata events while waiting to enable the explicitely requested
      events, so lets get this straight and set the right evsel->idx.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Bram Stolk <b.stolk@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-nrdfchshqxf7diszhxcecqb9@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      640d5175
  2. 09 11月, 2017 3 次提交
    • A
      perf trace: Call machine__exit() at exit · 33974a41
      Andrei Vagin 提交于
      Otherwise 'perf trace' leaves a temporary file /tmp/perf-vdso.so-XXXXXX.
      
        $ perf trace -o log true
        $ ls -l /tmp/perf-vdso.*
        -rw------- 1 root root 8192 Nov  8 03:08 /tmp/perf-vdso.so-5bCpD0
      Signed-off-by: NAndrei Vagin <avagin@openvz.org>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Vasily Averin <vvs@virtuozzo.com>
      Link: http://lkml.kernel.org/r/20171108002246.8924-1-avagin@openvz.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      33974a41
    • J
      perf tools: Fix eBPF event specification parsing · a271bfaf
      Jiri Olsa 提交于
      Looks like I've reached the new level of stupidity, adding missing braces.
      
      Committer testing:
      
      Given the following eBPF C filter, that will add a record when it
      returns true, i.e. when the tv_nsec variable is > 2000ns, should be
      built and installed via sys_bpf(), but fails to do so before this patch:
      
        # cat filter.c
        #include <uapi/linux/bpf.h>
        #define SEC(NAME) __attribute__((section(NAME), used))
      
        SEC("func=hrtimer_nanosleep rqtp->tv_nsec")
        int func(void *ctx, int err, long nsec)
        {
      	  return nsec > 1000;
        }
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        #
      
        # perf trace -e nanosleep,filter.c usleep 1
        invalid or unsupported event: 'filter.c'
        Run 'perf list' for a list of valid events
      
         Usage: perf trace [<options>] [<command>]
            or: perf trace [<options>] -- <command> [<options>]
            or: perf trace record [<options>] [<command>]
            or: perf trace record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event/syscall selector. use 'perf list' to list available events
        #
      
      And works again after it is applied, the nothing is inserted when the co
      
        # perf trace -e *sleep,filter.c usleep 1
           0.000 ( 0.066 ms): usleep/23994 nanosleep(rqtp: 0x7ffead94a0d0) = 0
        # perf trace -e *sleep,filter.c usleep 2
           0.000 ( 0.008 ms): usleep/24378 nanosleep(rqtp: 0x7fffa021ba50) ...
           0.008 (         ): perf_bpf_probe:func:(ffffffffb410cb30) tv_nsec=2000)
           0.000 ( 0.066 ms): usleep/24378  ... [continued]: nanosleep()) = 0
        #
      
      The intent of 9445464b is kept:
      
        # perf stat -e 'cpu/uops_executed.core,krava/'  true
        event syntax error: '..cuted.core,krava/'
                                          \___ unknown term
      
        valid terms: cmask,pc,event,edge,in_tx,any,ldlat,inv,umask,in_tx_cp,offcore_rsp,config,config1,config2,name,period
        Run 'perf list' for a list of valid events
      
         Usage: perf stat [<options>] [<command>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
        #
        # perf stat -e 'cpu/uops_executed.core,period=1/'  true
      
         Performance counter stats for 'true':
      
                 808,332      cpu/uops_executed.core,period=1/
      
             0.002997237 seconds time elapsed
      
        #
      Reported-by: NArnaldo Carvalho de Melo <acme@kernel.org>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Fixes: 9445464b ("perf tools: Unwind properly location after REJECT")
      Link: http://lkml.kernel.org/n/tip-diea0ihbwpxfw6938huv3whj@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a271bfaf
    • J
      perf tools: Add "reject" option for parse-events.l · b6af53b7
      Jiri Olsa 提交于
      Arnaldo reported broken builds in some distros using a newer flex
      release, 2.6.4, found in Alpine Linux 3.6 and Edge, with flex not
      spotting the REJECT macro:
      
        CC       /tmp/build/perf/util/parse-events-flex.o
        util/parse-events.l: In function 'parse_events_lex':
        /tmp/build/perf/util/parse-events-flex.c:4734:16: error: \
        'reject_used_but_not_detected' undeclared (first use in this function)
      
      It's happening because we put the REJECT under another USER_REJECT macro
      in following commit:
      
        9445464b perf tools: Unwind properly location after REJECT
      
      Fortunately flex provides option for force it to use REJECT, adding it
      to parse-events.l.
      Reported-by: NArnaldo Carvalho de Melo <acme@kernel.org>
      Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Reviewed-by: NAndi Kleen <andi@firstfloor.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Fixes: 9445464b ("perf tools: Unwind properly location after REJECT")
      Link: http://lkml.kernel.org/n/tip-7kdont984mw12ijk7rji6b8p@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b6af53b7
  3. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  4. 01 11月, 2017 6 次提交
    • N
      perf srcline: Show correct function name for srcline of callchains · 7285cf33
      Namhyung Kim 提交于
      When libbfd is not used, it doesn't show proper function name and reuse
      the original symbol of the sample.  That's because it passes the
      original sym to inline_list__append().  As `addr2line -f` returns
      function names as well, use that to create an inline_sym and pass it to
      inline_list__append().
      
      For example, following data shows that inlined entries of main have same
      name (main).
      
      Before:
        $ perf report -g srcline -q | head
            45.22%  inlining     libm-2.26.so      [.] __hypot_finite
                    |
                    ---__hypot_finite ??:0
                       |
                       |--44.15%--hypot ??:0
                       |          main complex:589
                       |          main complex:597
                       |          main complex:654
                       |          main complex:664
                       |          main inlining.cpp:14
      
      After:
        $ perf report -g srcline -q | head
            45.22%  inlining     libm-2.26.so      [.] __hypot_finite
                    |
                    ---__hypot_finite
                       |
                       |--44.15%--hypot
                       |          std::__complex_abs complex:589 (inlined)
                       |          std::abs<double> complex:597 (inlined)
                       |          std::_Norm_helper<true>::_S_do_it<double> complex:654 (inlined)
                       |          std::norm<double> complex:664 (inlined)
                       |          main inlining.cpp:14
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Reviewed-by: NMilian Wolff <milian.wolff@kdab.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20171031020654.31163-2-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7285cf33
    • N
      perf srcline: Fix memory leak in addr2inlines() · b7b75a60
      Namhyung Kim 提交于
      When libbfd is not used, addr2inlines() executes `addr2line -i` and
      process output line by line.  But it resets filename to NULL in the loop
      so getline() allocates additional memory everytime instead of realloc.
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20171031020654.31163-1-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b7b75a60
    • A
      perf trace beauty kcmp: Beautify arguments · 1de3038d
      Arnaldo Carvalho de Melo 提交于
      For some unknown reason there is no entry in tracefs's syscalls for
      kcmp, i.e. no tracefs/events/syscalls/sys_{enter,exit}_kcmp, so we need
      to provide a data dictionary for the fields.
      
      To beautify the 'type' argument we automatically generate a strarray
      from tools/include/uapi/kcmp.h, the idx1 and idx2 args, nowadays used
      only if type == KCMP_FILE, are masked for all the other types and a
      lookup is made for the thread and fd to show the path, if possible,
      getting it from the probe:vfs_getname if in place or from procfs, races
      allowing.
      
      A system wide strace like tracing session, with callchains shows just
      one user so far in this fedora 25 machine:
      
        # perf trace --max-stack 5 -e kcmp
        <SNIP>
        1502914.400 ( 0.001 ms): systemd/1 kcmp(pid1: 1 (systemd), pid2: 1 (systemd), type: FILE, idx1: 271<socket:[4723475]>, idx2: 25<socket:[4788686]>) = -1 ENOSYS Function not implemented
                                               syscall (/usr/lib64/libc-2.25.so)
                                               same_fd (/usr/lib/systemd/libsystemd-shared-233.so)
                                               service_add_fd_store (/usr/lib/systemd/systemd)
                                               service_notify_message.lto_priv.127 (/usr/lib/systemd/systemd)
        1502914.407 ( 0.001 ms): systemd/1 kcmp(pid1: 1 (systemd), pid2: 1 (systemd), type: FILE, idx1: 270<socket:[4726396]>, idx2: 25<socket:[4788686]>) = -1 ENOSYS Function not implemented
                                               syscall (/usr/lib64/libc-2.25.so)
                                               same_fd (/usr/lib/systemd/libsystemd-shared-233.so)
                                               service_add_fd_store (/usr/lib/systemd/systemd)
                                               service_notify_message.lto_priv.127 (/usr/lib/systemd/systemd)
        <SNIP>
      
      The backtraces seem to agree this is really kcmp(), but this system
      doesn't have the sys_kcmp(), bummer:
      
        # uname -a
        Linux jouet 4.14.0-rc3+ #1 SMP Fri Oct 13 12:21:12 -03 2017 x86_64 x86_64 x86_64 GNU/Linux
        # grep kcmp /proc/kallsyms
        ffffffffb60b8890 W sys_kcmp
        $ grep CONFIG_CHECKPOINT_RESTORE ../build/v4.14.0-rc3+/.config
        # CONFIG_CHECKPOINT_RESTORE is not set
        $
      
      So systemd uses it, good fedora kernel config has it:
      
        $ grep CONFIG_CHECKPOINT_RESTORE /boot/config-4.13.4-200.fc26.x86_64
        CONFIG_CHECKPOINT_RESTORE=y
        [acme@jouet linux]$
      
      /me goes to rebuild a kernel...
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-gz5fca968viw8m7hryjqvrln@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1de3038d
    • A
      perf trace beauty: Implement pid_fd beautifier · 0a2f7540
      Arnaldo Carvalho de Melo 提交于
      One that given a pid and a fd, will try to get the path for that fd.
      Will be used in the upcoming kcmp's KCMP_FILE beautifier.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-7ketygp2dvs9h13wuakfncws@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0a2f7540
    • A
      tools include uapi: Grab a copy of linux/kcmp.h · 735e215e
      Arnaldo Carvalho de Melo 提交于
      We will use it to generate tables for beautifying kcmp's 'type' arg.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-r35zr79invmpinfe1zu57cas@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      735e215e
    • N
      perf callchain: Fix double mapping al->addr for children without self period · d6332a17
      Namhyung Kim 提交于
      Milian Wolff found a problem he described in [1] and that for him would
      get fixed:
      
      "Note how most of the large offset values are now gone. Most notably, we
      get proper srcline resolution for the random.h and complex headers."
      
      Then Namhyung found the root cause:
      
      "I looked into it and found a bug handling cumulative (children)
      entries.  For children entries that have no self period, the al->addr (so
      he->ip) ends up having an doubly-mapped address.
      
      It seems to be there from the beginning but only affects entries that
      have no srclines - finding srcline itself is done using a different
      address but it will show the invalid address if no srcline was found.  I
      think we should fix the commit c7405d85 ("perf tools: Update cpumode
      for each cumulative entry")."
      
      [1] https://lkml.kernel.org/r/20171018185350.14893-7-milian.wolff@kdab.comReported-by: NMilian Wolff <milian.wolff@kdab.com>
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Tested-by: NMilian Wolff <milian.wolff@kdab.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: kernel-team@lge.com
      Fixes: c7405d85 ("perf tools: Update cpumode for each cumulative entry")
      Link: https://lkml.kernel.org/r/20171020051533.GA2746@sejongSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d6332a17
  5. 31 10月, 2017 6 次提交
  6. 27 10月, 2017 8 次提交
  7. 26 10月, 2017 1 次提交
    • R
      perf symbols: Fix memory corruption because of zero length symbols · 331c7cb3
      Ravi Bangoria 提交于
      Perf top is often crashing at very random locations on powerpc.  After
      investigating, I found the crash only happens when sample is of zero
      length symbol. Powerpc kernel has many such symbols which does not
      contain length details in vmlinux binary and thus start and end
      addresses of such symbols are same.
      
      Structure
      
        struct sym_hist {
              u64                   nr_samples;
              u64                   period;
              struct sym_hist_entry addr[0];
        };
      
      has last member 'addr[]' of size zero. 'addr[]' is an array of addresses
      that belongs to one symbol (function). If function consist of 100
      instructions, 'addr' points to an array of 100 'struct sym_hist_entry'
      elements. For zero length symbol, it points to the *empty* array, i.e.
      no members in the array and thus offset 0 is also invalid for such
      array.
      
        static int __symbol__inc_addr_samples(...)
        {
              ...
              offset = addr - sym->start;
              h = annotation__histogram(notes, evidx);
              h->nr_samples++;
              h->addr[offset].nr_samples++;
              h->period += sample->period;
              h->addr[offset].period += sample->period;
              ...
        }
      
      Here, when 'addr' is same as 'sym->start', 'offset' becomes 0, which is
      valid for normal symbols but *invalid* for zero length symbols and thus
      updating h->addr[offset] causes memory corruption.
      
      Fix this by adding one dummy element for zero length symbols.
      
      Link: https://lkml.org/lkml/2016/10/10/148
      Fixes: edee44be ("perf annotate: Don't throw error for zero length symbols")
      Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Link: http://lkml.kernel.org/r/1508854806-10542-1-git-send-email-ravi.bangoria@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      331c7cb3
  8. 25 10月, 2017 5 次提交
    • M
      perf util: Enable handling of inlined frames by default · d8a88dd2
      Milian Wolff 提交于
      Now that we have caches in place to speed up the process of finding
      inlined frames and srcline information repeatedly, we can enable this
      useful option by default.
      Suggested-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171019113836.5548-6-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d8a88dd2
    • M
      perf report: Use srcline from callchain for hist entries · 1fb7d06a
      Milian Wolff 提交于
      This also removes the symbol name from the srcline column, more on this
      below.
      
      This ensures we use the correct srcline, which could originate from a
      potentially inlined function. The hist entries used to query for the
      srcline based purely on the IP, which leads to wrong results for inlined
      entries.
      
      Before:
      
      ~~~~~
        perf report --inline -s srcline -g none --stdio
        ...
        # Children      Self  Source:Line
        # ........  ........  ..................................................................................................................................
        #
            94.23%     0.00%  __libc_start_main+18446603487898210537
            94.23%     0.00%  _start+41
            44.58%     0.00%  main+100
            44.58%     0.00%  std::_Norm_helper<true>::_S_do_it<double>+100
            44.58%     0.00%  std::__complex_abs+100
            44.58%     0.00%  std::abs<double>+100
            44.58%     0.00%  std::norm<double>+100
            36.01%     0.00%  hypot+18446603487892193300
            25.81%     0.00%  main+41
            25.81%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
            25.81%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
            25.75%    25.75%  random.h:143
            18.39%     0.00%  main+57
            18.39%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
            18.39%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
            13.80%    13.80%  random.tcc:3330
             5.64%     0.00%  ??:0
             4.13%     4.13%  __hypot_finite+163
             4.13%     0.00%  __hypot_finite+18446603487892193443
      ...
      ~~~~~
      
      After:
      
      ~~~~~
        perf report --inline -s srcline -g none --stdio
        ...
        # Children      Self  Source:Line
        # ........  ........  ...........................................
        #
            94.30%     1.19%  main.cpp:39
            94.23%     0.00%  __libc_start_main+18446603487898210537
            94.23%     0.00%  _start+41
            48.44%     1.70%  random.h:1823
            48.44%     0.00%  random.h:1814
            46.74%     2.53%  random.h:185
            44.68%     0.10%  complex:589
            44.68%     0.00%  complex:597
            44.68%     0.00%  complex:654
            44.68%     0.00%  complex:664
            40.61%    13.80%  random.tcc:3330
            36.01%     0.00%  hypot+18446603487892193300
            26.81%     0.00%  random.h:151
            26.81%     0.00%  random.h:332
            25.75%    25.75%  random.h:143
             5.64%     0.00%  ??:0
             4.13%     4.13%  __hypot_finite+163
             4.13%     0.00%  __hypot_finite+18446603487892193443
      ...
      ~~~~~
      
      Note that this change removes the symbol from the source:line hist
      column. If this information is desired, users should explicitly query
      for it if needed. I.e. run this command instead:
      
      ~~~~~
        perf report --inline -s sym,srcline -g none --stdio
        ...
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'cycles:uppp'
        # Event count (approx.): 1381229476
        #
        # Children      Self  Symbol                                                                                                                               Source:Line
        # ........  ........  ...................................................................................................................................  ...........................................
        #
            94.30%     1.19%  [.] main                                                                                                                             main.cpp:39
            94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
            94.23%     0.00%  [.] _start                                                                                                                           _start+41
            48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1814
            48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1823
            46.74%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  random.h:185
            44.68%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              complex:654
            44.68%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     complex:589
            44.68%     0.00%  [.] std::abs<double> (inlined)                                                                                                       complex:597
            44.68%     0.00%  [.] std::norm<double> (inlined)                                                                                                      complex:664
            39.80%    13.59%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
            36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
            26.81%     0.00%  [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)                                                        random.h:151
            26.81%     0.00%  [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)                                 random.h:332
            25.75%     0.00%  [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)                                     random.h:143
            25.19%    25.19%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
             4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
             4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
      ...
      ~~~~~
      
      Compared to the old behavior, this reduces duplication in the output.
      Before we used to print the symbol name in the srcline column even
      when the sym column was explicitly requested. I.e. the output was:
      
      ~~~~~
        perf report --inline -s sym,srcline -g none --stdio
        ...
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'cycles:uppp'
        # Event count (approx.): 1381229476
        #
        # Children      Self  Symbol                                                                                                                               Source:Line
        # ........  ........  ...................................................................................................................................  ..................................................................................................................................
        #
            94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
            94.23%     0.00%  [.] _start                                                                                                                           _start+41
            44.58%     0.00%  [.] main                                                                                                                             main+100
            44.58%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              std::_Norm_helper<true>::_S_do_it<double>+100
            44.58%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     std::__complex_abs+100
            44.58%     0.00%  [.] std::abs<double> (inlined)                                                                                                       std::abs<double>+100
            44.58%     0.00%  [.] std::norm<double> (inlined)                                                                                                      std::norm<double>+100
            36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
            25.81%     0.00%  [.] main                                                                                                                             main+41
            25.81%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
            25.81%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
            25.69%    25.69%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
            18.39%     0.00%  [.] main                                                                                                                             main+57
            18.39%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
            18.39%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
            13.80%    13.80%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
             4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
             4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
      ...
      ~~~~~
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171019113836.5548-5-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1fb7d06a
    • M
      perf report: Cache srclines for callchain nodes · 21ac9d54
      Milian Wolff 提交于
      On one hand this ensures that the memory is properly freed when the DSO
      gets freed. On the other hand this significantly speeds up the
      processing of the callchain nodes when lots of srclines are requested.
      For one of my data files e.g.:
      
      Before:
      
       Performance counter stats for 'perf report -s srcline -g srcline --stdio':
      
            52496.495043      task-clock (msec)         #    0.999 CPUs utilized
                     634      context-switches          #    0.012 K/sec
                       2      cpu-migrations            #    0.000 K/sec
                 191,561      page-faults               #    0.004 M/sec
         165,074,498,235      cycles                    #    3.144 GHz
         334,170,832,408      instructions              #    2.02  insn per cycle
          90,220,029,745      branches                  # 1718.591 M/sec
             654,525,177      branch-misses             #    0.73% of all branches
      
            52.533273822 seconds time elapsedProcessed 236605 events and lost 40 chunks!
      
      After:
      
       Performance counter stats for 'perf report -s srcline -g srcline --stdio':
      
            22606.323706      task-clock (msec)         #    1.000 CPUs utilized
                      31      context-switches          #    0.001 K/sec
                       0      cpu-migrations            #    0.000 K/sec
                 185,471      page-faults               #    0.008 M/sec
          71,188,113,681      cycles                    #    3.149 GHz
         133,204,943,083      instructions              #    1.87  insn per cycle
          34,886,384,979      branches                  # 1543.214 M/sec
             278,214,495      branch-misses             #    0.80% of all branches
      
            22.609857253 seconds time elapsed
      
      Note that the difference is only this large when `--inline` is not
      passed. In such situations, we would use the inliner cache and thus do
      not run this code path that often.
      
      I think that this cache should actually be used in other places, too.
      When looking at the valgrind leak report for perf report, we see tons of
      srclines being leaked, most notably from calls to
      hist_entry__get_srcline. The problem is that get_srcline has many
      different formatting options (show_sym, show_addr, potentially even
      unwind_inlines when calling __get_srcline directly). As such, the
      srcline cannot easily be cached for all calls, or we'd have to add
      caches for all formatting combinations (6 so far). An alternative would
      be to remove the formatting options and handle that on a different level
      - i.e. print the sym/addr on demand wherever we actually output
      something. And the unwind_inlines could be moved into a separate
      function that does not return the srcline.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171019113836.5548-4-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      21ac9d54
    • M
      perf report: Cache failed lookups of inlined frames · b38775cf
      Milian Wolff 提交于
      When no inlined frames could be found for a given address, we did not
      store this information anywhere. That means we potentially do the costly
      inliner lookup repeatedly for cases where we know it can never succeed.
      
      This patch makes dso__parse_addr_inlines always return a valid
      inline_node. It will be empty when no inliners are found. This enables
      us to cache the empty list in the DSO, thereby improving the performance
      when many addresses fail to find the inliners.
      
      For my trivial example, the performance impact is already quite
      significant:
      
      Before:
      
      ~~~~~
       Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):
      
              594.804032      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.07% )
                      53      context-switches          #    0.089 K/sec                    ( +-  4.09% )
                       0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
                   5,687      page-faults               #    0.010 M/sec                    ( +-  0.02% )
           2,300,918,213      cycles                    #    3.868 GHz                      ( +-  0.09% )
           4,395,839,080      instructions              #    1.91  insn per cycle           ( +-  0.00% )
             939,177,205      branches                  # 1578.969 M/sec                    ( +-  0.00% )
              11,824,633      branch-misses             #    1.26% of all branches          ( +-  0.10% )
      
             0.596246531 seconds time elapsed                                          ( +-  0.07% )
      ~~~~~
      
      After:
      
      ~~~~~
       Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):
      
              113.111405      task-clock (msec)         #    0.990 CPUs utilized            ( +-  0.89% )
                      29      context-switches          #    0.255 K/sec                    ( +- 54.25% )
                       0      cpu-migrations            #    0.000 K/sec
                   5,380      page-faults               #    0.048 M/sec                    ( +-  0.01% )
             432,378,779      cycles                    #    3.823 GHz                      ( +-  0.75% )
             670,057,633      instructions              #    1.55  insn per cycle           ( +-  0.01% )
             141,001,247      branches                  # 1246.570 M/sec                    ( +-  0.01% )
               2,346,845      branch-misses             #    1.66% of all branches          ( +-  0.19% )
      
             0.114222393 seconds time elapsed                                          ( +-  1.19% )
      ~~~~~
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171019113836.5548-3-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b38775cf
    • M
      perf report: Properly handle branch count in match_chain() · bf36eb5c
      Milian Wolff 提交于
      Some of the code paths I introduced before returned too early without
      running the code to handle a node's branch count.  By refactoring
      match_chain to only have one exit point, this can be remedied.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1707691.qaJ269GSZW@agathebauer
      Link: http://lkml.kernel.org/r/20171018185350.14893-2-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      bf36eb5c
  9. 24 10月, 2017 7 次提交
    • M
      perf report: Compare symbol name for inlined frames when sorting · aa441895
      Milian Wolff 提交于
      Similar to the callstack frame matching, we also have to compare the
      symbol name when sorting hist entries. The reason is twofold: On one
      hand, multiple inlined functions will use the same symbol start/end
      values of the parent, non-inlined symbol.
      
      As such, all of these symbols often end up missing from top-level
      report, as they get merged with the non-inlined frame. On the other
      hand, multiple different functions may end up inlining the same
      function, and we need to aggregate these values properly.
      
      Before:
      
      ~~~~~
        perf report --stdio --inline -g none
        # Children     Self  Command       Shared Object Symbol
        # ........ ........  ............  ............. ...................................
        #
           100.00%   39.69%  cpp-inlining  cpp-inlining  [.] main
           100.00%    0.00%  cpp-inlining  cpp-inlining  [.] _start
           100.00%    0.00%  cpp-inlining  libc-2.25.so  [.] __libc_start_main
            97.03%    0.00%  cpp-inlining  cpp-inlining  [.] std::norm<double> (inlined)
            59.53%    4.26%  cpp-inlining  libm-2.25.so  [.] hypot
            55.21%   55.08%  cpp-inlining  libm-2.25.so  [.] __hypot_finite
             0.52%    0.52%  cpp-inlining  libm-2.25.so  [.] cabs
      ~~~~~
      
      After:
      
      ~~~~~
        perf report --stdio --inline -g none
        # Children     Self  Command       Shared Object Symbol
        # ........ ........  ............  ............. ...................................................................................................................................
        #
           100.00%   39.69%  cpp-inlining  cpp-inlining  [.] main
           100.00%    0.00%  cpp-inlining  cpp-inlining  [.] _start
           100.00%    0.00%  cpp-inlining  libc-2.25.so  [.] __libc_start_main
            62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)
            62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::__complex_abs (inlined)
            62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::abs<double> (inlined)
            62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::norm<double> (inlined)
            59.53%    4.26%  cpp-inlining  libm-2.25.so  [.] hypot
            55.21%   55.08%  cpp-inlining  libm-2.25.so  [.] __hypot_finite
            34.46%    0.00%  cpp-inlining  cpp-inlining  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
            32.39%    0.00%  cpp-inlining  cpp-inlining  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
            32.39%    0.00%  cpp-inlining  cpp-inlining  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
            12.29%    0.00%  cpp-inlining  cpp-inlining  [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
            12.29%    0.00%  cpp-inlining  cpp-inlining  [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
            12.29%    0.00%  cpp-inlining  cpp-inlining  [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
             0.52%    0.52%  cpp-inlining  libm-2.25.so  [.] cabs
      ~~~~~
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-11-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      aa441895
    • M
      perf callchain: Compare symbol name for inlined frames when matching · 9856240a
      Milian Wolff 提交于
      The fake symbols we create for inlined frames will represent different
      functions but can use the symbol start address. This leads to issues
      when different inline branches all lead to the same function.
      
      Before:
      ~~~~~
      $ perf report -s sym -i perf.inlining.data --inline --stdio -g function
      ...
                   --38.86%--_start
                             __libc_start_main
                             main
                             |
                              --37.57%--std::norm<double> (inlined)
                                        std::_Norm_helper<true>::_S_do_it<double> (inlined)
                                        |
                                         --36.36%--std::abs<double> (inlined)
                                                   std::__complex_abs (inlined)
                                                   |
                                                    --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
                                                              std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
                                                              std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
      ~~~~~
      
      Note that this backtrace representation is completely bogus.
      Complex abs does not call the linear congruential engine! It
      is just a side-effect of a longer inlined stack being appended
      to a shorter, different inlined stack, both of which originate
      in the same function (main).
      
      This patch fixes the issue:
      
      ~~~~~
      $ perf report -s sym -i perf.inlining.data --inline --stdio -g function
      ...
                   --38.86%--_start
                             __libc_start_main
                             main
                             |
                             |--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                             |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                             |          |
                             |           --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
                             |                     std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                             |                     |
                             |                      --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
                             |                                std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
                             |                                std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
                             |
                              --1.99%--std::norm<double> (inlined)
                                        std::_Norm_helper<true>::_S_do_it<double> (inlined)
                                        std::abs<double> (inlined)
                                        std::__complex_abs (inlined)
      ~~~~~
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-10-milian.wolff@kdab.com
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      [ Fix up conflict with c1fbc0cf ("perf callchain: Compare dsos (as well) for CCKEY_FUNCTION"), remove unneeded hunk ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9856240a
    • M
      perf script: Mark inlined frames and do not print DSO for them · 9628b56d
      Milian Wolff 提交于
      Instead of showing the (repeated) DSO name of the non-inlined frame, we
      now show the "(inlined)" suffix instead.
      
      Before:
                         214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                          ace3 hypot (/usr/lib/libm-2.25.so)
                           a4a std::__complex_abs (/home/milian/projects/src/perf-tests/inlining)
                           a4a std::abs<double> (/home/milian/projects/src/perf-tests/inlining)
                           a4a std::_Norm_helper<true>::_S_do_it<double> (/home/milian/projects/src/perf-tests/inlining)
                           a4a std::norm<double> (/home/milian/projects/src/perf-tests/inlining)
                           a4a main (/home/milian/projects/src/perf-tests/inlining)
                         20510 __libc_start_main (/usr/lib/libc-2.25.so)
                           bd9 _start (/home/milian/projects/src/perf-tests/inlining)
      
      After:
                         214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                          ace3 hypot (/usr/lib/libm-2.25.so)
                           a4a std::__complex_abs (inlined)
                           a4a std::abs<double> (inlined)
                           a4a std::_Norm_helper<true>::_S_do_it<double> (inlined)
                           a4a std::norm<double> (inlined)
                           a4a main (/home/milian/projects/src/perf-tests/inlining)
                         20510 __libc_start_main (/usr/lib/libc-2.25.so)
                           bd9 _start (/home/milian/projects/src/perf-tests/inlining)
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-9-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9628b56d
    • M
      perf callchain: Mark inlined frames in output by " (inlined)" suffix · 8932f807
      Milian Wolff 提交于
      The original patch that introduced inline frame output in the various
      browsers used this suffix already. The new centralized approach that
      uses fake symbols for inlined frames was missing this approach so far.
      
      Instead of changing the symbol name itself, we only print the suffix
      where needed. This allows us to efficiently lookup the symbol for a
      given name without first having to append the suffix before the lookup.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-8-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8932f807
    • M
      perf report: Fall-back to function name comparison for -g srcline · cbe50f61
      Milian Wolff 提交于
      When a callchain entry has no srcline available, we ended up comparing
      the instruction pointer. I consider this to be not too useful. Rather, I
      think we should group the entries by function name, which this patch
      adds. For people who want to split the data on the IP boundary, using
      `-g address` is the correct choice.
      
      Before:
      
      ~~~~~
         100.00%    38.86%  [.] main
                  |
                  |--61.14%--main inlining.cpp:14
                  |          std::norm<double> complex:664
                  |          std::_Norm_helper<true>::_S_do_it<double> complex:654
                  |          std::abs<double> complex:597
                  |          std::__complex_abs complex:589
                  |          |
                  |          |--56.03%--hypot
                  |          |          |
                  |          |          |--8.45%--__hypot_finite
                  |          |          |
                  |          |          |--7.62%--__hypot_finite
                  |          |          |
                  |          |          |--2.29%--__hypot_finite
                  |          |          |
                  |          |          |--2.24%--__hypot_finite
                  |          |          |
                  |          |          |--2.06%--__hypot_finite
                  |          |          |
                  |          |          |--1.81%--__hypot_finite
      ...
      ~~~~~
      
      After:
      
      ~~~~~
         100.00%    38.86%  [.] main
                  |
                  |--61.14%--main inlining.cpp:14
                  |          std::norm<double> complex:664
                  |          std::_Norm_helper<true>::_S_do_it<double> complex:654
                  |          std::abs<double> complex:597
                  |          std::__complex_abs complex:589
                  |          |
                  |          |--60.29%--hypot
                  |          |          |
                  |          |           --56.03%--__hypot_finite
                  |          |
                  |           --0.85%--cabs
      ~~~~~
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-7-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      cbe50f61
    • M
      perf callchain: Create real callchain entries for inlined frames · 11ea2515
      Milian Wolff 提交于
      The inline_node structs are maintained by the new dso->inlines tree.
      This in turn keeps ownership of the fake symbols and srcline string
      representing an inline frame.
      
      This tree is sorted by address to allow quick lookups. All other entries
      of the symbol beside the function name are unused for inline frames. The
      advantage of this approach is that all existing users of the callchain
      API can now transparently display inlined frames without having to patch
      their code.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-6-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      11ea2515
    • M
      perf callchain: Refactor inline_list to store srcline string directly · 2be8832f
      Milian Wolff 提交于
      This is a preparation for the creation of real callchain entries for
      inlined frames. The rest of the perf code uses the srcline string. As
      such, using that also for the srcline API allows us to simplify some of
      the upcoming code. Most notably, it will allow us to cache the srcline
      for a given inline node and reuse it for different callchain entries.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-5-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2be8832f