1. 16 Apr, 2020 (2 commits)
    • perf machine: Set ksymbol dso as loaded on arrival · 7eddf7e7
      Jiri Olsa committed
      There's no special load action for ksymbol data in the map__load/dso__load
      path, where the kernel gets loaded. It only gets confused with the
      kernel kallsyms/vmlinux load for the bpf object, which fails and could
      mess up the map.
      
      Disable any further load of the map for ksymbol related dso/map.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Björn Töpel <bjorn.topel@intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@redhat.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jesper Dangaard Brouer <hawk@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200312195610.346362-15-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Synthesize bpf_trampoline/dispatcher ksymbol event · 943930e4
      Jiri Olsa committed
      Synthesize bpf images (trampolines/dispatchers) on start, as ksymbol
      events from /proc/kallsyms. With this, perf can recognize samples from
      those images, and perf report and perf top show them correctly.
      
      The rest of the ksymbol handling is already in place from the bpf
      programs monitoring, so only the initial state was needed.
      
      perf report output:
      
        # Overhead  Command     Shared Object                  Symbol
      
          12.37%  test_progs  [kernel.vmlinux]                 [k] entry_SYSCALL_64
          11.80%  test_progs  [kernel.vmlinux]                 [k] syscall_return_via_sysret
           9.63%  test_progs  bpf_prog_bcf7977d3b93787c_prog2  [k] bpf_prog_bcf7977d3b93787c_prog2
           6.90%  test_progs  bpf_trampoline_24456             [k] bpf_trampoline_24456
           6.36%  test_progs  [kernel.vmlinux]                 [k] memcpy_erms
      
      Committer notes:
      
      Use scnprintf() instead of strncpy() to overcome this on fedora:32,
      rawhide and OpenMandriva Cooker:
      
          CC       /tmp/build/perf/util/bpf-event.o
        In file included from /usr/include/string.h:495,
                         from /git/linux/tools/lib/bpf/libbpf_common.h:12,
                         from /git/linux/tools/lib/bpf/bpf.h:31,
                         from util/bpf-event.c:4:
        In function 'strncpy',
            inlined from 'process_bpf_image' at util/bpf-event.c:323:2,
            inlined from 'kallsyms_process_symbol' at util/bpf-event.c:358:9:
        /usr/include/bits/string_fortified.h:106:10: error: '__builtin_strncpy' specified bound 256 equals destination size [-Werror=stringop-truncation]
          106 |   return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
              |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        cc1: all warnings being treated as errors
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Björn Töpel <bjorn.topel@intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@redhat.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jesper Dangaard Brouer <hawk@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200312195610.346362-14-jolsa@kernel.org/
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
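The synthesis step described in 943930e4 can be pictured with a small sketch. This is not perf's implementation: the helper name and the sample addresses are made up for illustration, and it assumes that BPF images show up in /proc/kallsyms with a `[bpf]` module tag and a `bpf_trampoline_` or `bpf_dispatcher_` name prefix.

```python
# Hypothetical helper, not part of perf: pick BPF image symbols out of
# /proc/kallsyms-style text ("<hex addr> <type> <name> [module]").
BPF_IMAGE_PREFIXES = ("bpf_trampoline_", "bpf_dispatcher_")  # assumed prefixes


def find_bpf_images(kallsyms_lines):
    """Return (address, name) pairs for trampoline/dispatcher symbols."""
    images = []
    for line in kallsyms_lines:
        fields = line.split()
        # Kernel-proper symbols have 3 fields; module/BPF ones carry a 4th.
        if len(fields) != 4 or fields[3] != "[bpf]":
            continue
        addr, _sym_type, name, _module = fields
        if name.startswith(BPF_IMAGE_PREFIXES):
            images.append((int(addr, 16), name))
    return images


# Made-up sample input, shaped like /proc/kallsyms output.
sample = [
    "ffffffff81000000 T _text",
    "ffffffffa0000000 t bpf_prog_bcf7977d3b93787c_prog2\t[bpf]",
    "ffffffffa0002000 t bpf_trampoline_24456\t[bpf]",
]
images = find_bpf_images(sample)
```

Each pair found this way would correspond to one synthesized ksymbol event; BPF programs themselves are skipped here because, per the commit message, their handling was already in place.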
  2. 14 Apr, 2020 (2 commits)
    • perf stat: Fix no metric header if --per-socket and --metric-only set · 8358f698
      Jin Yao committed
      We received a report that no metric header was displayed if --per-socket
      and --metric-only were both set.
      
      This makes it hard for scripts to parse the perf-stat output. This patch
      fixes the issue.
      
      Before:
      
        root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket
        ^C
         Performance counter stats for 'system wide':
      
        S0        8                  2.6
      
               2.215270071 seconds time elapsed
      
        root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket -I1000
        #           time socket cpus
             1.000411692 S0        8                  2.2
             2.001547952 S0        8                  3.4
             3.002446511 S0        8                  3.4
             4.003346157 S0        8                  4.0
             5.004245736 S0        8                  0.3
      
      After:
      
        root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket
        ^C
         Performance counter stats for 'system wide':
      
                                     CPI
        S0        8                  2.1
      
               1.813579830 seconds time elapsed
      
        root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket -I1000
        #           time socket cpus                  CPI
             1.000415122 S0        8                  3.2
             2.001630051 S0        8                  2.9
             3.002612278 S0        8                  4.3
             4.003523594 S0        8                  3.0
             5.004504256 S0        8                  3.7
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200331180226.25915-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
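To see why the missing header in 8358f698 matters to scripts, here is a minimal sketch of a parser for the interval output shown above. The function name is hypothetical and the parsing rules (a `#` header line followed by whitespace-separated rows) are an assumption drawn only from the sample output in the commit message.

```python
def parse_interval_output(text):
    """Turn 'perf stat -I' metric-only output into a list of dicts."""
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The header names every column, including the metric.
            columns = line.lstrip("#").split()
            continue
        values = line.split()
        if columns is None or len(values) != len(columns):
            raise ValueError("cannot label columns for: " + line)
        rows.append(dict(zip(columns, values)))
    return rows


after = """\
#           time socket cpus                  CPI
     1.000415122 S0        8                  3.2
     2.001630051 S0        8                  2.9
"""
rows = parse_interval_output(after)
```

With the "Before" header (`time socket cpus` only), the same parser raises `ValueError`, because the metric value has no column name to attach to.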
    • perf python: Check if clang supports -fno-semantic-interposition · 9a00df31
      Arnaldo Carvalho de Melo committed
      The set of C compiler options used by distros to build python bindings
      may include options that are unknown to clang. We already check for a
      variety of such options; add -fno-semantic-interposition to that mix.
      
      This fixes the build on, among others, Manjaro Linux:
      
          GEN      /tmp/build/perf/python/perf.so
        clang-9: error: unknown argument: '-fno-semantic-interposition'
        error: command 'clang' failed with exit status 1
        make: Leaving directory '/git/perf/tools/perf'
      
        [perfbuilder@602aed1c266d ~]$ gcc -v
        Using built-in specs.
        COLLECT_GCC=gcc
        COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/lto-wrapper
        Target: x86_64-pc-linux-gnu
        Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-pkgversion='Arch Linux 9.3.0-1' --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-shared --enable-threads=posix --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release --enable-default-pie --enable-default-ssp --enable-cet=auto gdc_include_dir=/usr/include/dlang/gdc
        Thread model: posix
        gcc version 9.3.0 (Arch Linux 9.3.0-1)
        [perfbuilder@602aed1c266d ~]$
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
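The kind of check described in 9a00df31 could look roughly like the sketch below. It is not the code from perf's setup.py: both function names are hypothetical, and the probe assumes that a compiler exits with a non-zero status when it is handed an option it does not know.

```python
import subprocess


def compiler_accepts(cc, option):
    """Probe by syntax-checking an empty translation unit read from stdin.

    Assumption: an unsupported option makes the compiler exit non-zero
    (clang reports "unknown argument", as in the build log above).
    """
    result = subprocess.run(
        [cc, option, "-x", "c", "-fsyntax-only", "-"],
        input=b"",
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def filter_options(cc, options, accepts=compiler_accepts):
    """Keep only the options that the given compiler accepts."""
    return [opt for opt in options if accepts(cc, opt)]
```

A build script would call something like `filter_options("clang", distro_cflags)` before handing the flags to the extension build; the `accepts` parameter only exists so the filtering can be exercised without a compiler installed.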
  3. 03 Apr, 2020 (10 commits)
    • perf python: Fix clang detection to strip out options passed in $CC · 9ff76cea
      Arnaldo Carvalho de Melo committed
      The clang check in the python setup.py file expected $CC to be just the
      name of the compiler, not the compiler + options, i.e. all options were
      expected to be passed in $CFLAGS. This makes it fail on systems where
      CC is set to, e.g.:
      
       "aarch64-linaro-linux-gcc --sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot"
      
      Like this:
      
        $ python3
        >>> from subprocess import Popen
        >>> a = Popen(["aarch64-linux-gnu-gcc --sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot", "-v"])
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
          File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
            restore_signals, start_new_session)
          File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
            raise child_exception_type(errno_num, err_msg, err_filename)
        FileNotFoundError: [Errno 2] No such file or directory: 'aarch64-linux-gnu-gcc --sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot': 'aarch64-linux-gnu-gcc --sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot'
        >>>
      
      Make it more robust, covering this case, by passing cc.split()[0] as the
      first arg to Popen().
      
      Fixes: a7ffd416 ("perf python: Fix clang detection when using CC=clang-version")
      Reported-by: Daniel Díaz <daniel.diaz@linaro.org>
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Tested-by: Daniel Díaz <daniel.diaz@linaro.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ilie Halip <ilie.halip@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/20200401124037.GA12534@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
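The fix in 9ff76cea boils down to the expression `cc.split()[0]`. A minimal sketch of its effect follows; the wrapper function name is hypothetical, and the `$CC` value is the one quoted in the commit message.

```python
def compiler_executable(cc):
    """Return only the program name from a $CC value that may carry options."""
    return cc.split()[0]


cc = ("aarch64-linaro-linux-gcc "
      "--sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot")

# Argument vector that can be handed to subprocess.Popen() without the
# FileNotFoundError shown above, since argv[0] is now a real program name.
argv = [compiler_executable(cc), "-v"]
```

Splitting on whitespace is enough for the case reported here; a compiler path that itself contains spaces would need something like `shlex.split()` instead, which the commit does not attempt.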
    • perf script report: Fix SEGFAULT when using DWARF mode · 1a4025f0
      Andreas Gerstmayr committed
      When running perf script report with a Python script and a callgraph in
      DWARF mode, intr_regs->regs can be 0, which crashes the regs_map
      function.
      
      Add a check for this condition (the same check as in builtin-script.c:595).
      Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
      Tested-by: Kim Phillips <kim.phillips@amd.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Link: http://lore.kernel.org/lkml/20200402125417.422232-1-agerstmayr@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf events parser: Add missing Intel CPU events to parser · 47327f56
      Adrian Hunter committed
      perf list expects CPU events to be parseable by name, e.g.
      
          # perf list | grep el-capacity-read
            el-capacity-read OR cpu/el-capacity-read/          [Kernel PMU event]
      
      But the event parser does not recognize them that way, e.g.
      
          # perf test -v "Parse event"
          <SNIP>
          running test 54 'cycles//u'
          running test 55 'cycles:k'
          running test 0 'cpu/config=10,config1,config2=3,period=1000/u'
          running test 1 'cpu/config=1,name=krava/u,cpu/config=2/u'
          running test 2 'cpu/config=1,call-graph=fp,time,period=100000/,cpu/config=2,call-graph=no,time=0,period=2000/'
          running test 3 'cpu/name='COMPLEX_CYCLES_NAME:orig=cycles,desc=chip-clock-ticks',period=0x1,event=0x2/ukp'
          -> cpu/event=0,umask=0x11/
          -> cpu/event=0,umask=0x13/
          -> cpu/event=0x54,umask=0x1/
          failed to parse event 'el-capacity-read:u,cpu/event=el-capacity-read/u', err 1, str 'parser error'
          event syntax error: 'el-capacity-read:u,cpu/event=el-capacity-read/u'
                                 \___ parser error test child finished with 1
          ---- end ----
          Parse event definition strings: FAILED!
      
      This happens because the parser splits names by '-' in order to deal
      with cache events. For example 'L1-dcache' is a token in
      parse-events.l which is matched to 'L1-dcache-load-miss' by the
      following rule:
      
          PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT '-' PE_NAME_CACHE_OP_RESULT opt_event_config
      
      And so there is special handling for 2-part PMU names i.e.
      
          PE_PMU_EVENT_PRE '-' PE_PMU_EVENT_SUF sep_dc
      
      but no handling for 3-part names, which are instead added as tokens e.g.
      
          topdown-[a-z-]+
      
      While it would be possible to add a rule for 3-part names, that would
      not work if the first parts were also a valid PMU name e.g.
      'el-capacity-read' would be matched to 'el-capacity' before the parser
      reached the 3rd part.
      
      The parser would need significant change to rationalize all this, so
      instead fix for now by adding missing Intel CPU events with 3-part names
      to the event parser as tokens.
      
      Missing events were found by using:
      
          grep -r EVENT_ATTR_STR arch/x86/events/intel/core.c
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Link: http://lore.kernel.org/lkml/90c7ae07-c568-b6d3-f9c4-d0c1528a0610@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
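The tokenizing problem behind 47327f56 can be illustrated with a toy model. This is far simpler than the real flex/bison grammar in parse-events.l and parse-events.y: the function is hypothetical, and it only mimics the idea that a name listed as a whole token is never split on '-'.

```python
import re


def lex_event_name(name, whole_name_tokens):
    """Toy lexer: whole-name tokens win, anything else is split on '-'."""
    for pattern in whole_name_tokens:
        if re.fullmatch(pattern, name):
            return [name]
    return name.split("-")


# Only the pre-existing whole-name token from the commit message.
without_fix = lex_event_name("el-capacity-read", [r"topdown-[a-z-]+"])

# The 3-part event name added as a token of its own.
with_fix = lex_event_name(
    "el-capacity-read", [r"topdown-[a-z-]+", r"el-capacity-read"]
)
```

In the first call the name falls apart into three pieces, which the grammar has no rule for; in the second it survives as a single token, which is the approach the commit takes for the missing Intel CPU events.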
    • perf script: Allow --symbol to accept hexadecimal addresses · d2bedb78
      Stephane Eranian committed
      This patch extends the perf script --symbols option to filter on
      hexadecimal addresses in addition to symbol names. This makes it easier
      to handle cases where symbols are aliased.
      
      With this patch, it is possible to mix and match symbols and hexadecimal
      addresses using the --symbols option.
      
        $ perf script --symbols=noploop,0x4007a0
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200325220802.15039-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
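A mixed `--symbols` list as in d2bedb78 could be handled along the lines of this sketch. The function names are hypothetical, and two things are assumptions rather than facts from the commit: that an entry is treated as an address when it starts with `0x`, and that the address is compared with the start address of the resolved symbol.

```python
def parse_symbol_filter(spec):
    """Split a comma-separated filter into symbol names and addresses."""
    names, addresses = set(), set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if item.lower().startswith("0x"):  # assumed marker for an address
            addresses.add(int(item, 16))
        else:
            names.add(item)
    return names, addresses


def sample_matches(symbol_name, symbol_start, names, addresses):
    """A sample passes if either its symbol name or its address is listed."""
    return symbol_name in names or symbol_start in addresses


names, addresses = parse_symbol_filter("noploop,0x4007a0")
```

Filtering by address is what helps with aliased symbols: two names that resolve to the same start address can be selected with one hexadecimal entry.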
    • perf record: Add --all-cgroups option · 8fb4b679
      Namhyung Kim committed
      The --all-cgroups option enables cgroup profiling support.  It tells
      the kernel to record CGROUP events in the ring buffer so that perf
      report can identify the task/cgroup association later.
      
        [root@seventh ~]# perf record --all-cgroups --namespaces /wb/cgtest
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.042 MB perf.data (558 samples) ]
        [root@seventh ~]# perf report --stdio -s cgroup_id,cgroup,pid
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 558  of event 'cycles'
        # Event count (approx.): 458017341
        #
        # Overhead  cgroup id (dev/inode)  Cgroup          Pid:Command
        # ........  .....................  ..........  ...............
        #
            33.15%  4/0xeffffffb           /sub           9615:looper0
            32.83%  4/0xf00002f5           /sub/cgrp2     9620:looper2
            32.79%  4/0xf00002f4           /sub/cgrp1     9619:looper1
             0.35%  4/0xf00002f5           /sub/cgrp2     9618:cgtest
             0.34%  4/0xf00002f4           /sub/cgrp1     9617:cgtest
             0.32%  4/0xeffffffb           /              9615:looper0
             0.11%  4/0xeffffffb           /sub           9617:cgtest
             0.10%  4/0xeffffffb           /sub           9618:cgtest
      
        #
        # (Tip: Sample related events with: perf record -e '{cycles,instructions}:S')
        #
        [root@seventh ~]#
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200325124536.2800725-8-namhyung@kernel.org
      Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org
      [ Extracted the HAVE_FILE_HANDLE from the followup patch ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf record: Support synthesizing cgroup events · ab64069f
      Namhyung Kim committed
      Synthesize cgroup events by iterating cgroup filesystem directories.
      The cgroup event only saves the portion of cgroup path after the mount
      point and the cgroup id (which actually is a file handle).
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200325124536.2800725-7-namhyung@kernel.org
      Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org
      [ Extracted the HAVE_FILE_HANDLE from the followup patch, added missing __maybe_unused ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
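The directory walk described in ab64069f can be sketched as follows. The function name is hypothetical, and the inode number (`st_ino`) is only a stand-in for the file handle that the commit message says serves as the cgroup id; the demo runs against a temporary directory rather than a real cgroup mount.

```python
import os
import tempfile


def synthesize_cgroups(mount_point):
    """Yield (cgroup_id, path) for every directory below mount_point.

    Stand-in: st_ino replaces the file handle that perf uses as cgroup id.
    """
    for dirpath, dirnames, _files in os.walk(mount_point):
        dirnames.sort()  # deterministic traversal order
        rel = os.path.relpath(dirpath, mount_point)
        # Only the part of the path after the mount point is recorded.
        path = "/" if rel == "." else "/" + rel
        yield os.stat(dirpath).st_ino, path


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub", "cgrp1"))
    os.makedirs(os.path.join(root, "sub", "cgrp2"))
    paths = [path for _cgroup_id, path in synthesize_cgroups(root)]
```

Each yielded pair would correspond to one synthesized cgroup event, so that cgroups created before recording started can still be resolved at report time.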
    • perf report: Add 'cgroup' sort key · b629f3e9
      Namhyung Kim committed
      The cgroup sort key shows the cgroup membership of each task.
      Currently it shows the full path in the cgroupfs (not relative to the
      root of the cgroup namespace) since that is more intuitive IMHO.
      Otherwise the root cgroups of different namespaces would all show the
      same name, "/".
      
      The cgroup sort key should come before cgroup_id, otherwise
      sort_dimension__add() will match it to cgroup_id, as it only matches
      with the given substring.
      
      For example, it will look like the following.  Note that the record
      patch adding --all-cgroups will come later.
      
        $ perf record -a --namespace --all-cgroups  cgtest
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.208 MB perf.data (4090 samples) ]
      
        $ perf report -s cgroup_id,cgroup,pid
        ...
        # Overhead  cgroup id (dev/inode)  Cgroup          Pid:Command
        # ........  .....................  ..........  ...............
        #
            93.96%  0/0x0                  /                 0:swapper
             1.25%  3/0xeffffffb           /               278:looper0
             0.86%  3/0xf000015f           /sub/cgrp1      280:cgtest
             0.37%  3/0xf0000160           /sub/cgrp2      281:cgtest
             0.34%  3/0xf0000163           /sub/cgrp3      282:cgtest
             0.22%  3/0xeffffffb           /sub            278:looper0
             0.20%  3/0xeffffffb           /               280:cgtest
             0.15%  3/0xf0000163           /sub/cgrp3      285:looper3
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200325124536.2800725-6-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
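The ordering constraint mentioned in b629f3e9 follows from matching only as many characters as the user typed. The sketch below is a simplified model, not the body of sort_dimension__add(): the function name is hypothetical and the match is assumed to be a plain, case-sensitive prefix comparison.

```python
def find_sort_dimension(token, dimensions):
    """Return the first dimension whose name starts with the given token."""
    for name in dimensions:
        if name.startswith(token):
            return name
    return None


# "cgroup" is a prefix of "cgroup_id", so table order decides the winner.
wrong_order = find_sort_dimension("cgroup", ["pid", "cgroup_id", "cgroup"])
right_order = find_sort_dimension("cgroup", ["pid", "cgroup", "cgroup_id"])
```

With `cgroup` listed first, typing `cgroup` selects the new key, while typing the longer `cgroup_id` still reaches the id key because `cgroup` is too short to match it.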
    • perf cgroup: Maintain cgroup hierarchy · d1277aa3
      Namhyung Kim committed
      Each cgroup is kept in the perf_env's cgroup_tree, sorted by the cgroup
      id.  Hist entries that have a cgroup id can compare it directly, and
      later the id can be used to find the cgroup name in this tree.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200325124536.2800725-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
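The id-sorted lookup structure from d1277aa3 can be modeled in a few lines. This sketch uses a sorted list with binary search as a stand-in for the tree kept in perf_env; the class name, method names, and the ids are all made up for illustration.

```python
import bisect


class CgroupTree:
    """Cgroups kept sorted by id; a sorted list stands in for perf's tree."""

    def __init__(self):
        self._ids = []
        self._names = []

    def _position(self, cgroup_id):
        pos = bisect.bisect_left(self._ids, cgroup_id)
        found = pos < len(self._ids) and self._ids[pos] == cgroup_id
        return pos, found

    def findnew(self, cgroup_id, name):
        """Return the stored name for cgroup_id, inserting it if missing."""
        pos, found = self._position(cgroup_id)
        if not found:
            self._ids.insert(pos, cgroup_id)
            self._names.insert(pos, name)
        return self._names[pos]

    def find(self, cgroup_id):
        """Return the name for cgroup_id, or None if it is unknown."""
        pos, found = self._position(cgroup_id)
        return self._names[pos] if found else None


tree = CgroupTree()
tree.findnew(3, "/sub/cgrp2")
tree.findnew(1, "/sub")
tree.findnew(2, "/sub/cgrp1")
```

Hist entries only need to carry the numeric id: comparing two entries is an integer comparison, and the name is looked up with `find()` when the report is printed.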
    • perf tools: Basic support for CGROUP event · ba78c1c5
      Namhyung Kim committed
      Implement basic functionality to support cgroup tracking.  Each cgroup
      can be identified by its inode number, which can be read from userspace
      too.  The actual cgroup processing will come in a later patch.
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      [ fix perf test failure on sampling parsing ]
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200325124536.2800725-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf python: Include rwsem.c in the python binding · 460c3ed9
      Arnaldo Carvalho de Melo committed
      We'll need it for the cgroup patches, and it's better to have it in a
      separate patch in case we need to later revert the cgroup patches.
      
      I.e. without this we have:
      
        [root@five ~]# perf test -v python
        19: 'import perf' in python                               :
        --- start ---
        test child forked, pid 148447
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        ImportError: /tmp/build/perf/python/perf.cpython-37m-x86_64-linux-gnu.so: undefined symbol: down_write
        test child finished with -1
        ---- end ----
        'import perf' in python: FAILED!
        [root@five ~]#
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200403123606.GC23243@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 26 Mar, 2020 (1 commit)
  5. 24 Mar, 2020 (10 commits)
    • perf dso: Fix dso comparison · 0d33b343
      Ravi Bangoria committed
      Perf gets dso details from two different sources: first, from the
      buildid headers in perf.data, and second, from MMAP2 samples. A dso
      from a buildid header does not have dso_id detail, and a dso from an
      MMAP2 sample does not have buildid information. If details of the
      same dso are present in both places, the filename is common.
      
      Previously, __dsos__findnew_link_by_longname_id() used to compare only
      long or short names, but commit 0e3149f8 ("perf dso: Move dso_id
      from 'struct map' to 'struct dso'") also added a dso_id comparison.
      Because of that, perf now creates two different dso objects for the
      same file, one from the buildid header (with buildid but without
      dso_id) and a second from the MMAP2 sample (with dso_id but without
      buildid).
      
      This causes issues with the archive, buildid-list, etc. subcommands.
      Fix this by comparing dso_id only when it's present. And in case a dso
      is present in the 'dsos' list without dso_id, inject the dso_id detail
      as well.
      
      Before:
      
        $ sudo ./perf buildid-list -H
        0000000000000000000000000000000000000000 /usr/bin/ls
        0000000000000000000000000000000000000000 /usr/lib64/ld-2.30.so
        0000000000000000000000000000000000000000 /usr/lib64/libc-2.30.so
      
        $ ./perf archive
        perf archive: no build-ids found
      
      After:
      
        $ ./perf buildid-list -H
        b6b1291d0cead046ed0fa5734037fa87a579adee /usr/bin/ls
        641f0c90cfa15779352f12c0ec3c7a2b2b6f41e8 /usr/lib64/ld-2.30.so
        675ace3ca07a0b863df01f461a7b0984c65c8b37 /usr/lib64/libc-2.30.so
      
        $ ./perf archive
        Now please run:
      
        $ tar xvf perf.data.tar.bz2 -C ~/.debug
      
        wherever you need to run 'perf report' on.
      
      Committer notes:
      
      Renamed is_empty_dso_id() to dso_id__empty() and inject_dso_id() to
      dso__inject_id() to keep namespacing consistent.
      
      Fixes: 0e3149f8 ("perf dso: Move dso_id from 'struct map' to 'struct dso'")
      Reported-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/20200324042424.68366-1-ravi.bangoria@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
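The lookup rule from 0d33b343 ("compare dso_id only when it's present, and inject it when missing") can be sketched as below. The classes and functions are simplified stand-ins for perf's C structures, the field names are illustrative, and the id values in the demo are made up; only the build id string is taken from the "After" output above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DsoId:
    major: int = 0
    minor: int = 0
    ino: int = 0
    ino_generation: int = 0

    def empty(self):
        """An all-zero id means 'no id information available'."""
        return self == DsoId()


@dataclass
class Dso:
    long_name: str
    id: DsoId = DsoId()
    build_id: Optional[str] = None


def same_dso(dso, long_name, dso_id):
    if dso.long_name != long_name:
        return False
    # Compare ids only when both sides actually have one.
    if dso.id.empty() or dso_id.empty():
        return True
    return dso.id == dso_id


def findnew(dsos, long_name, dso_id):
    for dso in dsos:
        if same_dso(dso, long_name, dso_id):
            if dso.id.empty() and not dso_id.empty():
                dso.id = dso_id  # inject the id into the existing entry
            return dso
    dso = Dso(long_name, dso_id)
    dsos.append(dso)
    return dso


dsos = []
from_header = findnew(dsos, "/usr/bin/ls", DsoId())  # buildid header: no id
from_header.build_id = "b6b1291d0cead046ed0fa5734037fa87a579adee"
from_mmap2 = findnew(dsos, "/usr/bin/ls", DsoId(major=8, minor=1, ino=1234))
```

Both lookups now resolve to one object that carries the build id and the dso_id, which is what lets `perf buildid-list` and `perf archive` find the build ids again.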
    • perf cpumap: Fix snprintf overflow check · d74b181a
      Christophe JAILLET committed
      'snprintf' returns the number of characters which would be generated for
      the given input.
      
      If the returned value is *greater than* or equal to the buffer size, it
      means that the output has been truncated.
      
      Fix the overflow test accordingly.
      
      Fixes: 7780c25b ("perf tools: Allow ability to map cpus to nodes easily")
      Fixes: 92a7e127 ("perf cpumap: Add cpu__max_present_cpu()")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Suggested-by: David Laight <David.Laight@ACULAB.COM>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: He Zhe <zhe.he@windriver.com>
      Cc: Jan Stancek <jstancek@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-janitors@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20200324070319.10901-1-christophe.jaillet@wanadoo.fr
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
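The return-value rule that d74b181a relies on can be modeled without any C. The sketch below is a pure-Python model of the documented snprintf() behavior, not a binding to libc; the function names are hypothetical, and the equality-only test is shown purely for contrast.

```python
def snprintf_model(size, text):
    """Model of C snprintf(): returns (stored characters, return value).

    At most size - 1 characters are stored (one byte is kept for the NUL
    terminator); the return value is the length the full output needs.
    """
    stored = text[: max(size - 1, 0)]
    return stored, len(text)


def truncated(ret, size):
    """Overflow test as described in the commit message."""
    return ret >= size


size = 8
stored, ret = snprintf_model(size, "cpu123456")  # needs 9 characters

caught_by_ge = truncated(ret, size)   # ret >= size
caught_by_eq = ret == size            # equality alone misses this case
```

Because the return value keeps growing with the input, any output longer than the buffer slips past an equality test, while `>=` covers both the exact-fit and the longer cases.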
    • perf pmu: Make pmu_uncore_alias_match() public · 5b9a5000
      John Garry committed
      The perf pmu-events test will want to use pmu_uncore_alias_match(), so
      make it public.
      Signed-off-by: John Garry <john.garry@huawei.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linuxarm@huawei.com
      Link: http://lore.kernel.org/lkml/1584442939-8911-7-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf pmu: Add is_pmu_core() · d504fae9
      John Garry committed
      Add a function to decide whether a PMU is a core PMU.
      Signed-off-by: John Garry <john.garry@huawei.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linuxarm@huawei.com
      Link: http://lore.kernel.org/lkml/1584442939-8911-6-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf pmu: Refactor pmu_add_cpu_aliases() · e45ad701
      John Garry committed
      Create pmu_add_cpu_aliases_map() from pmu_add_cpu_aliases(), so the caller
      can pass the map; the pmu-events test would use this since there would
      be no CPUID matching to a mapfile there.
      Signed-off-by: John Garry <john.garry@huawei.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linuxarm@huawei.com
      Link: http://lore.kernel.org/lkml/1584442939-8911-4-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf metricgroup: Fix printing event names of metric group with multiple events in case of overlapping events · 58fc90fd
      Kajol Jain committed
      
      Commit f01642e4 ("perf metricgroup: Support multiple events for
      metricgroup") introduced support for multiple events in a metric group.
      But with the current upstream, metric event names are not printed
      properly when running multiple metric groups with overlapping
      events.
      
      The issue with the current upstream version is that the comparison
      logic always starts from the beginning of the event list.  So events
      that already matched some metric group also take part in the
      comparison. Because of that, when there are overlapping events, we
      end up matching the current metric group's event with an already
      matched one.
      
      For example, on a skylake machine we have the metric events CoreIPC and
      Instructions. Both of them need the 'inst_retired.any' event value.  As
      the events in Instructions are a subset of the events in CoreIPC, they
      end up pointing to the same 'inst_retired.any' value.
      
      On the skylake platform:
      
      command:# ./perf stat -M CoreIPC,Instructions  -C 0 sleep 1
      
       Performance counter stats for 'CPU(s) 0':
      
           1,254,992,790      inst_retired.any          # 1254992790.0
                                                          Instructions
                                                        #      1.3 CoreIPC
             977,172,805      cycles
           1,254,992,756      inst_retired.any
      
             1.000802596 seconds time elapsed
      
      command:# sudo ./perf stat -M UPI,IPC sleep 1
      
         Performance counter stats for 'sleep 1':
                 948,650      uops_retired.retire_slots
                 866,182      inst_retired.any          #      0.7 IPC
                 866,182      inst_retired.any
               1,175,671      cpu_clk_unhalted.thread
      
      The patch fixes the issue by adding a new bool pointer 'evlist_used' to
      keep track of events which have already matched some group, by setting
      the corresponding entry to true.  So we skip all used events in the list
      when we start the comparison logic.  The patch also changes the
      comparison logic: in case of a match miss, we discard the whole match
      and start again with the first event id in the metric event.
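      
      The matching idea described above can be sketched as follows. This is
      a simplified illustration, not the code from the patch: find_group(),
      group_has_id() and the plain string arrays are hypothetical stand-ins
      for perf's evlist handling. The sketch also treats an already used
      event as a match miss and does not check that each id is matched
      exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if 'name' is one of the event ids a metric group needs. */
static bool group_has_id(const char *const *ids, int nr_ids, const char *name)
{
	for (int i = 0; i < nr_ids; i++) {
		if (strcmp(ids[i], name) == 0)
			return true;
	}
	return false;
}

/*
 * Find a run of nr_ids consecutive unused events that all belong to the
 * group. On success, mark the run as used and return the index of its
 * first event; return -1 if there is no such run.
 */
static int find_group(const char *const *events, bool *used, int nr_events,
		      const char *const *ids, int nr_ids)
{
	int start = -1, matched = 0;

	assert(nr_ids > 0);

	for (int i = 0; i < nr_events && matched < nr_ids; i++) {
		if (!used[i] && group_has_id(ids, nr_ids, events[i])) {
			if (matched == 0)
				start = i;
			matched++;
		} else {
			/* Match miss: discard the partial match, start over. */
			matched = 0;
		}
	}

	if (matched < nr_ids)
		return -1;

	/* Remember the matched events so later groups skip them. */
	for (int i = start; i < start + nr_ids; i++)
		used[i] = true;
	return start;
}
```

      With the event list { inst_retired.any, cycles, inst_retired.any },
      looking up CoreIPC first claims the first two events, so the lookup
      for Instructions lands on the third event instead of reusing the
      first one.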
      
      With this patch:
      
      On the Skylake platform:
      
      command:# ./perf stat -M CoreIPC,Instructions  -C 0 sleep 1
      
       Performance counter stats for 'CPU(s) 0':
      
               3,348,415      inst_retired.any          #      0.3 CoreIPC
              11,779,026      cycles
               3,348,381      inst_retired.any          # 3348381.0
                                                          Instructions
      
             1.001649056 seconds time elapsed
      
      command:# ./perf stat -M UPI,IPC sleep 1
      
       Performance counter stats for 'sleep 1':
      
               1,023,148      uops_retired.retire_slots #      1.1 UPI
                 924,976      inst_retired.any
                 924,976      inst_retired.any          #      0.6 IPC
               1,489,414      cpu_clk_unhalted.thread
      
             1.003064672 seconds time elapsed
      Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200221101121.28920-1-kjain@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      58fc90fd
    • J
      perf stat: Align the output for interval aggregation mode · d13e9e41
      Jin Yao committed
      There is a slight misalignment in -A -I output.
      
      For example:
      
       # perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
      
       #           time CPU                    counts unit events
            1.000440863 CPU0               1,068,388      cpu/event=cpu-cycles/
            1.000440863 CPU1                 875,954      cpu/event=cpu-cycles/
            1.000440863 CPU2               3,072,538      cpu/event=cpu-cycles/
            1.000440863 CPU3               4,026,870      cpu/event=cpu-cycles/
            1.000440863 CPU4               5,919,630      cpu/event=cpu-cycles/
            1.000440863 CPU5               2,714,260      cpu/event=cpu-cycles/
            1.000440863 CPU6               2,219,240      cpu/event=cpu-cycles/
            1.000440863 CPU7               1,299,232      cpu/event=cpu-cycles/
      
      The value of counts is not aligned with the column "counts" and
      the event name is not aligned with the column "events".
      
      With this patch, the output is,
      
       # perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
      
       #           time CPU                    counts unit events
            1.000423009 CPU0                  997,421      cpu/event=cpu-cycles/
            1.000423009 CPU1                1,422,042      cpu/event=cpu-cycles/
            1.000423009 CPU2                  484,651      cpu/event=cpu-cycles/
            1.000423009 CPU3                  525,791      cpu/event=cpu-cycles/
            1.000423009 CPU4                1,370,100      cpu/event=cpu-cycles/
            1.000423009 CPU5                  442,072      cpu/event=cpu-cycles/
            1.000423009 CPU6                  205,643      cpu/event=cpu-cycles/
            1.000423009 CPU7                1,302,250      cpu/event=cpu-cycles/
      
      Now output is aligned.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200218071614.25736-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d13e9e41
    • J
      perf report: Support a new key to reload the browser · 5e3b810a
      Jin Yao committed
      Sometimes we may need to reload the browser to update the output after
      some options have changed.
      
      This patch creates a new key, K_RELOAD. Once __cmd_report() returns
      K_RELOAD, it repeats the whole process: read samples from the data
      file, sort the data and display it in the browser.
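      
      The repeat-on-reload control flow described above can be sketched as
      a simple loop. All names here (KEY_RELOAD, KEY_QUIT, run_report(),
      countdown_pass()) are hypothetical and only illustrate the idea; they
      are not perf's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical key codes; the real values live in perf's headers. */
enum { KEY_QUIT = 0, KEY_RELOAD = 1 };

/* One pass: read samples, sort, display; returns the key that ended it. */
typedef int (*report_pass_fn)(void *ctx);

/* Repeat the whole pass for as long as it asks for a reload. */
static int run_report(report_pass_fn pass, void *ctx)
{
	int key, passes = 0;

	do {
		key = pass(ctx);
		passes++;
	} while (key == KEY_RELOAD);

	return passes;	/* number of passes that were run */
}

/* Demo pass: asks for a reload until the counter in ctx reaches zero. */
static int countdown_pass(void *ctx)
{
	int *reloads_left = ctx;

	assert(reloads_left != NULL);
	if (*reloads_left > 0) {
		(*reloads_left)--;
		return KEY_RELOAD;
	}
	return KEY_QUIT;
}
```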
      
       v5:
       ---
       1. Fix the 'make NO_SLANG=1' error. Define K_RELOAD in util/hist.h.
       2. Skip setup_sorting() in repeat path if last key is K_RELOAD.
      
       v4:
       ---
       Need to quit in perf_evsel_menu__run if key is K_RELOAD.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200220013616.19916-3-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5e3b810a
    • J
      perf report: Allow specifying event to be used as sort key in --group output · 429a5f9d
      Jin Yao committed
      When performing "perf report --group", it shows the event group
      information together. By default, the output is sorted by the first
      event in the group.
      
      It would be nice for the user to be able to select any event for
      sorting. This patch introduces a new option "--group-sort-idx" to sort
      the output by the event at index n in the event group.
      
      For example,
      
      Before:
      
        # perf report --group --stdio
      
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1,
        # Event count (approx.): 6451235635
        #
        #                         Overhead  Command    Shared Object            Symbol
        # ................................  .........  .......................  ...................................
        #
            92.19%  98.68%   0.00%  93.30%  mgen       mgen                     [.] LOOP1
             3.12%   0.29%   0.00%   0.16%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x0000000000049515
             1.56%   0.03%   0.00%   0.04%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x00000000000494b7
             1.56%   0.01%   0.00%   0.00%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x00000000000494ce
             1.56%   0.00%   0.00%   0.00%  mgen       [kernel.kallsyms]        [k] task_tick_fair
             0.00%   0.15%   0.00%   0.04%  perf       [kernel.kallsyms]        [k] smp_call_function_single
             0.00%   0.13%   0.00%   6.08%  swapper    [kernel.kallsyms]        [k] intel_idle
             0.00%   0.03%   0.00%   0.00%  gsd-color  libglib-2.0.so.0.5600.4  [.] g_main_context_check
             0.00%   0.03%   0.00%   0.00%  swapper    [kernel.kallsyms]        [k] apic_timer_interrupt
             ...
      
      After:
      
        # perf report --group --stdio --group-sort-idx 3
      
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1,
        # Event count (approx.): 6451235635
        #
        #                         Overhead  Command    Shared Object            Symbol
        # ................................  .........  .......................  ...................................
        #
            92.19%  98.68%   0.00%  93.30%  mgen       mgen                     [.] LOOP1
             0.00%   0.13%   0.00%   6.08%  swapper    [kernel.kallsyms]        [k] intel_idle
             3.12%   0.29%   0.00%   0.16%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x0000000000049515
             0.00%   0.00%   0.00%   0.06%  swapper    [kernel.kallsyms]        [k] hrtimer_start_range_ns
             1.56%   0.03%   0.00%   0.04%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x00000000000494b7
             0.00%   0.15%   0.00%   0.04%  perf       [kernel.kallsyms]        [k] smp_call_function_single
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] update_curr
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] apic_timer_interrupt
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] native_apic_msr_eoi_write
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] __update_load_avg_se
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] scheduler_tick
      
      Now the output is sorted by the fourth event in group.
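      
      The effect of sorting by a chosen group member can be sketched with a
      plain qsort() comparator. This is an illustration only: struct row,
      sort_rows() and NR_MEMBERS are hypothetical, and the sketch compares
      a single column, without any tie-breaking on the other columns.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_MEMBERS 4	/* events in the group, as in the example output */

struct row {
	const char *symbol;
	double overhead[NR_MEMBERS];	/* one percentage per group member */
};

/* qsort() takes no context argument, so the sort index is file-scoped. */
static int group_sort_idx;

static int cmp_rows(const void *a, const void *b)
{
	const struct row *ra = a, *rb = b;
	double va = ra->overhead[group_sort_idx];
	double vb = rb->overhead[group_sort_idx];

	/* Descending order: the larger overhead sorts first. */
	return (va < vb) - (va > vb);
}

static void sort_rows(struct row *rows, size_t nr_rows, int idx)
{
	assert(idx >= 0 && idx < NR_MEMBERS);
	group_sort_idx = idx;
	qsort(rows, nr_rows, sizeof(*rows), cmp_rows);
}
```

      Sorting the LOOP1, 0x49515 and intel_idle rows from the output above
      with idx 3 moves intel_idle (6.08%) ahead of the 0x49515 row (0.16%),
      matching the "After" listing.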
      
       v7:
       ---
       Rebase to latest perf/core, no other change.
      
       v4:
       ---
       1. Update Documentation/perf-report.txt to mention that
          '--group-sort-idx' supports multiple groups with different
          numbers of events and that it should be used on grouped events.
      
       2. Update __hpp__group_sort_idx() to just return when the
          idx is out of limit.
      
       3. Return failure on symbol_conf.group_sort_idx && !session->evlist->nr_groups.
          So now it no longer needs to be used together with --group.
      
       v3:
       ---
       Refine the code in __hpp__group_sort_idx().
      
       Before:
         for (i = 1; i < nr_members; i++) {
              if (i == idx) {
                      ret = field_cmp(fields_a[i], fields_b[i]);
                      if (ret)
                              goto out;
              }
         }
      
       After:
         if (idx >= 1 && idx < nr_members) {
              ret = field_cmp(fields_a[idx], fields_b[idx]);
              if (ret)
                      goto out;
         }
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200220013616.19916-2-yao.jin@linux.intel.com
      [ Renamed pair_fields_alloc() to hist_entry__new_pair() and combined decl + assignment of vars ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      429a5f9d
    • J
      perf report: Support interactive annotation of code without symbols · 7b0a0dcb
      Jin Yao committed
      For perf report on stripped binaries it is currently impossible to do
      annotation. The annotation state is all tied to symbols, but there are
      either no symbols, or the symbols do not cover all the code.
      
      We should support the annotation functionality even without symbols.
      
      This patch fakes a symbol whose name is the address formatted as a
      string. After that, we just follow the current annotation workflow.
      
      For example,
      
      1. perf report
      
        Overhead  Command  Shared Object     Symbol
          20.67%  div      libc-2.27.so      [.] __random_r
          17.29%  div      libc-2.27.so      [.] __random
          10.59%  div      div               [.] 0x0000000000000628
           9.25%  div      div               [.] 0x0000000000000612
           6.11%  div      div               [.] 0x0000000000000645
      
      2. Select the line of "10.59%  div      div               [.] 0x0000000000000628" and ENTER.
      
        Annotate 0x0000000000000628
        Zoom into div thread
        Zoom into div DSO (use the 'k' hotkey to zoom directly into the kernel)
        Browse map details
        Run scripts for samples of symbol [0x0000000000000628]
        Run scripts for all samples
        Switch to another data file in PWD
        Exit
      
      3. Select the "Annotate 0x0000000000000628" and ENTER.
      
      Percent│
             │
             │
             │     Disassembly of section .text:
             │
             │     0000000000000628 <.text+0x68>:
             │       divsd %xmm4,%xmm0
             │       divsd %xmm3,%xmm1
             │       movsd (%rsp),%xmm2
             │       addsd %xmm1,%xmm0
             │       addsd %xmm2,%xmm0
             │       movsd %xmm0,(%rsp)
      
      Now we can see the dump of object starting from 0x628.
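      
      Building a symbol name from an address can be sketched as below. The
      helper name format_addr_name() is hypothetical, and the width is
      fixed at 16 hex digits for illustration, whereas the snprintf() call
      quoted in the committer notes further down derives it from
      BITS_PER_LONG. Using PRIx64 keeps the format correct for a 64-bit
      value on both 32-bit and 64-bit builds.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Format 'addr' as "0x" followed by 16 hex digits, e.g.
 * "0x0000000000000628". Returns what snprintf() returns: the length the
 * full name needs, excluding the terminating NUL.
 */
static int format_addr_name(char *name, size_t size, uint64_t addr)
{
	assert(name != NULL && size > 0);
	return snprintf(name, size, "%#.16" PRIx64, addr);
}
```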
      
       v5:
       ---
       Remove the hotkey 'a' implementation from this patch. It
       will be moved to a separate patch.
      
       v4:
       ---
       1. Support the hotkey 'a'. When we press 'a' on address,
          now it supports the annotation.
      
       2. Change the patch title from
          "Support interactive annotation of code without symbols" to
          "perf report: Support interactive annotation of code without symbols"
      
       v3:
       ---
       Keep just the ANNOTATION_DUMMY_LEN, and remove the
       opts->annotate_dummy_len since it's the "maybe in future
       we will provide" feature.
      
       v2:
       ---
       Fix a crash issue when annotating an address in "unknown" object.
      
       The steps to reproduce this issue:
      
       perf record -e cycles:u ls
       perf report
      
          75.29%  ls       ld-2.27.so        [.] do_lookup_x
          23.64%  ls       ld-2.27.so        [.] __GI___tunables_init
           1.04%  ls       [unknown]         [k] 0xffffffff85c01210
           0.03%  ls       ld-2.27.so        [.] _start
      
       When annotating 0xffffffff85c01210, the crash happens.
      
       v2 adds checking for ms->map in add_annotate_opt(). If the object is
       "unknown", ms->map is NULL.
      
      Committer notes:
      
      Renamed new_annotate_sym() to symbol__new_unresolved().
      
      Use PRIx64 to fix this issue in some 32-bit arches:
      
        ui/browsers/hists.c: In function 'symbol__new_unresolved':
        ui/browsers/hists.c:2474:38: error: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=]
          snprintf(name, sizeof(name), "%-#.*lx", BITS_PER_LONG / 4, addr);
                                        ~~~~~~^                      ~~~~
                                        %-#.*llx
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200227043939.4403-3-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7b0a0dcb
  6. 23 Mar, 2020 3 commits
    • J
      perf report: Print al_addr when symbol is not found · 443bc639
      Jin Yao committed
      For branch mode, if the symbol is not found, it prints
      the address.
      
      For example, 0x0000555eee0365a0 in below output.
      
        Overhead  Command  Source Shared Object  Source Symbol                            Target Symbol
          17.55%  div      libc-2.27.so          [.] __random                             [.] __random
           6.11%  div      div                   [.] 0x0000555eee0365a0                   [.] rand
           6.10%  div      libc-2.27.so          [.] rand                                 [.] 0x0000555eee036769
           5.80%  div      libc-2.27.so          [.] __random_r                           [.] __random
           5.72%  div      libc-2.27.so          [.] __random                             [.] __random_r
           5.62%  div      libc-2.27.so          [.] __random_r                           [.] __random_r
           5.38%  div      libc-2.27.so          [.] __random                             [.] rand
           4.56%  div      libc-2.27.so          [.] __random                             [.] __random
           4.49%  div      div                   [.] 0x0000555eee036779                   [.] 0x0000555eee0365ff
           4.25%  div      div                   [.] 0x0000555eee0365fa                   [.] 0x0000555eee036760
      
      But it's not very easy to work out which instructions in the binary
      these addresses refer to. So this patch prints the al_addr instead.
      
      With this patch, the output is
      
        Overhead  Command  Source Shared Object  Source Symbol                            Target Symbol
          17.55%  div      libc-2.27.so          [.] __random                             [.] __random
           6.11%  div      div                   [.] 0x00000000000005a0                   [.] rand
           6.10%  div      libc-2.27.so          [.] rand                                 [.] 0x0000000000000769
           5.80%  div      libc-2.27.so          [.] __random_r                           [.] __random
           5.72%  div      libc-2.27.so          [.] __random                             [.] __random_r
           5.62%  div      libc-2.27.so          [.] __random_r                           [.] __random_r
           5.38%  div      libc-2.27.so          [.] __random                             [.] rand
           4.56%  div      libc-2.27.so          [.] __random                             [.] __random
           4.49%  div      div                   [.] 0x0000000000000779                   [.] 0x00000000000005ff
           4.25%  div      div                   [.] 0x00000000000005fa                   [.] 0x0000000000000760
      
      Now we can use objdump to dump the object starting from 0x5a0.
      
      For example,
      objdump -d --start-address 0x5a0 div
      
      00000000000005a0 <rand@plt>:
       5a0:   ff 25 2a 0a 20 00       jmpq   *0x200a2a(%rip)        # 200fd0 <__cxa_finalize@plt+0x200a20>
       5a6:   68 02 00 00 00          pushq  $0x2
       5ab:   e9 c0 ff ff ff          jmpq   570 <srand@plt-0x10>
       ...
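      
      The relation between the two kinds of address shown above can be
      sketched as a translation from a runtime address to an address
      relative to the mapped object. This is a simplified illustration:
      struct mapping and object_relative_addr() are hypothetical, and the
      mapping start used in the usage note is an assumed value chosen to be
      consistent with the example output, not something recorded in this
      commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of one mapping of an object file. */
struct mapping {
	uint64_t start;	/* runtime address where the mapping begins */
	uint64_t end;	/* first runtime address past the mapping */
	uint64_t pgoff;	/* offset in the object that 'start' maps to */
};

/* Translate a runtime address inside the mapping to an object address. */
static uint64_t object_relative_addr(const struct mapping *m, uint64_t ip)
{
	assert(ip >= m->start && ip < m->end);
	return ip - m->start + m->pgoff;
}
```

      Assuming the 'div' binary is mapped at 0x555eee036000 with pgoff 0,
      the runtime address 0x0000555eee0365a0 translates to 0x5a0, which is
      the kind of address objdump accepts with --start-address.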
      
      Committer testing:
      
        [root@seventh ~]# perf record -a -b sleep 1
        [root@seventh ~]# perf report --header-only | grep cpudesc
        # cpudesc : Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
        [root@seventh ~]# perf evlist -v
        cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
        [root@seventh ~]#
      
      Before:
      
        [root@seventh ~]# perf report --stdio --dso libsystemd-shared-241.so | head -20
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 2K of event 'cycles'
        # Event count (approx.): 2240
        #
        # Overhead  Command          Source Shared Object      Source Symbol           Target Symbol           Basic Block Cycles
        # ........  ...............  ........................  ......................  ......................  ..................
        #
             0.13%  systemd-journal  libc-2.29.so              [.] cfree@GLIBC_2.2.5   [.] _int_free           1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00007fe406465c82  [.] 0x00007fe406465d80  1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00007fe406465ded  [.] 0x00007fe406465c30  1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00007fe406465e4e  [.] 0x00007fe406465de0  1
             0.09%  systemd-journal  systemd-journald          [.] free@plt            [.] cfree@GLIBC_2.2.5   1
             0.09%  systemd-journal  libc-2.29.so              [.] _int_free           [.] _int_free           18
             0.09%  systemd-journal  libc-2.29.so              [.] _int_free           [.] _int_free           2
             0.04%  systemd          libsystemd-shared-241.so  [.] bus_resolve@plt     [.] bus_resolve         204
             0.04%  systemd          libsystemd-shared-241.so  [.] getpid_cached@plt   [.] getpid_cached       7
        [root@seventh ~]#
      
      After:
      
        [root@seventh ~]# perf report --stdio --dso libsystemd-shared-241.so | head -20
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 2K of event 'cycles'
        # Event count (approx.): 2240
        #
        # Overhead  Command          Source Shared Object      Source Symbol           Target Symbol           Basic Block Cycles
        # ........  ...............  ........................  ......................  ......................  ..................
        #
             0.13%  systemd-journal  libc-2.29.so              [.] cfree@GLIBC_2.2.5   [.] _int_free           1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00000000000f7c82  [.] 0x00000000000f7d80  1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00000000000f7ded  [.] 0x00000000000f7c30  1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00000000000f7e4e  [.] 0x00000000000f7de0  1
             0.09%  systemd-journal  systemd-journald          [.] free@plt            [.] cfree@GLIBC_2.2.5   1
             0.09%  systemd-journal  libc-2.29.so              [.] _int_free           [.] _int_free           18
             0.09%  systemd-journal  libc-2.29.so              [.] _int_free           [.] _int_free           2
             0.04%  systemd          libsystemd-shared-241.so  [.] bus_resolve@plt     [.] bus_resolve         204
             0.04%  systemd          libsystemd-shared-241.so  [.] getpid_cached@plt   [.] getpid_cached       7
        [root@seventh ~]#
      
      Let's use -v to get full paths and then try objdump on the unresolved address:
      
        [root@seventh ~]# perf report -v --stdio --dso libsystemd-shared-241.so |& grep libsystemd-shared-241.so | tail -1
           0.04% systemd-journal /usr/lib/systemd/libsystemd-shared-241.so 0x80c1a B [.] 0x0000000000080c1a 0x80a95 B [.] 0x0000000000080a95 61
        [root@seventh ~]#
      
        [root@seventh ~]# objdump -d --start-address 0x00000000000f7d80 /usr/lib/systemd/libsystemd-shared-241.so | head -20
      
        /usr/lib/systemd/libsystemd-shared-241.so:     file format elf64-x86-64
      
        Disassembly of section .text:
      
        00000000000f7d80 <proc_cmdline_parse_given@@SD_SHARED+0x330>:
           f7d80:	41 39 11             	cmp    %edx,(%r9)
           f7d83:	0f 84 ff fe ff ff    	je     f7c88 <proc_cmdline_parse_given@@SD_SHARED+0x238>
           f7d89:	4c 8d 05 97 09 0c 00 	lea    0xc0997(%rip),%r8        # 1b8727 <utf8_skip_data@@SD_SHARED+0x3147>
           f7d90:	b9 49 00 00 00       	mov    $0x49,%ecx
           f7d95:	48 8d 15 c9 f5 0b 00 	lea    0xbf5c9(%rip),%rdx        # 1b7365 <utf8_skip_data@@SD_SHARED+0x1d85>
           f7d9c:	31 ff                	xor    %edi,%edi
           f7d9e:	48 8d 35 9b ff 0b 00 	lea    0xbff9b(%rip),%rsi        # 1b7d40 <utf8_skip_data@@SD_SHARED+0x2760>
           f7da5:	e8 a6 d6 f4 ff       	callq  45450 <log_assert_failed_realm@plt>
           f7daa:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
           f7db0:	41 56                	push   %r14
           f7db2:	41 55                	push   %r13
           f7db4:	41 54                	push   %r12
           f7db6:	55                   	push   %rbp
        [root@seventh ~]#
      
      If we tried the reported address before this patch:
      
        [root@seventh ~]# objdump -d --start-address 0x00007fe406465d80 /usr/lib/systemd/libsystemd-shared-241.so | head -20
      
        /usr/lib/systemd/libsystemd-shared-241.so:     file format elf64-x86-64
      
        [root@seventh ~]#
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200227043939.4403-2-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      443bc639
    • L
      perf symbols: Consolidate symbol fixup issue · 7eec00a7
      Leo Yan committed
      After copying an Arm64 perf archive with object files and the perf.data
      file to an x86 laptop, kernel symbol resolution with the x86 perf tool
      fails.  It outputs 'unknown' for all symbols.
      
      The root cause is the function elf__needs_adjust_symbols(): the x86
      perf tool uses a weak version, while Arm64 (and powerpc) provide their
      own versions.  elf__needs_adjust_symbols() decides whether symbols need
      to be parsed with the relative offset address; but the x86 build uses
      the weak function, which fails to check for the ELF type 'ET_DYN', so
      it cannot parse symbols in Arm DSOs due to the wrong result from
      elf__needs_adjust_symbols().
      
      DSO parsing should not depend on the architecture perf was built for;
      e.g. the x86 perf tool should be able to parse Arm and Arm64 DSOs, and
      vice versa.  Naveen N. Rao also confirmed that powerpc64 kernels are no
      longer built as ET_DYN and have changed to ET_EXEC.
      
      This patch removes the arch-specific functions for Arm64 and powerpc and
      turns elf__needs_adjust_symbols() into a common function.
      
      The common elf__needs_adjust_symbols() checks an extra condition,
      'ET_DYN', for the ELF header type.  With this fix, the Arm64 DSO can be
      parsed properly with the x86 perf tool.
      
      Before:
      
        # perf script
        main 3258 1 branches:                0 [unknown] ([unknown]) => ffff800010c4665c [unknown] ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c46670 [unknown] ([kernel.kallsyms]) => ffff800010c4eaec [unknown] ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eaec [unknown] ([kernel.kallsyms]) => ffff800010c4eb00 [unknown] ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eb08 [unknown] ([kernel.kallsyms]) => ffff800010c4e780 [unknown] ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4e7a0 [unknown] ([kernel.kallsyms]) => ffff800010c4eeac [unknown] ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eebc [unknown] ([kernel.kallsyms]) => ffff800010c4ed80 [unknown] ([kernel.kallsyms])
      
      After:
      
        # perf script
        main 3258 1 branches:                0 [unknown] ([unknown]) => ffff800010c4665c coresight_timeout+0x54 ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c46670 coresight_timeout+0x68 ([kernel.kallsyms]) => ffff800010c4eaec etm4_enable_hw+0x3cc ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eaec etm4_enable_hw+0x3cc ([kernel.kallsyms]) => ffff800010c4eb00 etm4_enable_hw+0x3e0 ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eb08 etm4_enable_hw+0x3e8 ([kernel.kallsyms]) => ffff800010c4e780 etm4_enable_hw+0x60 ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4e7a0 etm4_enable_hw+0x80 ([kernel.kallsyms]) => ffff800010c4eeac etm4_enable+0x2d4 ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eebc etm4_enable+0x2e4 ([kernel.kallsyms]) => ffff800010c4ed80 etm4_enable+0x1a8 ([kernel.kallsyms])
      
      v3: Changed to check for ET_DYN across all architectures.
      
      v2: Fixed Arm64 and powerpc native building.
      Reported-by: Mike Leach <mike.leach@linaro.org>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Allison Randal <allison@lohutok.net>
      Cc: Enrico Weigelt <info@metux.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Link: http://lore.kernel.org/lkml/20200306015759.10084-1-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7eec00a7
    • I
      perf parse-events: Fix 3 use after frees found with clang ASAN · d4953f7e
      Ian Rogers committed
      Reproducible with a clang ASan build by running perf test, in
      particular 'Parse event definition strings'.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: clang-built-linux@googlegroups.com
      Link: http://lore.kernel.org/lkml/20200314170356.62914-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d4953f7e
  7. 18 Mar, 2020 2 commits
    • J
      perf expr: Fix copy/paste mistake · 59a08b4b
      Jiri Olsa committed
      Copy/paste leftover from recent refactor.
      
      Fixes: 26226a97 ("perf expr: Move expr lexer to flex")
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200315155609.603948-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      59a08b4b
    • I
      perf tools: Give synthetic mmap events an inode generation · 3b7a15b0
      Ian Rogers committed
      When mmap2 events are synthesized, the ino_generation field isn't being
      set, leading to uninitialized memory being compared.
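      
      The class of bug can be sketched with a small parser for a
      /proc/<pid>/maps line. This is an illustration under assumptions:
      struct mmap2_fields and parse_maps_line() are hypothetical and much
      simpler than perf's synthesis code. The point is that the maps line
      carries no generation number, so the field has to be set explicitly
      or it keeps whatever was in memory before.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical subset of the fields carried by an mmap2 record. */
struct mmap2_fields {
	uint64_t start, end, pgoff;
	unsigned int maj, min;
	uint64_t ino;
	uint64_t ino_generation;
};

/* Parse one /proc/<pid>/maps line; returns 0 on success, -1 otherwise. */
static int parse_maps_line(const char *line, struct mmap2_fields *f)
{
	char prot[5];
	int n;

	assert(line != NULL && f != NULL);

	/* Layout: start-end perms offset major:minor inode [path] */
	n = sscanf(line,
		   "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %x:%x %" SCNu64,
		   &f->start, &f->end, prot, &f->pgoff,
		   &f->maj, &f->min, &f->ino);
	if (n != 7)
		return -1;

	/* Not present in the maps line: give it a defined value. */
	f->ino_generation = 0;
	return 0;
}
```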
      
      Caught with clang's -fsanitize=memory:
      
      ==124733==WARNING: MemorySanitizer: use-of-uninitialized-value
          #0 0x55a96a6a65cc in __dso_id__cmp tools/perf/util/dsos.c:23:6
          #1 0x55a96a6a81d5 in dso_id__cmp tools/perf/util/dsos.c:38:9
          #2 0x55a96a6a717f in __dso__cmp_long_name tools/perf/util/dsos.c:74:15
          #3 0x55a96a6a6c4c in __dsos__findnew_link_by_longname_id tools/perf/util/dsos.c:106:12
          #4 0x55a96a6a851e in __dsos__findnew_by_longname_id tools/perf/util/dsos.c:178:9
          #5 0x55a96a6a7798 in __dsos__find_id tools/perf/util/dsos.c:191:9
          #6 0x55a96a6a7b57 in __dsos__findnew_id tools/perf/util/dsos.c:251:20
          #7 0x55a96a6a7a57 in dsos__findnew_id tools/perf/util/dsos.c:259:17
          #8 0x55a96a7776ae in machine__findnew_dso_id tools/perf/util/machine.c:2709:9
          #9 0x55a96a77dfcf in map__new tools/perf/util/map.c:193:10
          #10 0x55a96a77240a in machine__process_mmap2_event tools/perf/util/machine.c:1670:8
          #11 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9
          #12 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9
          #13 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9
          #14 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7
          #15 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9
          #16 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3
          #17 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
          #18 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
          #19 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
          #20 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
          #21 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
          #22 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
          #23 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4
          #24 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9
          #25 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11
          #26 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8
          #27 0x55a96a284097 in run_argv tools/perf/perf.c:408:2
          #28 0x55a96a282223 in main tools/perf/perf.c:538:3
      
        Uninitialized value was stored to memory at
          #1 0x55a96a6a18f7 in dso__new_id tools/perf/util/dso.c:1230:14
          #2 0x55a96a6a78ee in __dsos__addnew_id tools/perf/util/dsos.c:233:20
          #3 0x55a96a6a7bcc in __dsos__findnew_id tools/perf/util/dsos.c:252:21
          #4 0x55a96a6a7a57 in dsos__findnew_id tools/perf/util/dsos.c:259:17
          #5 0x55a96a7776ae in machine__findnew_dso_id tools/perf/util/machine.c:2709:9
          #6 0x55a96a77dfcf in map__new tools/perf/util/map.c:193:10
          #7 0x55a96a77240a in machine__process_mmap2_event tools/perf/util/machine.c:1670:8
          #8 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9
          #9 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9
          #10 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9
          #11 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7
          #12 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9
          #13 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3
          #14 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
          #15 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
          #16 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
          #17 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
          #18 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
          #19 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
      
        Uninitialized value was stored to memory at
          #0 0x55a96a7725af in machine__process_mmap2_event tools/perf/util/machine.c:1646:25
          #1 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9
          #2 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9
          #3 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9
          #4 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7
          #5 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9
          #6 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3
          #7 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
          #8 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
          #9 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
          #10 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
          #11 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
          #12 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
          #13 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4
          #14 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9
          #15 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11
          #16 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8
          #17 0x55a96a284097 in run_argv tools/perf/perf.c:408:2
          #18 0x55a96a282223 in main tools/perf/perf.c:538:3
      
        Uninitialized value was created by a heap allocation
          #0 0x55a96a22f60d in malloc llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:925:3
          #1 0x55a96a882948 in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:655:15
          #2 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
          #3 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
          #4 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
          #5 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
          #6 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
          #7 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
          #8 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4
          #9 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9
          #10 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11
          #11 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8
          #12 0x55a96a284097 in run_argv tools/perf/perf.c:408:2
          #13 0x55a96a282223 in main tools/perf/perf.c:538:3
      
      SUMMARY: MemorySanitizer: use-of-uninitialized-value tools/perf/util/dsos.c:23:6 in __dso_id__cmp
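The failure mode in the report above can be illustrated with a minimal sketch. The struct and function names below are hypothetical stand-ins, not the perf data structures: memory from malloc() is not zeroed, so every field that a later comparison reads has to be stored explicitly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the identity fields of a synthesized mmap2 event. */
struct mmap2_id {
	uint32_t maj;
	uint32_t min;
	uint64_t ino;
	uint64_t ino_generation;
};

/* Allocate an id with malloc(), which does not zero memory, and set every field. */
static struct mmap2_id *mmap2_id_new(uint32_t maj, uint32_t min, uint64_t ino)
{
	struct mmap2_id *id = malloc(sizeof(*id));

	if (!id)
		return NULL;
	id->maj = maj;
	id->min = min;
	id->ino = ino;
	/* Without this store, mmap2_id_cmp() would read uninitialized memory. */
	id->ino_generation = 0;
	return id;
}

/* Field-by-field comparison: 0 when equal, negative or positive otherwise. */
static int mmap2_id_cmp(const struct mmap2_id *a, const struct mmap2_id *b)
{
	assert(a && b);
	if (a->maj != b->maj)
		return a->maj < b->maj ? -1 : 1;
	if (a->min != b->min)
		return a->min < b->min ? -1 : 1;
	if (a->ino != b->ino)
		return a->ino < b->ino ? -1 : 1;
	if (a->ino_generation != b->ino_generation)
		return a->ino_generation < b->ino_generation ? -1 : 1;
	return 0;
}
```

Two ids built from the same inputs then compare equal deterministically, which is the property the sanitizer report shows was not guaranteed before.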
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: clang-built-linux@googlegroups.com
      Link: http://lore.kernel.org/lkml/20200313053129.131264-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3b7a15b0
  8. 12 Mar, 2020 1 commit
  9. 11 Mar, 2020 9 commits
    • L
      perf cs-etm: Fix unsigned variable comparison to zero · bc010dd6
      Leo Yan authored
      The variable 'offset' in function cs_etm__sample() has type u64, so it
      is not appropriate to check it with 'while (offset > 0)'; this patch
      changes the check to 'while (offset)'.
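A minimal sketch of why the two loop conditions are equivalent for an unsigned type, and why a "negative" result cannot be detected with '> 0'. The helper names are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A u64 cannot go negative: subtracting past zero wraps around instead. */
static_assert((uint64_t)0 - 1 > 0, "u64 wraps instead of going negative");

/* Plain unsigned subtraction, as happens when an offset is computed in u64. */
static uint64_t sub_u64(uint64_t a, uint64_t b)
{
	return a - b;
}

/* For unsigned values, "offset > 0" and "offset" select the same inputs. */
static bool conditions_agree(uint64_t offset)
{
	bool greater_than_zero = offset > 0;
	bool non_zero = offset;

	return greater_than_zero == non_zero;
}
```

For instance, sub_u64(7, 10) yields a very large positive number rather than -3, and both conditions treat it as "keep looping".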
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: Mike Leach <mike.leach@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: coresight ml <coresight@lists.linaro.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20200219021811.20067-6-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bc010dd6
    • L
      perf cs-etm: Optimize copying last branches · 695378b5
      Leo Yan authored
      If an instruction range packet can generate multiple instruction
      samples, these samples share the same last branches; it's not necessary
      to copy the same last branches repeatedly for these samples within the
      same packet.
      
      This patch moves the last branches copying out of function
      cs_etm__synth_instruction_sample() and executes it prior to generating
      instruction samples.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: Mike Leach <mike.leach@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: coresight ml <coresight@lists.linaro.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20200219021811.20067-5-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      695378b5
    • L
      perf cs-etm: Correct synthesizing instruction samples · c9f5baa1
      Leo Yan authored
      When 'etm->instructions_sample_period' is less than
      'tidq->period_instructions', the logic in function cs_etm__sample()
      cannot handle this case properly.
      
      Consider the following flow as an example:
      
      - If we set itrace option '--itrace=i4', then function cs_etm__sample()
        has variables with initialized values:
      
        tidq->period_instructions = 0
        etm->instructions_sample_period = 4
      
      - When the first packet arrives:
      
        packet->instr_count = 10; the number of instructions executed in this
        packet is 10, thus update period_instructions as below:
      
        tidq->period_instructions = 0 + 10 = 10
        instrs_over = 10 - 4 = 6
        offset = 10 - 6 - 1 = 3
        tidq->period_instructions = instrs_over = 6
      
      - When the second packet arrives:
      
        packet->instr_count = 10; in the second pass, assume 10 instructions
        in the trace sample again:
      
        tidq->period_instructions = 6 + 10 = 16
        instrs_over = 16 - 4 = 12
        offset = 10 - 12 - 1 = -3  -> the negative value
        tidq->period_instructions = instrs_over = 12
      
      So after handling these two packets, there are the following issues:
      
      The first issue is that cs_etm__instr_addr() returns the address within
      the current trace sample of the instruction related to offset, so the
      offset is supposed to always be an unsigned value.  But in fact, function
      cs_etm__sample() might calculate a negative offset value (in handling
      the second packet, the offset is -3) and pass it to cs_etm__instr_addr()
      as a u64, where it becomes a big positive integer.
      
      The second issue is that it only synthesizes 2 samples for sample
      period = 4.  In theory, every packet has 10 instructions, so the two
      packets have 20 instructions in total, and 20 instructions should
      generate 5 samples (4 x 5 = 20).  This is because cs_etm__sample() calls
      cs_etm__synth_instruction_sample() only once per range packet to
      generate an instruction sample.
      
      This patch fixes the logic in function cs_etm__sample(); the basic
      idea for handling an incoming packet is:
      
      - To synthesize the first instruction sample, it combines the
        instructions left over from the previous packet with the head of the
        new packet; then it generates continuous samples with the sample period;
      - At the tail of the new packet, any remaining instructions are left
        for the subsequent sample.
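The two steps above can be sketched as a small stand-alone model. This is an illustration under stated assumptions, not the code of the patch: 'struct sampler' and sampler_feed() are hypothetical names, and the model only tracks instruction counts and packet offsets, not addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-queue state for the model. */
struct sampler {
	uint64_t period;     /* sample period, e.g. 4 for --itrace=i4 */
	uint64_t carried;    /* instructions seen since the last sample */
	uint64_t nr_samples; /* samples emitted so far */
};

/*
 * Feed one range packet holding instr_count instructions. Prints the
 * zero-based offset within the packet of each sampled instruction and
 * returns how many samples this packet produced.
 */
static uint64_t sampler_feed(struct sampler *s, uint64_t instr_count)
{
	uint64_t prev = s->carried;
	uint64_t emitted = 0;
	uint64_t offset;

	assert(s->period > 0);
	assert(prev < s->period);

	s->carried += instr_count;
	/* Instructions needed from this packet to complete the pending period. */
	offset = s->period - prev;
	while (s->carried >= s->period) {
		printf("sample at packet offset %llu\n",
		       (unsigned long long)(offset - 1));
		offset += s->period;
		s->carried -= s->period;
		emitted++;
	}
	/* Whatever remains in s->carried is left for the next packet. */
	s->nr_samples += emitted;
	return emitted;
}
```

With a period of 4 and two packets of 10 instructions each, the model emits 2 samples for the first packet (2 instructions carried over) and 3 for the second, 5 in total, and the offset never goes below zero.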
      Suggested-by: Mike Leach <mike.leach@linaro.org>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: Mike Leach <mike.leach@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: coresight ml <coresight@lists.linaro.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20200219021811.20067-4-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c9f5baa1
    • L
      perf cs-etm: Continuously record last branch · f1410028
      Leo Yan authored
      Every time an instruction sample is synthesized, the last branch
      recording is reset.  This is fine if the instruction period is big
      enough; for example, with the option '--itrace=i100000', the last branch
      array is reset for every sample with 100000 instructions per period, and
      before the next instruction sample is generated, sufficient packets
      arrive to fill the last branch array.
      
      On the other hand, with a very small period, far fewer packets arrive
      between two consecutive instruction samples, so frequent resetting
      leaves the last branch array almost empty for each new instruction
      sample.
      
      To allow the last branches to work properly for any instruction period,
      this patch avoids resetting the last branch for every instruction sample
      and only resets it when flushing the trace data.  The last branches are
      reset in only two cases: when the trace starts and when the trace is
      discontinuous; in all other cases, last branches keep being recorded
      across continuous instruction samples.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: Mike Leach <mike.leach@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: coresight ml <coresight@lists.linaro.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20200219021811.20067-3-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f1410028
    • L
      perf cs-etm: Swap packets for instruction samples · d0175156
      Leo Yan authored
      With the option '--itrace=iNNN' and Arm CoreSight trace data, the perf
      tool fails to inject instruction samples; the root cause is that packets
      are only swapped for branch samples and last branches but not for
      instruction samples, so newly arriving packets cannot be properly
      handled when only instruction samples are synthesized.
      
      To fix this issue, this patch refactors the code with a new function
      cs_etm__packet_swap(), which is used to swap packets, and adds the
      condition for instruction samples.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: Mike Leach <mike.leach@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: coresight ml <coresight@lists.linaro.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20200219021811.20067-2-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d0175156
    • A
      perf map: Use strstarts() to look for Android libraries · bdadd647
      Arnaldo Carvalho de Melo authored
      And add the '/' to avoid matching things like "/system/libsomething",
      when all we want to know is whether it is like "/system/lib/something",
      i.e. whether it is in that system library dir.
      
      Using strstarts() avoids off-by-one errors like the one recently fixed
      in this file.
      
      Since this adds the '/', I separated this patch; another patch will make
      this consistent by removing other strncmp(str, prefix, manually
      calculated prefix length) usage.
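A minimal sketch of the prefix-helper idea: the length is derived from the prefix itself, so it cannot drift out of sync with the literal. starts_with() and in_android_system_lib() are hypothetical local names used for illustration; they stand in for the strstarts() helper and the check described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True when str begins with prefix; the length is never typed by hand. */
static bool starts_with(const char *str, const char *prefix)
{
	assert(str && prefix);
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* The trailing '/' restricts the match to entries inside the directory. */
static bool in_android_system_lib(const char *filename)
{
	return starts_with(filename, "/system/lib/");
}
```

With the trailing '/', "/system/lib/something" matches while "/system/libsomething" does not.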
      Reported-by: Dominik Czarnota <dominik.b.czarnota@gmail.com>
      Acked-by: Dominik Czarnota <dominik.b.czarnota@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/CABEVAa0_q-uC0vrrqpkqRHy_9RLOSXOJxizMLm1n5faHRy2AeA@mail.gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bdadd647
    • D
      perf map: Fix off by one in strncpy() size argument · b8fdcfb5
      disconnect3d authored
      This patch fixes an off-by-one error in the strncmp() size argument in
      tools/perf/util/map.c. The issue is that in:
      
              strncmp(filename, "/system/lib/", 11)
      
      the passed string literal "/system/lib/" has 12 bytes (without the NUL
      byte) and the passed size argument is 11. As a result, the logic won't
      match the ending "/" byte and will accept file paths that are stored in
      other directories, e.g. "/system/libmalicious/bin" or just
      "/system/libmalicious".
      
      This functionality seems to be present only on Android. I assume the
      /system/ directory is only writable by the root user, so I don't think
      this bug has much (or any) security impact.
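The off-by-one can be reproduced in isolation. The sketch below uses a hypothetical helper, matches_with_len(), that exposes the size argument so the behaviour with 11 and with 12 can be compared side by side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Compare the first n bytes of filename against the "/system/lib/" prefix. */
static bool matches_with_len(const char *filename, size_t n)
{
	assert(filename);
	return strncmp(filename, "/system/lib/", n) == 0;
}

/* Length of the prefix without the terminating NUL byte. */
static size_t prefix_len(void)
{
	return sizeof("/system/lib/") - 1;
}
```

With n = 11 the trailing '/' is never compared, so "/system/libmalicious/bin" is accepted; with n = 12 (the value prefix_len() returns) it is rejected, while "/system/lib/libc.so" matches in both cases.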
      
      Fixes: eca81836 ("perf tools: Add automatic remapping of Android libraries")
      Signed-off-by: disconnect3d <dominik.b.czarnota@gmail.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Changbin Du <changbin.du@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Keeping <john@metanate.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Lentine <mlentine@google.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20200309104855.3775-1-dominik.b.czarnota@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b8fdcfb5
    • K
      perf metricgroup: Support metric constraint · ab483d8b
      Kan Liang authored
      Some metric groups have metric constraints. A metric group can be
      scheduled as a group only when its constraints are met.  For
      example, Page_Walks_Utilization has a metric constraint,
      "NO_NMI_WATCHDOG".
      
      When the NMI watchdog is disabled, the metric group can be scheduled as
      a group. Otherwise, the metric group is split into standalone metrics.
      
      Add a new function, metricgroup__has_constraint(), to check whether all
      constraints are met. If not, the metric group is split into
      standalone metrics.
      
      Currently, only one constraint, "NO_NMI_WATCHDOG", is checked. Print a
      warning for a metric group with the constraint when the NMI watchdog is
      enabled.
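The decision described above could look roughly like the following sketch. 'struct metric_desc' and can_schedule_as_group() are hypothetical names chosen for illustration, and the warning text is made up; only the "NO_NMI_WATCHDOG" constraint string comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical metric description; constraint is NULL when there is none. */
struct metric_desc {
	const char *name;
	const char *constraint;
};

/* Returns true when the metric's events may be scheduled together as a group. */
static bool can_schedule_as_group(const struct metric_desc *m,
				  bool nmi_watchdog_enabled)
{
	assert(m && m->name);

	if (!m->constraint)
		return true;

	if (strcmp(m->constraint, "NO_NMI_WATCHDOG") == 0 && nmi_watchdog_enabled) {
		/* Warn and let the caller fall back to standalone metrics. */
		fprintf(stderr, "%s: NMI watchdog enabled, splitting metric group\n",
			m->name);
		return false;
	}
	return true;
}
```

A caller would schedule the events as one group when this returns true and as standalone metrics otherwise.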
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1582581564-184429-5-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ab483d8b
    • K
      perf util: Factor out sysctl__nmi_watchdog_enabled() · 2a14c1bf
      Kan Liang authored
      The NMI watchdog status is required for metric group constraint
      examination.  Factor out sysctl__nmi_watchdog_enabled() to retrieve the
      NMI watchdog status.
      
      Users may count more than one metric group each time. If so, the NMI
      watchdog status may be retrieved several times. To reduce the overhead,
      cache the NMI watchdog status.
      
      Replace the NMI watchdog status checking in print_footer() with
      sysctl__nmi_watchdog_enabled().
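The caching idea can be sketched as follows. This is an assumption-laden illustration, not the perf implementation: nmi_watchdog_enabled_cached() and the nr_reads counter are hypothetical, and the sketch reads /proc/sys/kernel/nmi_watchdog directly with stdio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Number of times the sysctl file lookup was attempted (for illustration). */
static int nr_reads;

/* Read the NMI watchdog state once and reuse the answer on later calls. */
static bool nmi_watchdog_enabled_cached(void)
{
	static bool cached;
	static bool enabled;
	FILE *fp;
	int value = 0;

	if (cached)
		return enabled;

	nr_reads++;
	fp = fopen("/proc/sys/kernel/nmi_watchdog", "r");
	if (fp) {
		if (fscanf(fp, "%d", &value) == 1 && value > 0)
			enabled = true;
		fclose(fp);
	}
	/* A missing or unreadable file is treated as "disabled" in this sketch. */
	cached = true;
	return enabled;
}
```

However many metric groups ask for the status, the file is opened at most once per process in this model.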
      Suggested-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1582581564-184429-4-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2a14c1bf