1. 16 Jul, 2021 15 commits
  2. 15 Jul, 2021 2 commits
    • J
      perf cs-etm: Split Coresight decode by aux records · 83d1fc92
      James Clark authored
      Populate the auxtrace queues using AUX records rather than whole
      auxtrace buffers so that the decoder is reset between each aux record.
      
      This is similar to the auxtrace_queues__process_index() ->
      auxtrace_queues__add_indexed_event() flow where
      perf_session__peek_event() is used to read AUXTRACE events out of random
      positions in the file based on the auxtrace index.
      
      But now we loop over all PERF_RECORD_AUX events instead of AUXTRACE
      buffers. For each PERF_RECORD_AUX event, we find the corresponding
      AUXTRACE buffer using the index, and add a fragment of that buffer to
      the auxtrace queues.
      
      No other changes to decoding were made, apart from populating the
      auxtrace queues. The result of decoding is identical to before, except
      in cases where decoding failed completely, due to not resetting the
      decoder.
      
      This change is needed because AUX records are emitted any time
      tracing is disabled, for example when the process is scheduled out.
      Because ETM was disabled and enabled again, the decoder also needs to be
      reset to force the search for a sync packet. Otherwise there would be
      fatal decoding errors.
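
The per-record lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the struct names, fields and the helper `aux_record_fragment()` are hypothetical stand-ins, not perf's actual types, and skipping zero-sized records is a choice made for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for perf's record types. */
struct aux_record      { uint64_t offset, size; };  /* from PERF_RECORD_AUX */
struct auxtrace_buffer { uint64_t offset, size; };  /* from the auxtrace index */
struct fragment        { uint64_t buf_pos, size; }; /* slice to queue */

/*
 * If the AUX record lies entirely inside the AUXTRACE buffer, describe the
 * matching slice of that buffer and return true; otherwise return false so
 * the caller can try the next buffer in the index.
 */
static bool aux_record_fragment(const struct aux_record *aux,
				const struct auxtrace_buffer *buf,
				struct fragment *out)
{
	if (aux->size == 0)
		return false;	/* sketch choice: nothing to queue */
	if (aux->offset < buf->offset ||
	    aux->offset + aux->size > buf->offset + buf->size)
		return false;

	out->buf_pos = aux->offset - buf->offset;
	out->size = aux->size;
	return true;
}
```

Queuing one such fragment per AUX record is what lets the decoder be reset at every point where tracing was disabled and re-enabled.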
      
      Testing
      =======
      
      Testing was done with the following script, to diff the decoding results
      between the patched and un-patched versions of perf:
      
      	#!/bin/bash
      	set -ex
      
      	$1 script -i $3 $4 > split.script
      	$2 script -i $3 $4 > default.script
      
      	diff split.script default.script | head -n 20
      
      And it was run like this, with various itrace options depending on the
      quantity of synthesised events:
      
      	compare.sh ./perf-patched ./perf-default perf-per-cpu-2-threads.data --itrace=i100000ns
      
      No changes in output were observed in the following scenarios:
      
      * Simple per-cpu
      	perf record -e cs_etm/@tmc_etr0/u top
      
      * Per-thread, single thread
      	perf record -e cs_etm/@tmc_etr0/u --per-thread ./threads_C
      
      * Per-thread multiple threads (but only one thread collected data):
      	perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597
      
      * Per-thread multiple threads (both threads collected data):
      	perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597
      
      * Per-cpu explicit threads:
      	perf record -e cs_etm/@tmc_etr0/u --pid 853,854
      
      * System-wide (per-cpu):
          perf record -e cs_etm/@tmc_etr0/u -a
      
      * No data collected (no aux buffers)
      	Can happen with any command when run for a short period
      
      * Containing truncated records
      	Can happen with any command
      
      * Containing aux records with 0 size
      	Can happen with any command
      
      * Snapshot mode (various files with and without buffer wrap)
      	perf record -e cs_etm/@tmc_etr0/u -a --snapshot
      
      Some differences were observed in the following scenario:
      
      * Snapshot mode (with duplicate buffers)
      	perf record -e cs_etm/@tmc_etr0/u -a --snapshot
      
      Fewer samples are generated in snapshot mode if duplicate buffers
      were gathered because buffers with the same offset are now only added
      once. This gives different, but more correct results and no duplicate
      data is decoded any more.
      Signed-off-by: James Clark <james.clark@arm.com>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Tested-by: Leo Yan <leo.yan@linaro.org>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Branislav Rankov <branislav.rankov@arm.com>
      Cc: Denis Nikitin <denik@chromium.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210624164303.28632-2-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf sched: Cast PTHREAD_STACK_MIN to int as it may turn into sysconf(__SC_THREAD_STACK_MIN_VALUE) · d08c84e0
      Arnaldo Carvalho de Melo authored
      In fedora rawhide the PTHREAD_STACK_MIN define may end up expanded to a
      sysconf() call, and that will return 'long int', breaking the build:
      
          45 fedora:rawhide                : FAIL gcc version 11.1.1 20210623 (Red Hat 11.1.1-6) (GCC)
            builtin-sched.c: In function 'create_tasks':
            /git/perf-5.14.0-rc1/tools/include/linux/kernel.h:43:24: error: comparison of distinct pointer types lacks a cast [-Werror]
               43 |         (void) (&_max1 == &_max2);              \
                  |                        ^~
            builtin-sched.c:673:34: note: in expansion of macro 'max'
              673 |                         (size_t) max(16 * 1024, PTHREAD_STACK_MIN));
                  |                                  ^~~
            cc1: all warnings being treated as errors
      
        $ grep __sysconf /usr/include/*/*.h
        /usr/include/bits/pthread_stack_min-dynamic.h:extern long int __sysconf (int __name) __THROW;
        /usr/include/bits/pthread_stack_min-dynamic.h:#   define PTHREAD_STACK_MIN __sysconf (__SC_THREAD_STACK_MIN_VALUE)
        /usr/include/bits/time.h:extern long int __sysconf (int);
        /usr/include/bits/time.h:# define CLK_TCK ((__clock_t) __sysconf (2))	/* 2 is _SC_CLK_TCK */
        $
      
      So cast it to int to cope with that.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 14 Jul, 2021 7 commits
    • H
      libperf: Fix build error with LIBPFM4=1 · 50e98924
      Heiko Carstens authored
      Fix build error with LIBPFM4=1:
      
          CC      util/pfm.o
        util/pfm.c: In function ‘parse_libpfm_events_option’:
        util/pfm.c:102:30: error: ‘struct evsel’ has no member named ‘leader’
          102 |                         evsel->leader = grp_leader;
              |                              ^~
      
      Committer notes:
      
      There is this entry in 'make -C tools/perf build-test' to test the build
      with libpfm:
      
        $ grep libpfm tools/perf/tests/make
        make_with_libpfm4   := LIBPFM4=1
        run += make_with_libpfm4
        $
      
      But the test machine lacked libpfm-devel. It is now installed, so further
      cases like this shouldn't happen.
      
      Committer testing:
      
      Before this patch this fails, after applying it:
      
        $ make -C tools/perf build-test
        make: Entering directory '/var/home/acme/git/perf/tools/perf'
        - tarpkg: ./tests/perf-targz-src-pkg .
                         make_static: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1 -j24  DESTDIR=/tmp/tmp.KzFSfvGRQa
        <SNIP>
                   make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
                 make_with_libpfm4_O: make LIBPFM4=1
               make_install_prefix_O: make install prefix=/tmp/krava
                  make_no_auxtrace_O: make NO_AUXTRACE=1
        <SNIP>
        $ rpm -q libpfm-devel
        libpfm-devel-4.11.0-4.fc34.x86_64
        $
      
      FIXME:
      
      This shows a need for 'build-test' to bail out when a build option is
      specified whose required library devel files are not installed.
      
      Fixes: fba7c866 ("libperf: Move 'leader' from tools/perf to perf_evsel::leader")
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210713091907.1555560-1-hca@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      tools headers UAPI: Sync files changed by the memfd_secret new syscall · 376a9476
      Arnaldo Carvalho de Melo authored
      To pick the changes in this cset:
      
        7bb7f2ac ("arch, mm: wire up memfd_secret system call where relevant")
      
      That silences these perf build warnings and adds support for those new
      syscalls in tools such as 'perf trace'.
      
      For instance, this is now possible:
      
        # perf trace -v -e memfd_secret
        event qualifier tracepoint filter: (common_pid != 13375 && common_pid != 3713) && (id == 447)
        ^C#
      
      That is the filter expression attached to the raw_syscalls:sys_{enter,exit}
      tracepoints.
      
        $ grep memfd_secret tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
        447    common  memfd_secret            sys_memfd_secret
        $
      
      This addresses these perf build warnings:
      
        Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/unistd.h' differs from latest version at 'arch/arm64/include/uapi/asm/unistd.h'
        diff -u tools/arch/arm64/include/uapi/asm/unistd.h arch/arm64/include/uapi/asm/unistd.h
        Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
        diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
        Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
        diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf stat: Merge uncore events by default for hybrid platform · e0a7ef2a
      Jin Yao authored
      On a hybrid platform, by default 'perf stat' aggregates and reports the
      event counts per PMU. For example,
      
        # perf stat -e cycles -a true
      
         Performance counter stats for 'system wide':
      
                 1,400,445      cpu_core/cycles/
                   680,881      cpu_atom/cycles/
      
               0.001770773 seconds time elapsed
      
      But for uncore events that's not a suitable method. Uncore has nothing
      to do with hybrid. So for uncore events, we aggregate event counts from
      all PMUs and report the counts without PMUs.
      
      Before:
      
        # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ -a true
      
         Performance counter stats for 'system wide':
      
                     2,058      uncore_arb_0/event=0x81,umask=0x1/
                     2,028      uncore_arb_1/event=0x81,umask=0x1/
                         0      uncore_arb_0/event=0x84,umask=0x1/
                         0      uncore_arb_1/event=0x84,umask=0x1/
      
               0.000614498 seconds time elapsed
      
      After:
      
        # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ -a true
      
         Performance counter stats for 'system wide':
      
                     3,996      arb/event=0x81,umask=0x1/
                         0      arb/event=0x84,umask=0x1/
      
               0.000630046 seconds time elapsed
      
      Of course, we also keep the '--no-merge' working for uncore events.
      
        # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ --no-merge true
      
         Performance counter stats for 'system wide':
      
                     1,952      uncore_arb_0/event=0x81,umask=0x1/
                     1,921      uncore_arb_1/event=0x81,umask=0x1/
                         0      uncore_arb_0/event=0x84,umask=0x1/
                         0      uncore_arb_1/event=0x84,umask=0x1/
      
               0.000575536 seconds time elapsed
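
The merging shown above can be sketched roughly as follows. The name-mangling rule (drop an "uncore_" prefix and a trailing "_<digits>" instance suffix) and both helper names are assumptions made for illustration, not the logic 'perf stat' actually uses.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn a per-instance PMU name such as "uncore_arb_1"
 * into the merged name "arb". Returns dst.
 */
static char *merged_pmu_name(const char *pmu, char *dst, size_t len)
{
	size_t full, n;

	if (!strncmp(pmu, "uncore_", 7))
		pmu += 7;
	full = n = strlen(pmu);
	while (n && isdigit((unsigned char)pmu[n - 1]))
		n--;
	/* Only strip when the digits follow an underscore, as in "arb_1". */
	if (n && n < full && pmu[n - 1] == '_')
		n--;
	else
		n = full;
	snprintf(dst, len, "%.*s", (int)n, pmu);
	return dst;
}

struct count { const char *pmu; unsigned long long value; };

/* Sum the counts of every instance whose merged name equals 'name'. */
static unsigned long long merged_count(const struct count *c, size_t nr,
				       const char *name)
{
	char buf[64];
	unsigned long long sum = 0;

	for (size_t i = 0; i < nr; i++)
		if (!strcmp(merged_pmu_name(c[i].pmu, buf, sizeof(buf)), name))
			sum += c[i].value;
	return sum;
}
```

With '--no-merge' the per-instance names and counts would simply be printed as they are, skipping this step.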
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210707055652.962-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf tests: Fix 'Convert perf time to TSC' on core-only system · de3d5fd8
      Jin Yao authored
      If the atom CPUs are offlined, the 'cpu_atom' PMU is not valid, so
      the test case for 'cpu_atom' is not needed.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210708013701.20347-5-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf tests: Fix 'Roundtrip evsel->name' on core-only system · 212f3d97
      Jin Yao authored
      If the atom CPUs are offlined, the 'cpu_atom' PMU is not valid.
      Perf will then not create two events for one hw event, so
      evsel->idx doesn't need to be divided by 2 before comparing.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210708013701.20347-4-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf tests: Fix 'Parse event definition strings' on core-only system · 490e9a8f
      Jin Yao authored
      If the atom CPUs are offlined, the 'cpu_atom' PMU is not valid, so
      the test case for 'cpu_atom' is not needed.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210708013701.20347-3-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf pmu: Skip invalid hybrid pmu · 49afa7f6
      Jin Yao authored
      On a hybrid platform such as Alder Lake, if the atom CPUs are offlined,
      the kernel still exports the sysfs path '/sys/devices/cpu_atom/' for
      the 'cpu_atom' PMU, but the file '/sys/devices/cpu_atom/cpus' is empty,
      which indicates that this PMU is invalid.
      
      So check for and skip the invalid hybrid PMU.
      
      Before:
      
        # perf list
        ...
        branch-instructions OR cpu_atom/branch-instructions/ [Kernel PMU event]
        branch-instructions OR cpu_core/branch-instructions/ [Kernel PMU event]
        branch-misses OR cpu_atom/branch-misses/           [Kernel PMU event]
        branch-misses OR cpu_core/branch-misses/           [Kernel PMU event]
        bus-cycles OR cpu_atom/bus-cycles/                 [Kernel PMU event]
        bus-cycles OR cpu_core/bus-cycles/                 [Kernel PMU event]
        ...
      
      The cpu_atom events are still displayed even if atom CPUs are offlined.
      
      After:
      
        # perf list
        ...
        branch-instructions OR cpu_core/branch-instructions/ [Kernel PMU event]
        branch-misses OR cpu_core/branch-misses/           [Kernel PMU event]
        bus-cycles OR cpu_core/bus-cycles/                 [Kernel PMU event]
        ...
      
      Now only cpu_core events are displayed.
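
A minimal sketch of such a validity check, assuming the only criterion is whether the PMU's 'cpus' file contains anything besides whitespace. Both helper names are hypothetical and this is not perf's actual implementation.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* True if the text holds at least one non-whitespace character. */
static bool cpus_text_is_valid(const char *text)
{
	for (; *text; text++)
		if (!isspace((unsigned char)*text))
			return true;
	return false;
}

/*
 * Hypothetical check: read e.g. "/sys/devices/cpu_atom/cpus" and report
 * whether it lists any CPUs. A missing or unreadable file counts as invalid.
 */
static bool hybrid_pmu_is_valid(const char *cpus_path)
{
	char line[256] = "";
	FILE *fp = fopen(cpus_path, "r");
	bool valid;

	if (!fp)
		return false;
	valid = fgets(line, sizeof(line), fp) && cpus_text_is_valid(line);
	fclose(fp);
	return valid;
}
```

A caller enumerating hybrid PMUs could skip any PMU for which `hybrid_pmu_is_valid()` returns false, which would keep its events out of listings like the one above.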
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210708013701.20347-2-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 10 Jul, 2021 8 commits
  5. 07 Jul, 2021 8 commits
    • A
      perf intel-pt: Add a config for max loops without consuming a packet · b4b046ff
      Adrian Hunter authored
      The Intel PT decoder limits the number of unconditional branches (e.g.
      jmps) decoded without consuming any trace packets. Generally, a loop
      needs a conditional branch which generates a TNT packet, whereas a "ret"
      instruction will generate a TIP or TNT packet. So exceeding the limit is
      assumed to be a never-ending loop, which can happen if there has been a
      decoding error putting the decoder at the wrong place in the code.
      
      Up until now, the limit of 10000 has been enough, but some analysis
      use cases have been reported to exceed it.
      
      Increase the limit to 100000, and make it configurable via perf config
      intel-pt.max-loops. Also amend the "Never-ending loop" message to
      mention the configuration entry.
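
The limit described above amounts to a counter that is reset whenever a packet is consumed. A much-reduced sketch, in which the struct and function names are hypothetical and only the default of 100000 comes from the text:

```c
#include <stdbool.h>

#define DEFAULT_MAX_LOOPS 100000	/* the new default named above */

/* Hypothetical, much-reduced decoder state. */
struct loop_guard {
	unsigned int max_loops;		/* effective limit */
	unsigned int no_packet_branches;
};

/* 'configured' is the user's setting; 0 means "use the default". */
static void loop_guard_init(struct loop_guard *g, unsigned int configured)
{
	g->max_loops = configured ? configured : DEFAULT_MAX_LOOPS;
	g->no_packet_branches = 0;
}

/* Call whenever a trace packet is consumed. */
static void loop_guard_packet(struct loop_guard *g)
{
	g->no_packet_branches = 0;
}

/*
 * Call for each unconditional branch walked without consuming a packet.
 * Returns false once the limit is exceeded, i.e. a suspected
 * never-ending loop.
 */
static bool loop_guard_branch(struct loop_guard *g)
{
	return ++g->no_packet_branches <= g->max_loops;
}
```

In this sketch the value passed to `loop_guard_init()` stands in for whatever the intel-pt.max-loops config entry supplies.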
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20210701175132.3977-1-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf stat: Disable the NMI watchdog message on hybrid · 493be70a
      Jin Yao authored
      If we run a single workload that only runs on the big core, there is
      always an ugly message about disabling the NMI watchdog because the
      cpu_atom events are not counted.
      
      Before:
      
        # ./perf stat true
      
         Performance counter stats for 'true':
      
                      0.43 msec task-clock                #    0.396 CPUs utilized
                         0      context-switches          #    0.000 /sec
                         0      cpu-migrations            #    0.000 /sec
                        45      page-faults               #  103.918 K/sec
                   639,634      cpu_core/cycles/          #    1.477 G/sec
             <not counted>      cpu_atom/cycles/                                              (0.00%)
                   643,498      cpu_core/instructions/    #    1.486 G/sec
             <not counted>      cpu_atom/instructions/                                        (0.00%)
                   123,715      cpu_core/branches/        #  285.694 M/sec
             <not counted>      cpu_atom/branches/                                            (0.00%)
                     4,094      cpu_core/branch-misses/   #    9.454 M/sec
             <not counted>      cpu_atom/branch-misses/                                       (0.00%)
      
               0.001092407 seconds time elapsed
      
               0.001144000 seconds user
               0.000000000 seconds sys
      
        Some events weren't counted. Try disabling the NMI watchdog:
                echo 0 > /proc/sys/kernel/nmi_watchdog
                perf stat ...
                echo 1 > /proc/sys/kernel/nmi_watchdog
      
        # ./perf stat -e '{cpu_atom/cycles/,msr/tsc/}' true
      
         Performance counter stats for 'true':
      
             <not counted>      cpu_atom/cycles/                                              (0.00%)
             <not counted>      msr/tsc/                                                      (0.00%)
      
               0.001904106 seconds time elapsed
      
               0.001947000 seconds user
               0.000000000 seconds sys
      
        Some events weren't counted. Try disabling the NMI watchdog:
                echo 0 > /proc/sys/kernel/nmi_watchdog
                perf stat ...
                echo 1 > /proc/sys/kernel/nmi_watchdog
        The events in group usually have to be from the same PMU. Try reorganizing the group.
      
      Now we disable the NMI watchdog message on hybrid, otherwise there
      are too many false positives.
      
      After:
      
        # ./perf stat true
      
         Performance counter stats for 'true':
      
                      0.79 msec task-clock                #    0.419 CPUs utilized
                         0      context-switches          #    0.000 /sec
                         0      cpu-migrations            #    0.000 /sec
                        48      page-faults               #   60.889 K/sec
                   777,692      cpu_core/cycles/          #  986.519 M/sec
             <not counted>      cpu_atom/cycles/                                              (0.00%)
                   669,147      cpu_core/instructions/    #  848.828 M/sec
             <not counted>      cpu_atom/instructions/                                        (0.00%)
                   128,635      cpu_core/branches/        #  163.176 M/sec
             <not counted>      cpu_atom/branches/                                            (0.00%)
                     4,089      cpu_core/branch-misses/   #    5.187 M/sec
             <not counted>      cpu_atom/branch-misses/                                       (0.00%)
      
               0.001880649 seconds time elapsed
      
               0.001935000 seconds user
               0.000000000 seconds sys
      
        # ./perf stat -e '{cpu_atom/cycles/,msr/tsc/}' true
      
         Performance counter stats for 'true':
      
             <not counted>      cpu_atom/cycles/                                              (0.00%)
             <not counted>      msr/tsc/                                                      (0.00%)
      
               0.000963319 seconds time elapsed
      
               0.000999000 seconds user
               0.000000000 seconds sys
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210610034557.29766-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • K
      perf vendor events power10: Adds 24x7 nest metric events for power10 platform · a3cbcadf
      Kajol Jain authored
      This patch adds 24x7 nest metric events for POWER10.
      Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
      Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lore.kernel.org/lkml/20210628064935.163465-1-kjain@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • K
      perf script python: Fix buffer size to report iregs in perf script · dea8cfcc
      Kajol Jain authored
      Commit 48a1f565 ("perf script python: Add more PMU fields to
      event handler dict") added functionality to report fields like weight,
      iregs, uregs etc via perf report.  That commit predefined buffer size to
      512 bytes to print those fields.
      
      But in PowerPC, since we added extended regs support in:
      
        068aeea3 ("perf powerpc: Support exposing Performance Monitor Counter SPRs as part of extended regs")
        d735599a ("powerpc/perf: Add extended regs support for power10 platform")
      
      Now iregs can carry more bytes of data, and this predefined buffer size
      can result in data loss in perf script output.
      
      This patch resolves this issue by making the buffer size dynamic, based
      on the number of registers needed to print. It also changes the
      regs_map() return type from int to void, as it is not being used by the
      set_regs_in_dict(), its only caller.
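
A minimal sketch of sizing the buffer from the number of registers, assuming the registers to print are given as a 64-bit mask. The per-register byte count and both helper names are illustrative assumptions; the real patch picks its own figure.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Assumed worst-case bytes needed to print one "NAME:0xVALUE " entry.
 * This value is illustrative only.
 */
#define BYTES_PER_REG 28

/* Size a buffer from the number of registers selected in the mask. */
static size_t regs_buf_size(uint64_t regs_mask)
{
	/* __builtin_popcountll is a GCC/Clang builtin counting set bits. */
	return (size_t)__builtin_popcountll(regs_mask) * BYTES_PER_REG + 1;
}

/* Allocate a zeroed buffer for the mask; the caller frees it. */
static char *alloc_regs_buf(uint64_t regs_mask)
{
	return calloc(1, regs_buf_size(regs_mask));
}
```

Unlike a fixed 512-byte array, the allocation grows with the mask, so adding extended registers cannot silently truncate the output.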
      
      Fixes: 068aeea3 ("perf powerpc: Support exposing Performance Monitor Counter SPRs as part of extended regs")
      Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
      Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lore.kernel.org/lkml/20210628062341.155839-1-kjain@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf trace: Fix the perf trace link location · e63cbfa3
      Justin M. Forbes authored
      The install perf_dlfilter.h patch included what seems to be a typo in
      the Makefile.perf, which changed the location of the trace link from
      '$(DESTDIR_SQ)$(bindir_SQ)/trace' to '$(DESTDIR_SQ)$(dir_SQ)/trace'.
      
      This reverts it back to the correct location.
      
      Fixes: 0beb2183 ("perf build: Install perf_dlfilter.h")
      Signed-off-by: Justin M. Forbes <jforbes@fedoraproject.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Justin M. Forbes <jmforbes@linuxtx.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210706185952.116121-1-jforbes@fedoraproject.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • R
      perf top: Fix overflow in elf_sec__is_text() · 83952286
      Riccardo Mancini authored
      ASan reports a heap-buffer-overflow in elf_sec__is_text when using perf-top.
      
      The bug is caused by the fact that secstrs is built from runtime_ss, while
      shdr is built from syms_ss if shdr.sh_type != SHT_NOBITS. Therefore, they
      point to two different ELF files.
      
      This patch renames secstrs to secstrs_run and adds secstrs_sym, so that
      the correct secstrs is chosen depending on shdr.sh_type.
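
The selection described above can be sketched without libelf. The `struct strtab` type and both helpers are hypothetical stand-ins, and the bounds check in `sec_name()` is an addition for illustration rather than part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define SHT_NOBITS_SKETCH 8	/* same value as SHT_NOBITS in <elf.h> */

/* Hypothetical stand-in: a section-name string table from one ELF file. */
struct strtab { const char *data; uint32_t size; };

/*
 * Pick the string table that belongs to the file the section header was
 * read from: headers come from syms_ss unless the type is SHT_NOBITS.
 */
static const struct strtab *pick_secstrs(uint32_t sh_type,
					 const struct strtab *secstrs_run,
					 const struct strtab *secstrs_sym)
{
	return sh_type != SHT_NOBITS_SKETCH ? secstrs_sym : secstrs_run;
}

/* Bounds-checked name lookup: NULL if sh_name is outside the table. */
static const char *sec_name(const struct strtab *tab, uint32_t sh_name)
{
	return sh_name < tab->size ? tab->data + sh_name : NULL;
}
```

Indexing one file's string table with an sh_name offset taken from the other file is what can read past the end of the allocation, as in the report below.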
      
        $ ASAN_OPTIONS=abort_on_error=1:disable_coredump=0:unmap_shadow_on_exit=1 ./perf top
        =================================================================
        ==363148==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61300009add6 at pc 0x00000049875c bp 0x7f4f56446440 sp 0x7f4f56445bf0
        READ of size 1 at 0x61300009add6 thread T6
          #0 0x49875b in StrstrCheck(void*, char*, char const*, char const*) (/home/user/linux/tools/perf/perf+0x49875b)
          #1 0x4d13a2 in strstr (/home/user/linux/tools/perf/perf+0x4d13a2)
          #2 0xacae36 in elf_sec__is_text /home/user/linux/tools/perf/util/symbol-elf.c:176:9
          #3 0xac3ec9 in elf_sec__filter /home/user/linux/tools/perf/util/symbol-elf.c:187:9
          #4 0xac2c3d in dso__load_sym /home/user/linux/tools/perf/util/symbol-elf.c:1254:20
          #5 0x883981 in dso__load /home/user/linux/tools/perf/util/symbol.c:1897:9
          #6 0x8e6248 in map__load /home/user/linux/tools/perf/util/map.c:332:7
          #7 0x8e66e5 in map__find_symbol /home/user/linux/tools/perf/util/map.c:366:6
          #8 0x7f8278 in machine__resolve /home/user/linux/tools/perf/util/event.c:707:13
          #9 0x5f3d1a in perf_event__process_sample /home/user/linux/tools/perf/builtin-top.c:773:6
          #10 0x5f30e4 in deliver_event /home/user/linux/tools/perf/builtin-top.c:1197:3
          #11 0x908a72 in do_flush /home/user/linux/tools/perf/util/ordered-events.c:244:9
          #12 0x905fae in __ordered_events__flush /home/user/linux/tools/perf/util/ordered-events.c:323:8
          #13 0x9058db in ordered_events__flush /home/user/linux/tools/perf/util/ordered-events.c:341:9
          #14 0x5f19b1 in process_thread /home/user/linux/tools/perf/builtin-top.c:1109:7
          #15 0x7f4f6a21a298 in start_thread /usr/src/debug/glibc-2.33-16.fc34.x86_64/nptl/pthread_create.c:481:8
          #16 0x7f4f697d0352 in clone ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      
      0x61300009add6 is located 10 bytes to the right of 332-byte region [0x61300009ac80,0x61300009adcc)
      allocated by thread T6 here:
      
          #0 0x4f3f7f in malloc (/home/user/linux/tools/perf/perf+0x4f3f7f)
          #1 0x7f4f6a0a88d9  (/lib64/libelf.so.1+0xa8d9)
      
      Thread T6 created by T0 here:
      
          #0 0x464856 in pthread_create (/home/user/linux/tools/perf/perf+0x464856)
          #1 0x5f06e0 in __cmd_top /home/user/linux/tools/perf/builtin-top.c:1309:6
          #2 0x5ef19f in cmd_top /home/user/linux/tools/perf/builtin-top.c:1762:11
          #3 0x7b28c0 in run_builtin /home/user/linux/tools/perf/perf.c:313:11
          #4 0x7b119f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8
          #5 0x7b2423 in run_argv /home/user/linux/tools/perf/perf.c:409:2
          #6 0x7b0c19 in main /home/user/linux/tools/perf/perf.c:539:3
          #7 0x7f4f696f7b74 in __libc_start_main /usr/src/debug/glibc-2.33-16.fc34.x86_64/csu/../csu/libc-start.c:332:16
      
        SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/user/linux/tools/perf/perf+0x49875b) in StrstrCheck(void*, char*, char const*, char const*)
        Shadow bytes around the buggy address:
          0x0c268000b560: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
          0x0c268000b570: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
          0x0c268000b580: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
          0x0c268000b590: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          0x0c268000b5a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        =>0x0c268000b5b0: 00 00 00 00 00 00 00 00 00 04[fa]fa fa fa fa fa
          0x0c268000b5c0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
          0x0c268000b5d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          0x0c268000b5e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          0x0c268000b5f0: 07 fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
          0x0c268000b600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
        Shadow byte legend (one shadow byte represents 8 application bytes):
          Addressable:           00
          Partially addressable: 01 02 03 04 05 06 07
          Heap left redzone:       fa
          Freed heap region:       fd
          Stack left redzone:      f1
          Stack mid redzone:       f2
          Stack right redzone:     f3
          Stack after return:      f5
          Stack use after scope:   f8
          Global redzone:          f9
          Global init order:       f6
          Poisoned by user:        f7
          Container overflow:      fc
          Array cookie:            ac
          Intra object redzone:    bb
          ASan internal:           fe
          Left alloca redzone:     ca
          Right alloca redzone:    cb
          Shadow gap:              cc
        ==363148==ABORTING
      Suggested-by: Jiri Slaby <jirislaby@kernel.org>
      Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Fabian Hemmer <copy@copy.sh>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jiri Slaby <jirislaby@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Remi Bernon <rbernon@codeweavers.com>
      Link: http://lore.kernel.org/lkml/20210621222108.196219-1-rickyman7@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • R
      perf annotate: Fix 's' on source line when disasm is empty · 5a4451e4
      Riccardo Mancini authored
      If the disasm is empty, 's' should fail. Instead it seemingly works,
      hiding the empty lines and causing an assertion error the next time
      annotate is called (from within perf report).
      
      The problem is caused by a buffer overflow, caused by a wrong exit
      condition in annotate_browser__find_next_asm_line, which checks
      browser->b.top instead of browser->b.entries.
      
      This patch fixes the issue, making annotate_browser__toggle_source
      fail if the disasm is empty (from the user's point of view, nothing
      happens).
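
The exit condition can be illustrated with a minimal circular list. This is a simplified, forward-only sketch with hypothetical types, not the browser code itself; 'head' plays the role that browser->b.entries has in the text above.

```c
#include <stddef.h>

/* Minimal circular list, loosely following the kernel's list_head idea. */
struct node {
	struct node *next;
	int is_asm;	/* 1 for a disassembly line, 0 for a source line */
};

/*
 * Find the first asm line after 'pos'. 'head' is the list's sentinel;
 * reaching it means every remaining entry was visited, so the search
 * fails with NULL instead of walking off the list.
 */
static struct node *find_next_asm_line(struct node *head, struct node *pos)
{
	for (pos = pos->next; pos != head; pos = pos->next)
		if (pos->is_asm)
			return pos;
	return NULL;
}
```

Stopping at any node other than the sentinel, such as the first visible entry, would let the walk step through the sentinel itself and treat it as a real line.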
      
      Fixes: 6de249d6 ("perf annotate: Allow 's' on source code lines")
      Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210705161524.72953-1-rickyman7@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • M
      perf probe: Do not show @plt function by default · d5882a92
      Masami Hiramatsu authored
      Fix the perf-probe --functions option so that it does not show the PLT
      stub symbols (*@plt) by default.
      
        -----
        $ ./perf probe -x /usr/lib64/libc-2.33.so -F | head
        a64l
        abort
        abs
        accept
        accept4
        access
        acct
        addmntent
        addseverity
        adjtime
        -----
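
One way to picture the default filtering is a suffix test on each symbol name. This is an assumption made for illustration: the helper `is_plt_stub()` is hypothetical, and perf-probe's own filter mechanism may work differently (for example via a pattern such as *@plt).

```c
#include <stdbool.h>
#include <string.h>

/* True if the symbol name ends in "@plt", i.e. names a PLT stub. */
static bool is_plt_stub(const char *name)
{
	static const char suffix[] = "@plt";
	size_t len = strlen(name), slen = sizeof(suffix) - 1;

	return len >= slen && !strcmp(name + len - slen, suffix);
}
```

A listing loop would then print only the names for which `is_plt_stub()` returns false, giving output like the one above.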
      Reported-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: Masami Hiramatsu <mhriamat@kernel.org>
      Acked-by: Thomas Richter <tmricht@linux.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Stefan Liebler <stli@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/162532653450.393143.12621329879630677469.stgit@devnote2
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>