1. 16 Jul, 2021 15 commits
  2. 15 Jul, 2021 3 commits
    • J
      perf cs-etm: Split Coresight decode by aux records · 83d1fc92
      James Clark committed
      Populate the auxtrace queues using AUX records rather than whole
      auxtrace buffers so that the decoder is reset between each aux record.
      
      This is similar to the auxtrace_queues__process_index() ->
      auxtrace_queues__add_indexed_event() flow where
      perf_session__peek_event() is used to read AUXTRACE events out of random
      positions in the file based on the auxtrace index.
      
      But now we loop over all PERF_RECORD_AUX events instead of AUXTRACE
      buffers. For each PERF_RECORD_AUX event, we find the corresponding
      AUXTRACE buffer using the index, and add a fragment of that buffer to
      the auxtrace queues.
      
      No other changes to decoding were made, apart from populating the
      auxtrace queues. The result of decoding is identical to before, except
      in cases where decoding failed completely, due to not resetting the
      decoder.
      
      The reason for this change is that AUX records are emitted any time
      tracing is disabled, for example when the process is scheduled out.
      Because ETM was disabled and enabled again, the decoder also needs to be
      reset to force the search for a sync packet. Otherwise there would be
      fatal decoding errors.
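
As a rough illustration of the queue-population flow described above (not the actual perf code), the sketch below models AUXTRACE buffers and PERF_RECORD_AUX records as small Python dataclasses. The names `AuxtraceBuffer`, `AuxRecord`, `find_buffer` and `split_by_aux_records` are hypothetical, and the offset and size fields are simplifying assumptions rather than the real record layout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AuxtraceBuffer:
    """Simplified stand-in for an AUXTRACE buffer located via the index."""
    offset: int   # start position of the buffer in the AUX stream
    data: bytes   # raw trace bytes


@dataclass
class AuxRecord:
    """Simplified stand-in for a PERF_RECORD_AUX event."""
    aux_offset: int  # where this record's data starts in the AUX stream
    aux_size: int    # number of bytes covered by this record


def find_buffer(buffers, rec) -> Optional[AuxtraceBuffer]:
    """Return the buffer that fully contains the record's byte range."""
    for buf in buffers:
        start, end = buf.offset, buf.offset + len(buf.data)
        if start <= rec.aux_offset and rec.aux_offset + rec.aux_size <= end:
            return buf
    return None


def split_by_aux_records(buffers, records) -> List[bytes]:
    """Queue one fragment per AUX record instead of one whole buffer."""
    fragments = []
    for rec in records:
        if rec.aux_size == 0:
            continue  # zero-sized records contribute no trace data
        buf = find_buffer(buffers, rec)
        if buf is None:
            continue  # no matching buffer for this record
        begin = rec.aux_offset - buf.offset
        fragments.append(buf.data[begin:begin + rec.aux_size])
    return fragments


# One buffer covered by three AUX records, one of them empty.
buffers = [AuxtraceBuffer(offset=0, data=b"AAAABBBBBB")]
records = [AuxRecord(0, 4), AuxRecord(4, 0), AuxRecord(4, 6)]
fragments = split_by_aux_records(buffers, records)
print(fragments)
```

In this model each returned fragment would be handed to the decoder separately, which is what allows the decoder to be reset between AUX records.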
      
      Testing
      =======
      
      Testing was done with the following script, to diff the decoding results
      between the patched and un-patched versions of perf:
      
      	#!/bin/bash
      	set -ex
      
      	$1 script -i $3 $4 > split.script
      	$2 script -i $3 $4 > default.script
      
      	diff split.script default.script | head -n 20
      
      And it was run like this, with various itrace options depending on the
      quantity of synthesised events:
      
      	compare.sh ./perf-patched ./perf-default perf-per-cpu-2-threads.data --itrace=i100000ns
      
      No changes in output were observed in the following scenarios:
      
      * Simple per-cpu
      	perf record -e cs_etm/@tmc_etr0/u top
      
      * Per-thread, single thread
      	perf record -e cs_etm/@tmc_etr0/u --per-thread ./threads_C
      
      * Per-thread multiple threads (but only one thread collected data):
      	perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597
      
      * Per-thread multiple threads (both threads collected data):
      	perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597
      
      * Per-cpu explicit threads:
      	perf record -e cs_etm/@tmc_etr0/u --pid 853,854
      
      * System-wide (per-cpu):
      	perf record -e cs_etm/@tmc_etr0/u -a
      
      * No data collected (no aux buffers)
      	Can happen with any command when run for a short period
      
      * Containing truncated records
      	Can happen with any command
      
      * Containing aux records with 0 size
      	Can happen with any command
      
      * Snapshot mode (various files with and without buffer wrap)
      	perf record -e cs_etm/@tmc_etr0/u -a --snapshot
      
      Some differences were observed in the following scenario:
      
      * Snapshot mode (with duplicate buffers)
      	perf record -e cs_etm/@tmc_etr0/u -a --snapshot
      
      Fewer samples are generated in snapshot mode if duplicate buffers
      were gathered because buffers with the same offset are now only added
      once. This gives different, but more correct results and no duplicate
      data is decoded any more.
      Signed-off-by: James Clark <james.clark@arm.com>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Tested-by: Leo Yan <leo.yan@linaro.org>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Branislav Rankov <branislav.rankov@arm.com>
      Cc: Denis Nikitin <denik@chromium.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210624164303.28632-2-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      tools headers: Remove broken definition of __LITTLE_ENDIAN · fa2c02e5
      Arnaldo Carvalho de Melo committed
      The linux/kconfig.h file was copied from the kernel, but the line
      including generated/autoconf.h, where the CONFIG_ entries would come
      from, was deleted, as the tools/ build system doesn't create that
      file. So we ended up always defining just __LITTLE_ENDIAN, as
      CONFIG_CPU_BIG_ENDIAN was nowhere to be found.
      
      This in turn ended up breaking the build on some systems where
      __LITTLE_ENDIAN was already defined, such as the Android NDK.
      
      So just ditch that block that depends on the CONFIG_CPU_BIG_ENDIAN
      define.
      
      The kconfig.h file was copied just to get IS_ENABLED() and a
      'make -C tools/all' doesn't break with this removal.
      
      Fixes: 93281c4a ("x86/insn: Add an insn_decode() API")
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/YO8hK7lqJcIWuBzx@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf sched: Cast PTHREAD_STACK_MIN to int as it may turn into sysconf(__SC_THREAD_STACK_MIN_VALUE) · d08c84e0
      Arnaldo Carvalho de Melo committed
      In Fedora Rawhide the PTHREAD_STACK_MIN define may end up expanded to a
      sysconf() call, which returns 'long int', breaking the build:
      
          45 fedora:rawhide                : FAIL gcc version 11.1.1 20210623 (Red Hat 11.1.1-6) (GCC)
            builtin-sched.c: In function 'create_tasks':
            /git/perf-5.14.0-rc1/tools/include/linux/kernel.h:43:24: error: comparison of distinct pointer types lacks a cast [-Werror]
               43 |         (void) (&_max1 == &_max2);              \
                  |                        ^~
            builtin-sched.c:673:34: note: in expansion of macro 'max'
              673 |                         (size_t) max(16 * 1024, PTHREAD_STACK_MIN));
                  |                                  ^~~
            cc1: all warnings being treated as errors
      
        $ grep __sysconf /usr/include/*/*.h
        /usr/include/bits/pthread_stack_min-dynamic.h:extern long int __sysconf (int __name) __THROW;
        /usr/include/bits/pthread_stack_min-dynamic.h:#   define PTHREAD_STACK_MIN __sysconf (__SC_THREAD_STACK_MIN_VALUE)
        /usr/include/bits/time.h:extern long int __sysconf (int);
        /usr/include/bits/time.h:# define CLK_TCK ((__clock_t) __sysconf (2))	/* 2 is _SC_CLK_TCK */
        $
      
      So cast it to int to cope with that.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 14 Jul, 2021 7 commits
    • H
      libperf: Fix build error with LIBPFM4=1 · 50e98924
      Heiko Carstens committed
      Fix build error with LIBPFM4=1:
      
          CC      util/pfm.o
        util/pfm.c: In function ‘parse_libpfm_events_option’:
        util/pfm.c:102:30: error: ‘struct evsel’ has no member named ‘leader’
          102 |                         evsel->leader = grp_leader;
              |                              ^~
      
      Committer notes:
      
      There is this entry in 'make -C tools/perf build-test' to test the build
      with libpfm:
      
        $ grep libpfm tools/perf/tests/make
        make_with_libpfm4   := LIBPFM4=1
        run += make_with_libpfm4
        $
      
      But the test machine lacked libpfm-devel; now it's installed, and further
      cases like this shouldn't happen.
      
      Committer testing:
      
      Before this patch the build fails; after applying it, it passes:
      
        $ make -C tools/perf build-test
        make: Entering directory '/var/home/acme/git/perf/tools/perf'
        - tarpkg: ./tests/perf-targz-src-pkg .
                         make_static: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1 -j24  DESTDIR=/tmp/tmp.KzFSfvGRQa
        <SNIP>
                   make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
                 make_with_libpfm4_O: make LIBPFM4=1
               make_install_prefix_O: make install prefix=/tmp/krava
                  make_no_auxtrace_O: make NO_AUXTRACE=1
        <SNIP>
        $ rpm -q libpfm-devel
        libpfm-devel-4.11.0-4.fc34.x86_64
        $
      
      FIXME:
      
      This shows a need for 'build-test' to bail out when a build option is
      specified whose required library devel files are not installed.
      
      Fixes: fba7c866 ("libperf: Move 'leader' from tools/perf to perf_evsel::leader")
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210713091907.1555560-1-hca@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      tools headers UAPI: Sync files changed by the memfd_secret new syscall · 376a9476
      Arnaldo Carvalho de Melo committed
      To pick the changes in this cset:
      
        7bb7f2ac ("arch, mm: wire up memfd_secret system call where relevant")
      
      That silences these perf build warnings and adds support for those new
      syscalls in tools such as 'perf trace'.
      
      For instance, this is now possible:
      
        # perf trace -v -e memfd_secret
        event qualifier tracepoint filter: (common_pid != 13375 && common_pid != 3713) && (id == 447)
        ^C#
      
      That is the filter expression attached to the raw_syscalls:sys_{enter,exit}
      tracepoints.
      
        $ grep memfd_secret tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
        447    common  memfd_secret            sys_memfd_secret
        $
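
To make the link between the syscall table entry and the filter expression shown above concrete, here is a minimal sketch that looks up the syscall number and composes a string of the same shape. The helper names `syscall_id` and `build_filter` are hypothetical and this is not how 'perf trace' is implemented; the PIDs and the table line are copied from the output above.

```python
def syscall_id(tbl_text, name):
    """Look up a syscall number in syscall_64.tbl-style text."""
    for line in tbl_text.splitlines():
        fields = line.split()
        # Expected columns: <number> <abi> <name> <entry point>
        if len(fields) >= 3 and fields[0].isdigit() and fields[2] == name:
            return int(fields[0])
    raise KeyError(name)


def build_filter(excluded_pids, sys_id):
    """Compose a filter string in the same shape as the output shown above."""
    pid_part = " && ".join(f"common_pid != {pid}" for pid in excluded_pids)
    return f"({pid_part}) && (id == {sys_id})"


tbl = "447    common  memfd_secret            sys_memfd_secret\n"
expr = build_filter([13375, 3713], syscall_id(tbl, "memfd_secret"))
print(expr)
```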
      
      This addresses these perf build warnings:
      
        Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/unistd.h' differs from latest version at 'arch/arm64/include/uapi/asm/unistd.h'
        diff -u tools/arch/arm64/include/uapi/asm/unistd.h arch/arm64/include/uapi/asm/unistd.h
        Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
        diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
        Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
        diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf stat: Merge uncore events by default for hybrid platform · e0a7ef2a
      Jin Yao committed
      On a hybrid platform, by default 'perf stat' aggregates and reports the
      event counts per PMU. For example,
      
        # perf stat -e cycles -a true
      
         Performance counter stats for 'system wide':
      
                 1,400,445      cpu_core/cycles/
                   680,881      cpu_atom/cycles/
      
               0.001770773 seconds time elapsed
      
      But for uncore events that's not a suitable method. Uncore has nothing
      to do with hybrid. So for uncore events, we aggregate event counts from
      all PMUs and report the counts without PMUs.
      
      Before:
      
        # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ -a true
      
         Performance counter stats for 'system wide':
      
                     2,058      uncore_arb_0/event=0x81,umask=0x1/
                     2,028      uncore_arb_1/event=0x81,umask=0x1/
                         0      uncore_arb_0/event=0x84,umask=0x1/
                         0      uncore_arb_1/event=0x84,umask=0x1/
      
               0.000614498 seconds time elapsed
      
      After:
      
        # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ -a true
      
         Performance counter stats for 'system wide':
      
                     3,996      arb/event=0x81,umask=0x1/
                         0      arb/event=0x84,umask=0x1/
      
               0.000630046 seconds time elapsed
      
      Of course, we also keep '--no-merge' working for uncore events.
      
        # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ --no-merge true
      
         Performance counter stats for 'system wide':
      
                     1,952      uncore_arb_0/event=0x81,umask=0x1/
                     1,921      uncore_arb_1/event=0x81,umask=0x1/
                         0      uncore_arb_0/event=0x84,umask=0x1/
                         0      uncore_arb_1/event=0x84,umask=0x1/
      
               0.000575536 seconds time elapsed
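
The merging of uncore counts across PMU instances can be sketched as follows. This is only an illustration under the assumption that instances are named `uncore_<name>_<n>`, as in the output above; the function `merge_uncore_counts` is hypothetical and does not mirror perf's internal matching logic. The input values are taken from the '--no-merge' run above.

```python
import re


def merge_uncore_counts(counts):
    """Sum counts of the same event across numbered uncore PMU instances.

    `counts` is a list of (event_name, value) pairs such as
    ("uncore_arb_0/event=0x81,umask=0x1/", 1952).
    """
    merged = {}
    for name, value in counts:
        # Drop the "uncore_" prefix and the "_<n>" instance suffix of the PMU.
        key = re.sub(r"^uncore_(\w+?)_\d+/", r"\1/", name)
        merged[key] = merged.get(key, 0) + value
    return list(merged.items())


per_pmu = [
    ("uncore_arb_0/event=0x81,umask=0x1/", 1952),
    ("uncore_arb_1/event=0x81,umask=0x1/", 1921),
    ("uncore_arb_0/event=0x84,umask=0x1/", 0),
    ("uncore_arb_1/event=0x84,umask=0x1/", 0),
]
merged = merge_uncore_counts(per_pmu)
for name, value in merged:
    print(f"{value:>15,}      {name}")
```

Note that the 'Before' and 'After' outputs above come from separate runs, so their totals are not expected to match each other exactly.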
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210707055652.962-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf tests: Fix 'Convert perf time to TSC' on core-only system · de3d5fd8
      Jin Yao committed
      If the atom CPUs are offlined, the 'cpu_atom' is not valid.
      We don't need the test case for 'cpu_atom'.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210708013701.20347-5-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf tests: Fix 'Roundtrip evsel->name' on core-only system · 212f3d97
      Jin Yao committed
      If the atom CPUs are offlined, the 'cpu_atom' is not valid.
      Perf will not create two events for one hw event, so the
      evsel->idx doesn't need to be divided by 2 before comparing.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210708013701.20347-4-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf tests: Fix 'Parse event definition strings' on core-only system · 490e9a8f
      Jin Yao committed
      If the atom CPUs are offlined, the 'cpu_atom' is not valid.
      We don't need the test case for 'cpu_atom'.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210708013701.20347-3-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf pmu: Skip invalid hybrid pmu · 49afa7f6
      Jin Yao committed
      On a hybrid platform, such as Alder Lake, if atom CPUs are offlined,
      the kernel still exports the sysfs path '/sys/devices/cpu_atom/' for
      'cpu_atom' pmu but the file '/sys/devices/cpu_atom/cpus' is empty,
      which indicates this is an invalid pmu.
      
      So check for and skip the invalid hybrid pmu.
      
      Before:
      
        # perf list
        ...
        branch-instructions OR cpu_atom/branch-instructions/ [Kernel PMU event]
        branch-instructions OR cpu_core/branch-instructions/ [Kernel PMU event]
        branch-misses OR cpu_atom/branch-misses/           [Kernel PMU event]
        branch-misses OR cpu_core/branch-misses/           [Kernel PMU event]
        bus-cycles OR cpu_atom/bus-cycles/                 [Kernel PMU event]
        bus-cycles OR cpu_core/bus-cycles/                 [Kernel PMU event]
        ...
      
      The cpu_atom events are still displayed even if atom CPUs are offlined.
      
      After:
      
        # perf list
        ...
        branch-instructions OR cpu_core/branch-instructions/ [Kernel PMU event]
        branch-misses OR cpu_core/branch-misses/           [Kernel PMU event]
        bus-cycles OR cpu_core/bus-cycles/                 [Kernel PMU event]
        ...
      
      Now only cpu_core events are displayed.
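
The validity check described in this commit can be sketched as below. This is a simplified model, not the perf implementation: the helper `is_valid_hybrid_pmu` is hypothetical, and a temporary directory stands in for '/sys/devices/' so the example can run anywhere. The "0-15" CPU list is made-up sample data.

```python
import os
import tempfile


def is_valid_hybrid_pmu(pmu_dir):
    """Treat a hybrid PMU as valid only if its 'cpus' file is non-empty."""
    cpus_path = os.path.join(pmu_dir, "cpus")
    try:
        with open(cpus_path) as f:
            return f.read().strip() != ""
    except OSError:
        # A missing or unreadable 'cpus' file is also treated as invalid here.
        return False


# Build a fake sysfs layout: cpu_core has CPUs, cpu_atom has an empty file.
with tempfile.TemporaryDirectory() as root:
    for name, cpus in (("cpu_core", "0-15\n"), ("cpu_atom", "")):
        os.mkdir(os.path.join(root, name))
        with open(os.path.join(root, name, "cpus"), "w") as f:
            f.write(cpus)
    valid = sorted(
        name for name in os.listdir(root)
        if is_valid_hybrid_pmu(os.path.join(root, name))
    )

print(valid)
```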
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210708013701.20347-2-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 10 Jul, 2021 11 commits
  5. 09 Jul, 2021 4 commits