1. 25 5月, 2021 1 次提交
    • A
      perf script: Find script file relative to exec path · 6ea4b5db
      Adrian Hunter 提交于
      Allow perf script to find a script in the exec path.
      
      Example:
      
      Before:
      
       $ perf record -a -e intel_pt/branch=0/ sleep 0.1
       [ perf record: Woken up 1 times to write data ]
       [ perf record: Captured and wrote 0.954 MB perf.data ]
       $ perf script intel-pt-events.py 2>&1 | head -3
         Error: Couldn't find script `intel-pt-events.py'
         See perf script -l for available scripts.
       $ perf script -s intel-pt-events.py 2>&1 | head -3
       Can't open python script "intel-pt-events.py": No such file or directory
       $ perf script ~/libexec/perf-core/scripts/python/intel-pt-events.py 2>&1 | head -3
         Error: Couldn't find script `/home/ahunter/libexec/perf-core/scripts/python/intel-pt-events.py'
         See perf script -l for available scripts.
       $
      
      After:
      
       $ perf script intel-pt-events.py 2>&1 | head -3
       Intel PT Power Events and PTWRITE
                  perf  8123/8123  [000]       551.230753986     cbr:  42  freq: 4219 MHz  (156%)                0 [unknown] ([unknown])
                  perf  8123/8123  [001]       551.230808216     cbr:  42  freq: 4219 MHz  (156%)                0 [unknown] ([unknown])
       $ perf script -s intel-pt-events.py 2>&1 | head -3
       Intel PT Power Events and PTWRITE
                  perf  8123/8123  [000]       551.230753986     cbr:  42  freq: 4219 MHz  (156%)                0 [unknown] ([unknown])
                  perf  8123/8123  [001]       551.230808216     cbr:  42  freq: 4219 MHz  (156%)                0 [unknown] ([unknown])
       $ perf script ~/libexec/perf-core/scripts/python/intel-pt-events.py 2>&1 | head -3
       Intel PT Power Events and PTWRITE
                  perf  8123/8123  [000]       551.230753986     cbr:  42  freq: 4219 MHz  (156%)                0 [unknown] ([unknown])
                  perf  8123/8123  [001]       551.230808216     cbr:  42  freq: 4219 MHz  (156%)                0 [unknown] ([unknown])
       $
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Acked-by: NJiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20210524065718.11421-1-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      6ea4b5db
  2. 22 5月, 2021 2 次提交
  3. 21 5月, 2021 1 次提交
  4. 20 5月, 2021 1 次提交
    • N
      perf tools: Add 'cgroup-switches' software event · fb6c79d7
      Namhyung Kim 提交于
      It counts how often cgroups are changed actually during the context
      switches.
      
        # perf stat -a -e context-switches,cgroup-switches -a sleep 1
      
         Performance counter stats for 'system wide':
      
                    11,267      context-switches
                    10,950      cgroup-switches
      
               1.015634369 seconds time elapsed
      
      Committer notes:
      
      The kernel patches landed in v5.13, but this entry wasn't filled in
      perf's parse-events tables, which was leading to a segfault when running
      'perf list' on a kernel with that feature, as reported by Thomas
      Richter.
      
      Also removed the part touching tools/include/uapi/linux/perf_event.h as
      it was updated in the usual sync with the kernel UAPI headers, in a
      previous, already upstream, patch.
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20210210083327.22726-3-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fb6c79d7
  5. 19 5月, 2021 4 次提交
    • A
      perf intel-pt: Remove redundant setting of ptq->insn_len · 0a0c5972
      Adrian Hunter 提交于
      Remove redundant "ptq->insn_len = 0" statement.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20210519074515.9262-4-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0a0c5972
    • A
      perf intel-pt: Fix sample instruction bytes · c954eb72
      Adrian Hunter 提交于
      The decoder reports the current instruction if it was decoded. In some
      cases the current instruction is not decoded, in which case the instruction
      bytes length must be set to zero. Ensure that is always done.
      
      Note perf script can anyway get the instruction bytes for any samples where
      they are not present.
      
      Also note, that there is a redundant "ptq->insn_len = 0" statement which is
      not removed until a subsequent patch in order to make this patch apply
      cleanly to stable branches.
      
      Example:
      
      A machne that supports TSX is required. It will have flag "rtm". Kernel
      parameter tsx=on may be required.
      
       # for w in `cat /proc/cpuinfo | grep -m1 flags `;do echo $w | grep rtm ; done
       rtm
      
      Test program:
      
       #include <stdio.h>
       #include <immintrin.h>
      
       int main()
       {
              int x = 0;
      
              if (_xbegin() == _XBEGIN_STARTED) {
                      x = 1;
                      _xabort(1);
              } else {
                      printf("x = %d\n", x);
              }
              return 0;
       }
      
      Compile with -mrtm i.e.
      
       gcc -Wall -Wextra -mrtm xabort.c -o xabort
      
      Record:
      
       perf record -e intel_pt/cyc/u --filter 'filter main @ ./xabort' ./xabort
      
      Before:
      
       # perf script --itrace=xe -F+flags,+insn,-period --xed --ns
                xabort  1478 [007] 92161.431348581:   transactions:   x                              400b81 main+0x14 (/root/xabort)          mov $0xffffffff, %eax
                xabort  1478 [007] 92161.431348624:   transactions:   tx abrt                        400b93 main+0x26 (/root/xabort)          mov $0xffffffff, %eax
      
      After:
      
       # perf script --itrace=xe -F+flags,+insn,-period --xed --ns
                xabort  1478 [007] 92161.431348581:   transactions:   x                              400b81 main+0x14 (/root/xabort)          xbegin 0x6
                xabort  1478 [007] 92161.431348624:   transactions:   tx abrt                        400b93 main+0x26 (/root/xabort)          xabort $0x1
      
      Fixes: faaa8768 ("perf intel-pt/bts: Report instruction bytes and length in sample")
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20210519074515.9262-3-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c954eb72
    • A
      perf intel-pt: Fix transaction abort handling · cb798783
      Adrian Hunter 提交于
      When adding support for power events, some handling of FUP packets was
      unified. That resulted in breaking reporting of TSX aborts, by not
      considering the associated TIP packet. Fix that.
      
      Example:
      
      A machine that supports TSX is required. It will have flag "rtm". Kernel
      parameter tsx=on may be required.
      
       # for w in `cat /proc/cpuinfo | grep -m1 flags `;do echo $w | grep rtm ; done
       rtm
      
      Test program:
      
       #include <stdio.h>
       #include <immintrin.h>
      
       int main()
       {
              int x = 0;
      
              if (_xbegin() == _XBEGIN_STARTED) {
                      x = 1;
                      _xabort(1);
              } else {
                      printf("x = %d\n", x);
              }
              return 0;
       }
      
      Compile with -mrtm i.e.
      
       gcc -Wall -Wextra -mrtm xabort.c -o xabort
      
      Record:
      
       perf record -e intel_pt/cyc/u --filter 'filter main @ ./xabort' ./xabort
      
      Before:
      
       # perf script --itrace=be -F+flags,+addr,-period,-event --ns
                xabort  1478 [007] 92161.431348552:   tr strt                             0 [unknown] ([unknown]) =>           400b6d main+0x0 (/root/xabort)
                xabort  1478 [007] 92161.431348624:   jmp                            400b96 main+0x29 (/root/xabort) =>           400bae main+0x41 (/root/xabort)
                xabort  1478 [007] 92161.431348624:   return                         400bb4 main+0x47 (/root/xabort) =>           400b87 main+0x1a (/root/xabort)
                xabort  1478 [007] 92161.431348637:   jcc                            400b8a main+0x1d (/root/xabort) =>           400b98 main+0x2b (/root/xabort)
                xabort  1478 [007] 92161.431348644:   tr end  call                   400ba9 main+0x3c (/root/xabort) =>           40f690 printf+0x0 (/root/xabort)
                xabort  1478 [007] 92161.431360859:   tr strt                             0 [unknown] ([unknown]) =>           400bae main+0x41 (/root/xabort)
                xabort  1478 [007] 92161.431360882:   tr end  return                 400bb4 main+0x47 (/root/xabort) =>           401139 __libc_start_main+0x309 (/root/xabort)
      
      After:
      
       # perf script --itrace=be -F+flags,+addr,-period,-event --ns
                xabort  1478 [007] 92161.431348552:   tr strt                             0 [unknown] ([unknown]) =>           400b6d main+0x0 (/root/xabort)
                xabort  1478 [007] 92161.431348624:   tx abrt                        400b93 main+0x26 (/root/xabort) =>           400b87 main+0x1a (/root/xabort)
                xabort  1478 [007] 92161.431348637:   jcc                            400b8a main+0x1d (/root/xabort) =>           400b98 main+0x2b (/root/xabort)
                xabort  1478 [007] 92161.431348644:   tr end  call                   400ba9 main+0x3c (/root/xabort) =>           40f690 printf+0x0 (/root/xabort)
                xabort  1478 [007] 92161.431360859:   tr strt                             0 [unknown] ([unknown]) =>           400bae main+0x41 (/root/xabort)
                xabort  1478 [007] 92161.431360882:   tr end  return                 400bb4 main+0x47 (/root/xabort) =>           401139 __libc_start_main+0x309 (/root/xabort)
      
      Fixes: a472e65f ("perf intel-pt: Add decoder support for ptwrite and power event packets")
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20210519074515.9262-2-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      cb798783
    • T
      perf test: Fix libpfm4 support (63) test error for nested event groups · 316a76a5
      Thomas Richter 提交于
      Compiling perf with make LIBPFM4=1 includes libpfm support and
      enables test case 63 'Test libpfm4 support'. This test reports an error
      on all platforms for subtest 63.2 'test groups of --pfm-events'.
      The reported error message is 'nested event groups not supported'
      
       # ./perf test -F 63
       63: Test libpfm4 support                                            :
       63.1: test of individual --pfm-events                               :
       Error:
       failed to parse event stereolab : event not found
       Error:
       failed to parse event stereolab,instructions : event not found
       Error:
       failed to parse event instructions,stereolab : event not found
        Ok
       63.2: test groups of --pfm-events                                   :
       Error:
       nested event groups not supported    <------ Error message here
       Error:
       failed to parse event {stereolab} : event not found
       Error:
       failed to parse event {instructions,cycles},{instructions,stereolab} :\
      	 event not found
       Ok
       #
      
      This patch addresses the error message 'nested event groups not supported'.
      The root cause is function parse_libpfm_events_option() which parses the
      event string '{},{instructions}' and can not handle a leading empty
      group notation '{},...'.
      
      The code detects the first (empty) group indicator '{' but does not
      terminate group processing on the following group closing character '}'.
      So when the second group indicator '{' is detected, the code assumes
      a nested group and returns an error.
      
      With the error message fixed, also change the expected event number to
      one for the test case to succeed.
      
      While at it also fix a memory leak. In good case the function does not
      free the duplicated string given as first parameter.
      
      Output after:
       # ./perf test -F 63
       63: Test libpfm4 support                                            :
       63.1: test of individual --pfm-events                               :
       Error:
       failed to parse event stereolab : event not found
       Error:
       failed to parse event stereolab,instructions : event not found
       Error:
       failed to parse event instructions,stereolab : event not found
        Ok
       63.2: test groups of --pfm-events                                   :
       Error:
       failed to parse event {stereolab} : event not found
       Error:
       failed to parse event {instructions,cycles},{instructions,stereolab} : \
      	 event not found
        Ok
       #
      Error message 'nested event groups not supported' is gone.
      Signed-off-by: NThomas Richter <tmricht@linux.ibm.com>
      Acked-By: NIan Rogers <irogers@google.com>
      Acked-by: NSumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20210517140931.2559364-1-tmricht@linux.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      316a76a5
  6. 17 5月, 2021 5 次提交
    • J
      perf cs-etm: Prevent and warn on underflows during timestamp calculation. · c1a6165a
      James Clark 提交于
      When a zero timestamp is encountered, warn once. This is to make
      hardware or configuration issues visible. Also suggest that the issue
      can be worked around with the --itrace=Z option.
      
      When an underflow with a non-zero timestamp occurs, warn every time.
      This is an unexpected scenario, and with increasing timestamps, it's
      unlikely that it would occur more than once, therefore it should be
      ok to warn every time.
      
      Only try to calculate the timestamp by subtracting the instruction
      count if neither of the above cases are true. This makes attempting
      to decode files with zero timestamps in non-timeless mode
      more consistent. Currently it can half work if the timestamp wraps
      around and becomes non-zero, although the behavior is undefined and
      unpredictable.
      Signed-off-by: NJames Clark <james.clark@arm.com>
      Reviewed-by: NLeo Yan <leo.yan@linaro.org>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Branislav Rankov <branislav.rankov@arm.com>
      Cc: Denis Nikitin <denik@chromium.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210517131741.3027-4-james.clark@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c1a6165a
    • J
      perf cs-etm: Start reading 'Z' --itrace option · c36c1ef6
      James Clark 提交于
      Recently the 'Z' --itrace option was added to override detection
      of timeless decoding. This is also useful in Coresight to work around
      issues with invalid timestamps on some hardware.
      
      When the 'Z' option is provided, the existing timeless decoding mode
      will be used, even if timestamps were recorded.
      Signed-off-by: NJames Clark <james.clark@arm.com>
      Reviewed-by: NLeo Yan <leo.yan@linaro.org>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Branislav Rankov <branislav.rankov@arm.com>
      Cc: Denis Nikitin <denik@chromium.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210517131741.3027-3-james.clark@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c36c1ef6
    • J
      perf cs-etm: Move synth_opts initialisation · cac31418
      James Clark 提交于
      Move initialisation of synth_opts earlier in the function
      so that synth_opts can be used at an earlier stage in a
      later commit.
      Signed-off-by: NJames Clark <james.clark@arm.com>
      Reviewed-by: NLeo Yan <leo.yan@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Branislav Rankov <branislav.rankov@arm.com>
      Cc: Denis Nikitin <denik@chromium.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210517131741.3027-2-james.clark@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      cac31418
    • J
      perf header: Support HYBRID_CPU_PMU_CAPS feature · e119083b
      Jin Yao 提交于
      Perf has supported the CPU_PMU_CAPS feature to display a list of CPU PMU
      capabilities. But on a hybrid platform, it may have several CPU PMUs (such
      as "cpu_core" and "cpu_atom"). The CPU_PMU_CAPS feature is hard to extend
      to support multiple CPU PMUs well if it needs to be compatible for the case
      of old perf data file + new perf tool.
      
      So for better compatibility we now create a new feature HYBRID_CPU_PMU_CAPS
      in the header.
      
      For the perf.data generated on hybrid platform,
      
        root@otcpl-adl-s-2:~# perf report --header-only -I
      
        # cpu_core pmu capabilities: branches=32, max_precise=3, pmu_name=alderlake_hybrid
        # cpu_atom pmu capabilities: branches=32, max_precise=3, pmu_name=alderlake_hybrid
        # missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA
      
      For the perf.data generated on non-hybrid platform
      
        root@kbl-ppc:~# perf report --header-only -I
      
        # cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake
        # missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT CLOCKID DIR_FORMAT COMPRESSED CLOCK_DATA HYBRID_TOPOLOGY HYBRID_CPU_PMU_CAPS
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210514122948.9472-3-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e119083b
    • J
      perf header: Support HYBRID_TOPOLOGY feature · f7d74ce3
      Jin Yao 提交于
      It is useful to let the user know about the hybrid topology.
      
      Add the HYBRID_TOPOLOGY feature in header to indicate the core CPUs
      and the atom CPUs.
      
      With this patch a perf.data generated on a hybrid platform reports
      the hybrid CPU list:
      
        root@otcpl-adl-s-2:~# perf report --header-only -I
        ...
        # hybrid cpu system:
        # cpu_core cpu list : 0-15
        # cpu_atom cpu list : 16-23
      
      For a perf.data generated on a non-hybrid platform, reports a message
      that HYBRID_TOPOLOGY is missing:
      
        root@kbl-ppc:~# perf report --header-only -I
        ...
        # missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT CLOCKID DIR_FORMAT COMPRESSED CLOCK_DATA HYBRID_TOPOLOGY
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210514122948.9472-2-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f7d74ce3
  7. 13 5月, 2021 2 次提交
    • J
      perf cs-etm: Set time on synthesised samples to preserve ordering · 1ac9e0b5
      James Clark 提交于
      The following attribute is set when synthesising samples in
      timed decoding mode:
      
          attr.sample_type |= PERF_SAMPLE_TIME;
      
      This results in new samples that appear to have timestamps but
      because we don't assign any timestamps to the samples, when the
      resulting inject file is opened again, the synthesised samples
      will be on the wrong side of the MMAP or COMM events.
      
      For example, this results in the samples being associated with
      the perf binary, rather than the target of the record:
      
          perf record -e cs_etm/@tmc_etr0/u top
          perf inject -i perf.data -o perf.inject --itrace=i100il
          perf report -i perf.inject
      
      Where 'Command' == perf should show as 'top':
      
          # Overhead  Command  Source Shared Object  Source Symbol           Target Symbol           Basic Block Cycles
          # ........  .......  ....................  ......................  ......................  ..................
          #
              31.08%  perf     [unknown]             [.] 0x000000000040c3f8  [.] 0x000000000040c3e8  -
      
      If the perf.data file is opened directly with perf, without the
      inject step, then this already works correctly because the
      events are synthesised after the COMM and MMAP events and
      no second sorting happens. Re-sorting only happens when opening
      the perf.inject file for the second time so timestamps are
      needed.
      
      Using the timestamp from the AUX record mirrors the current
      behaviour when opening directly with perf, because the events
      are generated on the call to cs_etm__process_queues().
      
      The ETM trace could optionally contain time stamps, but there is
      no way to correlate this with the kernel time. So, the best available
      time value is that of the AUX_RECORD header. This patch uses
      the timestamp from the header for all the samples. The ordering of the
      samples are implicit in the trace and thus is fine with respect to
      relative ordering.
      Reviewed-by: NLeo Yan <leo.yan@linaro.org>
      Reviewed-by: NMathieu Poirier <mathieu.poirier@linaro.org>
      Co-developed-by: NAl Grant <al.grant@arm.com>
      Signed-off-by: NAl Grant <al.grant@arm.com>
      Signed-off-by: NJames Clark <james.clark@arm.com>
      Acked-by: NSuzuki K Poulos <suzuki.poulose@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Branislav Rankov <branislav.rankov@arm.com>
      Cc: Denis Nikitin <denik@chromium.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: coresight@lists.linaro.org
      Link: https://lore.kernel.org/r/20210510143248.27423-3-james.clark@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1ac9e0b5
    • J
      perf cs-etm: Refactor timestamp variable names · aadd6ba4
      James Clark 提交于
      Remove ambiguity in variable names relating to timestamps.
      
      A later commit will save the sample kernel timestamp in one of the etm
      structs, so name all elements appropriately to avoid confusion.
      
      This is also removes some ambiguity arising from the fact that the
      --timestamp argument to perf record refers to sample kernel timestamps,
      and the /timestamp/ event modifier refers to CS timestamps, so the term
      is overloaded.
      Signed-off-by: NJames Clark <james.clark@arm.com>
      Reviewed-by: NMathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Branislav Rankov <branislav.rankov@arm.com>
      Cc: Denis Nikitin <denik@chromium.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: coresight@lists.linaro.org
      Link: https://lore.kernel.org/r/20210510143248.27423-2-james.clark@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      aadd6ba4
  8. 12 5月, 2021 13 次提交
  9. 10 5月, 2021 3 次提交
  10. 29 4月, 2021 8 次提交
    • L
      perf session: Dump PERF_RECORD_TIME_CONV event · 81e70d7e
      Leo Yan 提交于
      Now perf tool uses the common stub function process_event_op2_stub() for
      dumping TIME_CONV event, thus it doesn't output the clock parameters
      contained in the event.
      
      This patch adds the callback function for dumping the hardware clock
      parameters in TIME_CONV event.
      
      Before:
      
        # perf report -D
      
        0x978 [0x38]: event: 79
        .
        . ... raw event: size 56 bytes
        .  0000:  4f 00 00 00 00 00 38 00 15 00 00 00 00 00 00 00  O.....8.........
        .  0010:  00 00 40 01 00 00 00 00 86 89 0b bf df ff ff ff  ..@........<BF><DF><FF><FF><FF>
        .  0020:  d1 c1 b2 39 03 00 00 00 ff ff ff ff ff ff ff 00  <D1><C1><B2>9....<FF><FF><FF><FF><FF><FF><FF>.
        .  0030:  01 01 00 00 00 00 00 00                          ........
      
        0 0 0x978 [0x38]: PERF_RECORD_TIME_CONV
        : unhandled!
      
        [...]
      
      After:
      
        # perf report -D
      
        0x978 [0x38]: event: 79
        .
        . ... raw event: size 56 bytes
        .  0000:  4f 00 00 00 00 00 38 00 15 00 00 00 00 00 00 00  O.....8.........
        .  0010:  00 00 40 01 00 00 00 00 86 89 0b bf df ff ff ff  ..@........<BF><DF><FF><FF><FF>
        .  0020:  d1 c1 b2 39 03 00 00 00 ff ff ff ff ff ff ff 00  <D1><C1><B2>9....<FF><FF><FF><FF><FF><FF><FF>.
        .  0030:  01 01 00 00 00 00 00 00                          ........
      
        0 0 0x978 [0x38]: PERF_RECORD_TIME_CONV
        ... Time Shift      21
        ... Time Muliplier  20971520
        ... Time Zero       18446743935180835206
        ... Time Cycles     13852918225
        ... Time Mask       0xffffffffffffff
        ... Cap Time Zero   1
        ... Cap Time Short  1
        : unhandled!
      
        [...]
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Acked-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steve MacLean <Steve.MacLean@Microsoft.com>
      Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io>
      Link: https://lore.kernel.org/r/20210428120915.7123-5-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      81e70d7e
    • L
      perf session: Add swap operation for event TIME_CONV · 050ffc44
      Leo Yan 提交于
      Since commit d110162c ("perf tsc: Support cap_user_time_short for
      event TIME_CONV"), the event PERF_RECORD_TIME_CONV has extended the data
      structure for clock parameters.
      
      To be backwards-compatible, this patch adds a dedicated swap operation
      for the event PERF_RECORD_TIME_CONV, based on checking if the event
      contains field "time_cycles", it can support both for the old and new
      event formats.
      
      Fixes: d110162c ("perf tsc: Support cap_user_time_short for event TIME_CONV")
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Acked-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steve MacLean <Steve.MacLean@Microsoft.com>
      Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io>
      Link: https://lore.kernel.org/r/20210428120915.7123-4-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      050ffc44
    • L
      perf jit: Let convert_timestamp() to be backwards-compatible · aa616f5a
      Leo Yan 提交于
      Commit d110162c ("perf tsc: Support cap_user_time_short for
      event TIME_CONV") supports the extended parameters for event TIME_CONV,
      but it broke the backwards compatibility, so any perf data file with old
      event format fails to convert timestamp.
      
      This patch introduces a helper event_contains() to check if an event
      contains a specific member or not.  For the backwards-compatibility, if
      the event size confirms the extended parameters are supported in the
      event TIME_CONV, then copies these parameters.
      
      Committer notes:
      
      To make this compiler backwards compatible add this patch:
      
        -       struct perf_tsc_conversion tc = { 0 };
        +       struct perf_tsc_conversion tc = { .time_shift = 0, };
      
      Fixes: d110162c ("perf tsc: Support cap_user_time_short for event TIME_CONV")
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Acked-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steve MacLean <Steve.MacLean@Microsoft.com>
      Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io>
      Link: https://lore.kernel.org/r/20210428120915.7123-3-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      aa616f5a
    • J
      perf stat: Warn group events from different hybrid PMU · 660e533e
      Jin Yao 提交于
      If a group has events which are from different hybrid PMUs,
      shows a warning:
      
      "WARNING: events in group from different hybrid PMUs!"
      
      This is to remind the user not to put the core event and atom
      event into one group.
      
      Next, just disable grouping.
      
        # perf stat -e "{cpu_core/cycles/,cpu_atom/cycles/}" -a -- sleep 1
        WARNING: events in group from different hybrid PMUs!
        WARNING: grouped events cpus do not match, disabling group:
          anon group { cpu_core/cycles/, cpu_atom/cycles/ }
      
         Performance counter stats for 'system wide':
      
                 5,438,125      cpu_core/cycles/
                 3,914,586      cpu_atom/cycles/
      
               1.004250966 seconds time elapsed
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-17-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      660e533e
    • J
      perf stat: Filter out unmatched aggregation for hybrid event · 92637cc7
      Jin Yao 提交于
      perf-stat has supported some aggregation modes, such as --per-core,
      --per-socket and etc. While for hybrid event, it may only available
      on part of cpus. So for --per-core, we need to filter out the
      unavailable cores, for --per-socket, filter out the unavailable
      sockets, and so on.
      
      Before:
      
        # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1
      
         Performance counter stats for 'system wide':
      
        S0-D0-C0           2            479,530      cpu_core/cycles/
        S0-D0-C4           2            175,007      cpu_core/cycles/
        S0-D0-C8           2            166,240      cpu_core/cycles/
        S0-D0-C12          2            704,673      cpu_core/cycles/
        S0-D0-C16          2            865,835      cpu_core/cycles/
        S0-D0-C20          2          2,958,461      cpu_core/cycles/
        S0-D0-C24          2            163,988      cpu_core/cycles/
        S0-D0-C28          2            164,729      cpu_core/cycles/
        S0-D0-C32          0      <not counted>      cpu_core/cycles/
        S0-D0-C33          0      <not counted>      cpu_core/cycles/
        S0-D0-C34          0      <not counted>      cpu_core/cycles/
        S0-D0-C35          0      <not counted>      cpu_core/cycles/
        S0-D0-C36          0      <not counted>      cpu_core/cycles/
        S0-D0-C37          0      <not counted>      cpu_core/cycles/
        S0-D0-C38          0      <not counted>      cpu_core/cycles/
        S0-D0-C39          0      <not counted>      cpu_core/cycles/
      
               1.003597211 seconds time elapsed
      
      After:
      
        # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1
      
         Performance counter stats for 'system wide':
      
        S0-D0-C0           2            210,428      cpu_core/cycles/
        S0-D0-C4           2            444,830      cpu_core/cycles/
        S0-D0-C8           2            435,241      cpu_core/cycles/
        S0-D0-C12          2            423,976      cpu_core/cycles/
        S0-D0-C16          2            859,350      cpu_core/cycles/
        S0-D0-C20          2          1,559,589      cpu_core/cycles/
        S0-D0-C24          2            163,924      cpu_core/cycles/
        S0-D0-C28          2            376,610      cpu_core/cycles/
      
               1.003621290 seconds time elapsed
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Co-developed-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-16-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      92637cc7
    • J
      perf record: Create two hybrid 'cycles' events by default · b53a0755
      Jin Yao 提交于
      When evlist is empty, for example no '-e' specified in perf record,
      one default 'cycles' event is added to evlist.
      
      While on hybrid platform, it needs to create two default 'cycles'
      events. One is for cpu_core, the other is for cpu_atom.
      
      This patch actually calls evsel__new_cycles() two times to create
      two 'cycles' events.
      
        # ./perf record -vv -a -- sleep 1
        ...
        ------------------------------------------------------------
        perf_event_attr:
          size                             120
          config                           0x400000000
          { sample_period, sample_freq }   4000
          sample_type                      IP|TID|TIME|ID|CPU|PERIOD
          read_format                      ID
          disabled                         1
          inherit                          1
          freq                             1
          precise_ip                       3
          sample_id_all                    1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 5
        sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 6
        sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 7
        sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 9
        sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 10
        sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 11
        sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 12
        sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 13
        sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 14
        sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 15
        sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 16
        sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 17
        sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 18
        sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 19
        sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 20
        sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 21
        ------------------------------------------------------------
        perf_event_attr:
          size                             120
          config                           0x800000000
          { sample_period, sample_freq }   4000
          sample_type                      IP|TID|TIME|ID|CPU|PERIOD
          read_format                      ID
          disabled                         1
          inherit                          1
          freq                             1
          precise_ip                       3
          sample_id_all                    1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 22
        sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 23
        sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 24
        sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 25
        sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 26
        sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 27
        sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 28
        sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 29
        ------------------------------------------------------------
      
      We have to create evlist-hybrid.c otherwise due to the symbol
      dependency the perf test python would be failed.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-14-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b53a0755
    • J
      perf parse-events: Support event inside hybrid pmu · 5e4edd1f
      Jin Yao 提交于
      On hybrid platform, user may want to enable events on one pmu.
      
      Following syntax are supported:
      
      cpu_core/<event>/
      cpu_atom/<event>/
      
      But the syntax doesn't work for cache event.
      
      Before:
      
        # perf stat -e cpu_core/LLC-loads/ -a -- sleep 1
        event syntax error: 'cpu_core/LLC-loads/'
                                      \___ unknown term 'LLC-loads' for pmu 'cpu_core'
      
      Cache events are a bit complex. We can't create aliases for them.
      We use another solution. For example, if we use "cpu_core/LLC-loads/",
      in parse_events_add_pmu(), term->config is "LLC-loads".
      
      Then we create a new parser to scan "LLC-loads". The
      parse_events_add_cache() would be called during parsing.
      The parse_state->hybrid_pmu_name is used to identify the pmu
      where the event should be enabled on.
      
      After:
      
        # perf stat -e cpu_core/LLC-loads/ -a -- sleep 1
      
         Performance counter stats for 'system wide':
      
                    24,593      cpu_core/LLC-loads/
      
               1.003911601 seconds time elapsed
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-13-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5e4edd1f
    • J
      perf parse-events: Compare with hybrid pmu name · c93afadc
      Jin Yao 提交于
      On hybrid platform, user may want to enable event only on one pmu.
      Following syntax will be supported:
      
      cpu_core/<event>/
      cpu_atom/<event>/
      
      For hardware event, hardware cache event and raw event, two events
      are created by default. We pass the specified pmu name in parse_state
      and it would be checked before event creation. So next only the
      event with the specified pmu would be created.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-12-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c93afadc