1. 03 Sep, 2021 1 commit
    • perf cs-etm: Save TRCDEVARCH register · 51ba8811
      Committed by James Clark
      When ETE is present save the TRCDEVARCH register and set a new magic
      number. It will be used to configure the decoder in a later commit.
      
      Old versions of perf will not be able to open files with this new magic
      number, but old files will still work with newer versions of perf.
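The compatibility rule described above can be sketched as a reader that dispatches on a magic number. This is a minimal Python illustration, not perf's C implementation; the magic values, the metadata layout and the function name are hypothetical placeholders.

```python
# Hypothetical magic values; the real constants live in perf's cs-etm code.
MAGIC_ETM4 = 0x4040
MAGIC_ETE = 0x5050  # assumed "new magic" written when ETE is present


def parse_cpu_metadata(words, known_magics):
    """Parse one per-CPU metadata block given as a list of integers.

    words[0] is the magic number, the rest are saved register values.
    In this sketch the ETE layout carries TRCDEVARCH as its last word.
    """
    magic = words[0]
    if magic not in known_magics:
        raise ValueError(f"unknown magic 0x{magic:x}")
    regs = list(words[1:])
    trcdevarch = regs.pop() if magic == MAGIC_ETE else None
    return {"magic": magic, "regs": regs, "trcdevarch": trcdevarch}


OLD_PERF = {MAGIC_ETM4}             # older reader: only knows the old magic
NEW_PERF = {MAGIC_ETM4, MAGIC_ETE}  # newer reader: accepts both
```

A reader built with `OLD_PERF` rejects a block that starts with `MAGIC_ETE`, while a reader built with `NEW_PERF` still accepts old blocks, which matches the one-way compatibility the commit message describes.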
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20210806134109.1182235-5-james.clark@arm.com
      [ Addressed some cosmetic suggestions by Suzuki Poulouse ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 10 Aug, 2021 1 commit
    • perf cs-etm: Add warnings for missing DSOs · 9c38b671
      Committed by James Clark
      Currently decode will silently fail if no binary data is available for
      the decode. This is made worse if only partial data is available because
      the decode will appear to work, but any trace from that missing DSO will
      silently not be generated.
      
      Add a UI popup once if there is any data missing, and then warn in the
      bottom left for each individual DSO that's missing.
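The two-level warning scheme described above could look roughly like the following. This is a hedged Python sketch rather than the perf UI code; the class name, message strings, and the choice to warn only once per DSO are assumptions made for illustration.

```python
class MissingDsoReporter:
    """Show one popup for the first missing DSO, then one line per DSO."""

    def __init__(self):
        self.popup_shown = False
        self.warned = set()   # DSO names already reported
        self.messages = []    # stands in for the UI in this sketch

    def report(self, dso_name):
        if not self.popup_shown:
            # One-off popup so the user knows the decode is incomplete.
            self.popup_shown = True
            self.messages.append("popup: binary data missing, trace may be incomplete")
        if dso_name not in self.warned:
            # Per-DSO status line (assumed to be shown once per DSO).
            self.warned.add(dso_name)
            self.messages.append(f"warning: no binary data for {dso_name}")
```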
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210805130354.878120-2-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 02 Aug, 2021 5 commits
    • perf cs-etm: Pass unformatted flag to decoder · 9182f04a
      Committed by James Clark
      The TRBE (Trace Buffer Extension) feature allows a separate trace buffer
      for each trace source, therefore the trace wouldn't need to be
      formatted. The driver was introduced in commit 3fbf7f01
      ("coresight: sink: Add TRBE driver").
      
      The formatted/unformatted mode is encoded in one of the flags of the
      AUX record. The first AUX record encountered for each event is used to
      determine the mode, and this will persist for the remaining trace that
      is either decoded or dumped.
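As an illustration of "first AUX record wins", the mode selection could be sketched as below. This is Python rather than perf's C, the class is hypothetical, and the flag values are believed to mirror the AUX flag layout in the perf_event UAPI header but should be treated as assumptions here.

```python
# Assumed flag layout (see the perf_event UAPI header for the real values).
PERF_AUX_FLAG_PMU_FORMAT_TYPE_MASK = 0xFF00
PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW = 0x0100


class FormatTracker:
    """Remember, per event, whether its trace is formatted."""

    def __init__(self):
        self.formatted_by_event = {}

    def on_aux_record(self, event_id, flags):
        # Only the first AUX record seen for an event decides the mode;
        # later records do not change it.
        if event_id not in self.formatted_by_event:
            fmt = flags & PERF_AUX_FLAG_PMU_FORMAT_TYPE_MASK
            self.formatted_by_event[event_id] = fmt != PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW
        return self.formatted_by_event[event_id]
```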
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20210721150202.32065-7-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cs-etm: Use existing decoder instead of resetting it · 04aaad26
      Committed by James Clark
      When dumping trace, the decoder is continually deleted and recreated to
      decode each buffer. To support both formatted and unformatted trace in
      a later commit, the decoder will be configured in advance.
      
      This commit removes the deletion of the decoder and allows the
      formatted/unformatted setting to persist.
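The idea of configuring a decoder once and reusing it for every buffer can be sketched as follows. This is an illustrative Python stand-in, not the OpenCSD or perf API; `Decoder` and `dump_buffers` are hypothetical names.

```python
class Decoder:
    """Toy decoder that counts how many instances were created."""

    created = 0

    def __init__(self, formatted):
        Decoder.created += 1
        self.formatted = formatted  # setting persists for the decoder's lifetime

    def decode(self, buf):
        mode = "formatted" if self.formatted else "raw"
        return f"{mode}:{len(buf)} bytes"


def dump_buffers(buffers, formatted):
    # Create and configure the decoder once, in advance, instead of
    # deleting and recreating it for every buffer.
    decoder = Decoder(formatted)
    return [decoder.decode(buf) for buf in buffers]
```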
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20210721150202.32065-6-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cs-etm: Only setup queues when they are modified · ca50db59
      Committed by James Clark
      Continually creating queues in cs_etm__process_event() is unnecessary.
      They only need to be created when a buffer for a new CPU or thread is
      encountered. This can be in two places, when building the queues in
      advance in cs_etm__process_auxtrace_info(), or in
      cs_etm__process_auxtrace_event() when data_queued is false and the
      index wasn't available (pipe mode).
      
      This change will allow the 'formatted' decoder setting to be applied when
      iterating over aux records in a later commit.
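A minimal sketch of "set up a queue only when a buffer for a new CPU or thread appears" is shown below. It is written in Python for brevity; the `Queues` class and its counter are hypothetical and only illustrate the control flow, not perf's auxtrace queue structures.

```python
class Queues:
    """Create a queue lazily, the first time its key is seen."""

    def __init__(self):
        self.queues = {}  # key: CPU or thread id -> list of buffers
        self.setups = 0   # how many times queue setup actually ran

    def add_buffer(self, key, buf):
        queue = self.queues.get(key)
        if queue is None:
            # Setup work happens here only, not on every processed event.
            queue = self.queues[key] = []
            self.setups += 1
        queue.append(buf)
```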
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20210721150202.32065-4-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cs-etm: Split setup and timestamp search functions · 9ac8afd5
      Committed by James Clark
      This refactoring has some benefits:
      
       * Decoding is done to find the timestamp. If we want to print errors
         when maps aren't available, then doing it from cs_etm__setup_queue()
         may cause warnings to be printed.
      
       * The cs_etm__setup_queue() flow is shared between timed and timeless
         modes, so it needs to be guarded by an if statement which can now
         be removed.
      
       * Allows moving the setup queues function earlier.
      
       * If data was piped in, then not all queues would be filled so it
         wouldn't have worked properly anyway. Now it waits for flush so
         data in all queues will be available.
      
      The motivation for this is to decouple setup functions from ones that
      involve decoding. That way we can move the setup function earlier when
      the formatted/unformatted trace information is available.
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20210721150202.32065-3-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cs-etm: Refactor initialisation of kernel start address · 6f38e115
      Committed by James Clark
      The kernel start address is already cached in the machine struct once it
      is initialised, so storing it in the cs_etm struct is unnecessary.
      
      It also depends on kernel maps being available to be initialised.
      Therefore cs_etm__setup_queues() isn't an appropriate place to call it
      because it could be called before processing starts. It would be better
      to initialise it at the point when it is needed, then we can be sure
      that all the necessary maps are available. Also by calling
      machine__kernel_start() multiple times it can be initialised at some
      point, even if it failed to initialise previously due to missing maps.
      
      In a later commit cs_etm__setup_queues() will be moved which is the
      motivation for this change.
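The "initialise at the point of use, and retry if it failed earlier" pattern can be sketched like this. It is a simplified Python model with hypothetical names; in particular, returning `None` when no maps are available is an assumption of this sketch and not necessarily what machine__kernel_start() does.

```python
class Machine:
    """Cache the kernel start address once it can be computed."""

    def __init__(self):
        self.kernel_maps = None    # list of (start, end) once maps are loaded
        self._kernel_start = None  # cached result
        self.lookups = 0           # number of successful initialisations

    def kernel_start(self):
        # Try to initialise on every call until it succeeds, so a call made
        # before the maps exist does not poison later calls.
        if self._kernel_start is None and self.kernel_maps:
            self.lookups += 1
            self._kernel_start = min(start for start, _ in self.kernel_maps)
        return self._kernel_start
```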
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: coresight@lists.linaro.org
      Link: https://lore.kernel.org/r/20210721150202.32065-2-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 27 Jul, 2021 1 commit
    • perf cs-etm: Split --dump-raw-trace by AUX records · 48e8a7b5
      Committed by James Clark
      Currently --dump-raw-trace skips queueing and splitting buffers because
      of an early exit condition in cs_etm__process_auxtrace_info(). Once
      that is removed we can print the split data by using the queues
      and searching for split buffers with the same reference as the
      one that is currently being processed.
      
      This keeps the same behaviour of dumping in file order when an AUXTRACE
      event appears, rather than moving trace dump to where AUX records are in
      the file.
      
      There will be a newline and size printout for each fragment. For example
      this buffer is comprised of two AUX records, but was printed as one:
      
        0 0 0x8098 [0x30]: PERF_RECORD_AUXTRACE size: 0xa0  offset: 0  ref: 0x491a4dfc52fc0e6e  idx: 0  t
      
        . ... CoreSight ETM Trace data: size 160 bytes
                Idx:0; ID:10;   I_ASYNC : Alignment Synchronisation.
                Idx:12; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
                Idx:17; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000000000000000;
                Idx:80; ID:10;  I_ASYNC : Alignment Synchronisation.
                Idx:92; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
                Idx:97; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFFDE2AD3FD76D4;
      
      But is now printed as two fragments:
      
        0 0 0x8098 [0x30]: PERF_RECORD_AUXTRACE size: 0xa0  offset: 0  ref: 0x491a4dfc52fc0e6e  idx: 0  t
      
        . ... CoreSight ETM Trace data: size 80 bytes
                Idx:0; ID:10;   I_ASYNC : Alignment Synchronisation.
                Idx:12; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
                Idx:17; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000000000000000;
      
        . ... CoreSight ETM Trace data: size 80 bytes
                Idx:80; ID:10;  I_ASYNC : Alignment Synchronisation.
                Idx:92; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
                Idx:97; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFFDE2AD3FD76D4;
      
      Decoding errors that appeared in problematic files are now not present,
      for example:
      
              Idx:808; ID:1c; I_BAD_SEQUENCE : Invalid Sequence in packet.[I_ASYNC]
              ...
              PKTP_ETMV4I_0016 : 0x0014 (OCSD_ERR_INVALID_PCKT_HDR) [Invalid packet header]; TrcIdx=822
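The fragment-by-fragment dump can be illustrated with a short sketch that slices one buffer by its AUX records and emits a size line per fragment. This is Python for illustration only; the function name and the (offset, size) tuple representation are assumptions, and only the size header line is modelled, not the packet decode.

```python
def dump_fragments(buffer_offset, data, aux_records):
    """Return one size header line per AUX record that lies inside `data`.

    aux_records is a list of (offset, size) pairs in the same offset space
    as buffer_offset.
    """
    lines = []
    for aux_offset, aux_size in aux_records:
        start = aux_offset - buffer_offset
        end = start + aux_size
        if aux_size == 0 or start < 0 or end > len(data):
            continue  # empty record, or not contained in this buffer
        fragment = data[start:end]
        lines.append(f". ... CoreSight ETM Trace data: size {len(fragment)} bytes")
    return lines
```

With a 160-byte buffer and two 80-byte AUX records, this yields two "size 80 bytes" headers instead of a single "size 160 bytes" one, mirroring the before and after output quoted in the commit message.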
      Signed-off-by: James Clark <james.clark@arm.com>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Tested-by: Leo Yan <leo.yan@linaro.org>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Branislav Rankov <branislav.rankov@arm.com>
      Cc: Denis Nikitin <denik@chromium.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210624164303.28632-3-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 15 Jul, 2021 1 commit
    • perf cs-etm: Split Coresight decode by aux records · 83d1fc92
      Committed by James Clark
      Populate the auxtrace queues using AUX records rather than whole
      auxtrace buffers so that the decoder is reset between each aux record.
      
      This is similar to the auxtrace_queues__process_index() ->
      auxtrace_queues__add_indexed_event() flow where
      perf_session__peek_event() is used to read AUXTRACE events out of random
      positions in the file based on the auxtrace index.
      
      But now we loop over all PERF_RECORD_AUX events instead of AUXTRACE
      buffers. For each PERF_RECORD_AUX event, we find the corresponding
      AUXTRACE buffer using the index, and add a fragment of that buffer to
      the auxtrace queues.
      
      No other changes to decoding were made, apart from populating the
      auxtrace queues. The result of decoding is identical to before, except
      in cases where decoding failed completely, due to not resetting the
      decoder.
      
      The reason for this change is because AUX records are emitted any time
      tracing is disabled, for example when the process is scheduled out.
      Because ETM was disabled and enabled again, the decoder also needs to be
      reset to force the search for a sync packet. Otherwise there would be
      fatal decoding errors.
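Finding the AUXTRACE buffer that corresponds to a PERF_RECORD_AUX event amounts to an interval lookup over the index. The sketch below shows one way to do that in Python with a binary search; the index representation as sorted (offset, size) pairs and the function name are assumptions, not perf's data structures.

```python
from bisect import bisect_right


def find_buffer(index, aux_offset, aux_size):
    """Return the position in `index` of the buffer containing the AUX record.

    index is a list of (buffer_offset, buffer_size) sorted by offset.
    Returns None when no single buffer fully contains the record.
    """
    starts = [offset for offset, _ in index]
    i = bisect_right(starts, aux_offset) - 1  # last buffer starting at or before
    if i < 0:
        return None
    offset, size = index[i]
    if aux_offset + aux_size <= offset + size:
        return i
    return None
```

Once the buffer is found, only the slice covered by the AUX record would be queued, so each queued fragment starts where tracing was re-enabled and the decoder can be reset there.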
      
      Testing
      =======
      
      Testing was done with the following script, to diff the decoding results
      between the patched and un-patched versions of perf:
      
      	#!/bin/bash
      	set -ex
      
      	$1 script -i $3 $4 > split.script
      	$2 script -i $3 $4 > default.script
      
      	diff split.script default.script | head -n 20
      
      And it was run like this, with various itrace options depending on the
      quantity of synthesised events:
      
      	compare.sh ./perf-patched ./perf-default perf-per-cpu-2-threads.data --itrace=i100000ns
      
      No changes in output were observed in the following scenarios:
      
      * Simple per-cpu
      	perf record -e cs_etm/@tmc_etr0/u top
      
      * Per-thread, single thread
      	perf record -e cs_etm/@tmc_etr0/u --per-thread ./threads_C
      
      * Per-thread multiple threads (but only one thread collected data):
      	perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597
      
      * Per-thread multiple threads (both threads collected data):
      	perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597
      
      * Per-cpu explicit threads:
      	perf record -e cs_etm/@tmc_etr0/u --pid 853,854
      
      * System-wide (per-cpu):
          perf record -e cs_etm/@tmc_etr0/u -a
      
      * No data collected (no aux buffers)
      	Can happen with any command when run for a short period
      
      * Containing truncated records
      	Can happen with any command
      
      * Containing aux records with 0 size
      	Can happen with any command
      
      * Snapshot mode (various files with and without buffer wrap)
      	perf record -e cs_etm/@tmc_etr0/u -a --snapshot
      
      Some differences were observed in the following scenario:
      
      * Snapshot mode (with duplicate buffers)
      	perf record -e cs_etm/@tmc_etr0/u -a --snapshot
      
      Fewer samples are generated in snapshot mode if duplicate buffers
      were gathered because buffers with the same offset are now only added
      once. This gives different, but more correct results and no duplicate
      data is decoded any more.
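The effect of adding buffers with the same offset only once can be shown with a small deduplication sketch. This is illustrative Python with a hypothetical function name; it keeps the first buffer seen for each offset, which is an assumption about tie-breaking.

```python
def add_unique_buffers(buffers):
    """Keep only the first buffer for each offset, preserving order.

    buffers is a list of (offset, data) pairs.
    """
    seen = set()
    unique = []
    for offset, data in buffers:
        if offset in seen:
            continue  # duplicate snapshot of the same region: skip it
        seen.add(offset)
        unique.append((offset, data))
    return unique
```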
      Signed-off-by: James Clark <james.clark@arm.com>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Tested-by: Leo Yan <leo.yan@linaro.org>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Branislav Rankov <branislav.rankov@arm.com>
      Cc: Denis Nikitin <denik@chromium.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210624164303.28632-2-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 02 Jul, 2021 1 commit
    • perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events() · 0323dea3
      Committed by James Clark
      Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
      non-timeless mode starts decoding on the first PERF_RECORD_AUX record.
      
      This can cause the "data has no samples!" error if the first
      PERF_RECORD_AUX record comes before the first (or any relevant)
      PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
      to access the binary data.
      
      This change pushes the start of non-timeless decoding to the very end of
      parsing the file. The PERF_RECORD_EXIT event can't be used because it
      might not exist in system-wide or snapshot modes.
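Why deferring the decode helps can be seen in a toy event loop: AUX records are only queued while the file is parsed, and address resolution happens at flush time, when every mmap is known. This is a simplified Python model with hypothetical names and event tuples, not perf's session code.

```python
class Session:
    """Queue AUX data while parsing; decode only when flushed."""

    def __init__(self):
        self.mmaps = []        # (start, end, path) from MMAP2 records
        self.pending_aux = []  # traced addresses waiting to be decoded
        self.samples = []

    def process_event(self, event):
        kind, payload = event
        if kind == "MMAP2":
            self.mmaps.append(payload)
        elif kind == "AUX":
            self.pending_aux.append(payload)  # queue only, no decoding yet

    def flush(self):
        # Called after the whole file has been parsed.
        for addr in self.pending_aux:
            self.samples.append((addr, self._resolve(addr)))
        self.pending_aux.clear()

    def _resolve(self, addr):
        for start, end, path in self.mmaps:
            if start <= addr < end:
                return path
        return "[unknown]"
```

In this model an AUX record that precedes its MMAP2 record still resolves correctly, whereas decoding at the first AUX record would have produced `[unknown]`.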
      
      I have not been able to find the exact cause for the events to be
      intermittently in the wrong order in the basic scenario:
      
      	perf record -e cs_etm/@tmc_etr0/u top
      
      But it can be made to happen every time with the --delay option. This is
      because "enable_on_exec" is disabled, which causes tracing to start
      before the process to be launched is exec'd. For example:
      
      	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
      	perf report -D | grep 'AUX\|MAP'
      
      	0 16714475632740 0x520 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 []
      	0 16714476494960 0x5d0 [0x40]: PERF_RECORD_AUX offset: 0x30 size: 0x30 flags: 0 []
      	0 16714478208900 0x660 [0x40]: PERF_RECORD_AUX offset: 0x60 size: 0x30 flags: 0 []
      	4294967295 16714478293340 0x700 [0x70]: PERF_RECORD_MMAP2 8712/8712: [0x557a460000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
      	4294967295 16714478353020 0x770 [0x88]: PERF_RECORD_MMAP2 8712/8712: [0x7f86f72000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
      
      Another scenario in which decoding from the first aux record fails is a
      workload that forks. Although the aux record comes after 'bash', it
      comes before 'top', which is what we are interested in. For example:
      
      	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
      	perf report -D | grep 'AUX\|MAP'
      
      	4294967295 16853946421300 0x510 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x558f280000(0x142000) @ 0 00:17 5213953 0]: r-xp /usr/bin/bash
      	4294967295 16853946543560 0x580 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba6e000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
      	4294967295 16853946628420 0x608 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba9e000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
      	0 16853947067300 0x690 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x3a60 flags: 0 []
      	...
      	0 16853966602580 0x1758 [0x40]: PERF_RECORD_AUX offset: 0xc2470 size: 0x30 flags: 0 []
      	4294967295 16853967119860 0x1818 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x5559e70000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
      	4294967295 16853967181620 0x1888 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed06000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
      	4294967295 16853967237180 0x1910 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed36000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
      
      A third scenario is when the majority of time is spent in a shared
      library that is not loaded at startup. For example a dynamically loaded
      plugin.
      
      Testing
      =======
      
      Testing was done by checking if any samples that are present in the
      old output are missing from the new output. Timestamps must be
      stripped out with awk because now they are set to the last AUX sample,
      rather than the first:
      
      	./perf script $4 | awk '!($4="")' > new.script
      	./perf-default script $4 | awk '!($4="")' > default.script
      	comm -13 <(sort -u new.script) <(sort -u default.script)
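For readers less familiar with awk and comm, the same comparison can be sketched in Python: drop the fourth whitespace-separated field (the timestamp) from every line, then list lines that appear in the old output but not in the new one. The function names are hypothetical, and dropping the field (rather than blanking it, as awk does) is a simplification that does not affect the comparison.

```python
def strip_field(line, field=4):
    """Remove the given 1-based whitespace-separated field from a line."""
    parts = line.split()
    del parts[field - 1:field]
    return " ".join(parts)


def missing_from_new(new_lines, old_lines):
    """Lines present in the old output but absent from the new output."""
    new = {strip_field(line) for line in new_lines}
    old = {strip_field(line) for line in old_lines}
    return sorted(old - new)
```

An empty result means the new output is a superset of the old one once timestamps are ignored.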
      
      Testing showed that the new output is a superset of the old. When lines
      appear in the comm output, it is not because they are missing but
      because [unknown] is now resolved to sensible locations. For example, the
      last putp branch here now resolves to libtinfo, so it's not missing
      from the output, but is actually improved:
      
      Old:
      	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
      	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
      	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 0 [unknown] ([unknown])
      New:
      	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
      	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
      	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 7f8ab39208 putp+0x0 (/lib/libtinfo.so.5.9)
      
      In the following two modes, decoding now works and the "data has no
      samples!" error is not displayed any more:
      
      	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
      	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
      
      In snapshot mode, there is also an improvement to decoding. Previously
      samples for the 'kill' process that was used to send SIGUSR2 were
      completely missing, because the process hadn't started yet. But now
      there are additional samples present:
      
      	perf record -e cs_etm/@tmc_etr0/u --snapshot -a
      	perf script
      
      		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      aaaabb612fb4 [unknown] (/usr/bin/stress)
      		  kill 19644 [000] 161627.938153:    1000000    instructions:uH:      ffffae0ef210 [unknown] (/lib/aarch64-linux-gnu/ld-2.27.so)
      		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      ffff9e754d40 random_r+0x20 (/lib/aarch64-linux-gnu/libc-2.27.so)
      
      Also tested was the round trip of 'perf inject' followed by 'perf
      report' which has the same differences and improvements.
      Signed-off-by: James Clark <james.clark@arm.com>
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Leo Yan <leo.yan@linaro.org>
      Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Branislav Rankov <branislav.rankov@arm.com>
      Cc: Denis Nikitin <denik@chromium.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210609130421.13934-1-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 17 May, 2021 2 commits
    • perf cs-etm: Start reading 'Z' --itrace option · c36c1ef6
      Committed by James Clark
      Recently the 'Z' --itrace option was added to override detection
      of timeless decoding. This is also useful in Coresight to work around
      issues with invalid timestamps on some hardware.
      
      When the 'Z' option is provided, the existing timeless decoding mode
      will be used, even if timestamps were recorded.
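The resulting decision can be written as a one-line predicate. This Python sketch uses a hypothetical function name and treats the --itrace argument as a plain string of option letters, which is a simplification of how perf parses it.

```python
def use_timeless_decoding(timestamps_recorded, itrace_opts):
    """Timeless mode is used when forced by 'Z' or when no timestamps exist."""
    return "Z" in itrace_opts or not timestamps_recorded
```

So a file recorded with timestamps is still decoded in timeless mode when 'Z' is passed, which is the workaround for hardware with invalid timestamps.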
      Signed-off-by: James Clark <james.clark@arm.com>
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Branislav Rankov <branislav.rankov@arm.com>
      Cc: Denis Nikitin <denik@chromium.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210517131741.3027-3-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cs-etm: Move synth_opts initialisation · cac31418
      Committed by James Clark
      Move initialisation of synth_opts earlier in the function
      so that synth_opts can be used at an earlier stage in a
      later commit.
      Signed-off-by: James Clark <james.clark@arm.com>
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Branislav Rankov <branislav.rankov@arm.com>
      Cc: Denis Nikitin <denik@chromium.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210517131741.3027-2-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  8. 13 May, 2021 2 commits
    • perf cs-etm: Set time on synthesised samples to preserve ordering · 1ac9e0b5
      Committed by James Clark
      The following attribute is set when synthesising samples in
      timed decoding mode:
      
          attr.sample_type |= PERF_SAMPLE_TIME;
      
      This results in new samples that appear to have timestamps but
      because we don't assign any timestamps to the samples, when the
      resulting inject file is opened again, the synthesised samples
      will be on the wrong side of the MMAP or COMM events.
      
      For example, this results in the samples being associated with
      the perf binary, rather than the target of the record:
      
          perf record -e cs_etm/@tmc_etr0/u top
          perf inject -i perf.data -o perf.inject --itrace=i100il
          perf report -i perf.inject
      
      Where 'Command' == perf should show as 'top':
      
          # Overhead  Command  Source Shared Object  Source Symbol           Target Symbol           Basic Block Cycles
          # ........  .......  ....................  ......................  ......................  ..................
          #
              31.08%  perf     [unknown]             [.] 0x000000000040c3f8  [.] 0x000000000040c3e8  -
      
      If the perf.data file is opened directly with perf, without the
      inject step, then this already works correctly because the
      events are synthesised after the COMM and MMAP events and
      no second sorting happens. Re-sorting only happens when opening
      the perf.inject file for the second time so timestamps are
      needed.
      
      Using the timestamp from the AUX record mirrors the current
      behaviour when opening directly with perf, because the events
      are generated on the call to cs_etm__process_queues().
      
      The ETM trace could optionally contain time stamps, but there is
      no way to correlate this with the kernel time. So, the best available
      time value is that of the AUX_RECORD header. This patch uses
      the timestamp from the header for all the samples. The ordering of the
      samples is implicit in the trace and is thus fine with respect to
      relative ordering.
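The ordering problem can be reproduced with a toy model of the re-sort that happens when the injected file is opened. This is a Python sketch with hypothetical event dictionaries; it only shows that samples left at time zero sort ahead of COMM and MMAP events, while samples stamped with the AUX record's time stay behind them.

```python
def sort_events(events):
    # sorted() is stable, so events with equal timestamps keep file order.
    return sorted(events, key=lambda event: event["time"])


def synthesize(aux_time, count, use_aux_time):
    """Create `count` samples, stamped with the AUX record time or left at 0."""
    time = aux_time if use_aux_time else 0
    return [{"type": "SAMPLE", "time": time} for _ in range(count)]
```

With a COMM event at time 100 and an AUX record at time 120, unstamped samples end up first after sorting and are attributed to the wrong command; stamped samples remain after the COMM event.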
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Co-developed-by: Al Grant <al.grant@arm.com>
      Signed-off-by: Al Grant <al.grant@arm.com>
      Signed-off-by: James Clark <james.clark@arm.com>
      Acked-by: Suzuki K Poulos <suzuki.poulose@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Branislav Rankov <branislav.rankov@arm.com>
      Cc: Denis Nikitin <denik@chromium.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: coresight@lists.linaro.org
      Link: https://lore.kernel.org/r/20210510143248.27423-3-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf cs-etm: Refactor timestamp variable names · aadd6ba4
      James Clark committed
      Remove ambiguity in variable names relating to timestamps.
      
      A later commit will save the sample kernel timestamp in one of the etm
      structs, so name all elements appropriately to avoid confusion.
      
      This also removes some ambiguity arising from the fact that the
      --timestamp argument to perf record refers to sample kernel timestamps,
      while the /timestamp/ event modifier refers to CS timestamps, so the
      term is overloaded.
      Signed-off-by: James Clark <james.clark@arm.com>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Branislav Rankov <branislav.rankov@arm.com>
      Cc: Denis Nikitin <denik@chromium.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: coresight@lists.linaro.org
      Link: https://lore.kernel.org/r/20210510143248.27423-2-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      aadd6ba4
  9. 24 Mar, 2021 1 commit
  10. 02 Mar, 2021 2 commits
  11. 01 Sep, 2020 1 commit
    • A
      perf cs-etm: Fix corrupt data after perf inject from · f5f8e7e5
      Al Grant committed
      Commit 42bbabed ("perf tools: Add hw_idx in struct branch_stack")
      changed the format of branch stacks in perf samples. When samples use
      this new format, a flag must be set in the corresponding event.
      
      Synthesized branch stacks generated from CoreSight ETM trace were using
      the new format, but not setting the event attribute, leading to
      consumers seeing corrupt data. This patch fixes the issue by setting the
      event attribute to indicate use of the new format.
      
      Fixes: 42bbabed ("perf tools: Add hw_idx in struct branch_stack")
      Signed-off-by: Al Grant <al.grant@arm.com>
      Reviewed-by: Andrea Brunato <andrea.brunato@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Link: http://lore.kernel.org/lkml/20200819084751.17686-1-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f5f8e7e5
  12. 06 May, 2020 1 commit
  13. 16 Apr, 2020 1 commit
  14. 11 Mar, 2020 5 commits
    • L
      perf cs-etm: Fix unsigned variable comparison to zero · bc010dd6
      Leo Yan committed
      The variable 'offset' in function cs_etm__sample() has type u64, so
      the check 'while (offset > 0)' is misleading: an unsigned value can
      never be negative. This patch changes it to 'while (offset)'.
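The change preserves behaviour, because for an unsigned type 'offset > 0' and 'offset != 0' are the same test. The real hazard with unsigned counters is that a subtraction below zero wraps around instead of going negative. A minimal sketch of both points follows; the helper names count_down() and wraps_to_large_value() are hypothetical and are not taken from cs-etm.c.

```c
#include <stdint.h>

/* Hypothetical helper: counts iterations of a loop shaped like the patched one. */
unsigned int count_down(uint64_t offset)
{
	unsigned int steps = 0;

	/* For an unsigned type this is the same test as 'offset > 0'. */
	while (offset) {
		offset--;
		steps++;
	}
	return steps;
}

/* Hypothetical helper: returns 1 when a - b wrapped instead of going negative. */
int wraps_to_large_value(uint64_t a, uint64_t b)
{
	uint64_t diff = a - b;	/* wraps modulo 2^64 when b > a */

	return diff > (uint64_t)INT64_MAX;
}
```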
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: Mike Leach <mike.leach@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: coresight ml <coresight@lists.linaro.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20200219021811.20067-6-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bc010dd6
    • L
      perf cs-etm: Optimize copying last branches · 695378b5
      Leo Yan committed
      If an instruction range packet can generate multiple instruction
      samples, these samples share the same last branches; it's not necessary
      to copy the same last branches repeatedly for these samples within the
      same packet.
      
      This patch moves the copying of last branches out of function
      cs_etm__synth_instruction_sample() and executes it before generating
      the instruction samples.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: Mike Leach <mike.leach@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: coresight ml <coresight@lists.linaro.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20200219021811.20067-5-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      695378b5
    • L
      perf cs-etm: Correct synthesizing instruction samples · c9f5baa1
      Leo Yan committed
      When 'etm->instructions_sample_period' is less than
      'tidq->period_instructions', the function cs_etm__sample() cannot handle
      this case properly with its logic.
      
      Consider the following flow as an example:
      
      - If we set itrace option '--itrace=i4', then function cs_etm__sample()
        has variables with initialized values:
      
        tidq->period_instructions = 0
        etm->instructions_sample_period = 4
      
      - When the first packet is coming:
      
        packet->instr_count = 10; the number of instructions executed in this
        packet is 10, thus update period_instructions as below:
      
        tidq->period_instructions = 0 + 10 = 10
        instrs_over = 10 - 4 = 6
        offset = 10 - 6 - 1 = 3
        tidq->period_instructions = instrs_over = 6
      
      - When the second packet is coming:
      
        packet->instr_count = 10; in the second pass, assume 10 instructions
        in the trace sample again:
      
        tidq->period_instructions = 6 + 10 = 16
        instrs_over = 16 - 4 = 12
        offset = 10 - 12 - 1 = -3  -> the negative value
        tidq->period_instructions = instrs_over = 12
      
      After handling these two packets, there are two issues:
      
      The first issue is that cs_etm__instr_addr() returns the address,
      within the current trace sample, of the instruction at the given
      offset, so the offset is supposed to always be an unsigned value.
      In fact, cs_etm__sample() might calculate a negative offset (-3 when
      handling the second packet) and pass it to cs_etm__instr_addr() as a
      u64, where it becomes a very large positive integer.
      
      The second issue is that only 2 samples are synthesized for a sample
      period of 4. Every packet has 10 instructions, so the two packets
      have 20 instructions in total, which should generate 5 samples
      (4 x 5 = 20). This is because cs_etm__sample() calls
      cs_etm__synth_instruction_sample() only once per range packet.
      
      This patch fixes the logic in function cs_etm__sample(); the basic
      idea for handling an incoming packet is:
      
      - To synthesize the first instruction sample, combine the
        instructions left over from the previous packet with the head of
        the new packet; then generate further samples at every sample
        period;
      - If instructions remain at the tail of the new packet, carry them
        over to the next sample.
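The two-step idea above can be sketched as a small stand-alone function. This is only an illustration under stated assumptions: struct sample_state and sample_packet() are hypothetical names, and the real cs_etm__sample() additionally resolves addresses and synthesizes perf samples. With a period of 4 and two packets of 10 instructions, the sketch yields 2 samples from the first packet and 3 from the second, 5 in total, and the offset never goes negative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue state: only the carried-over count is modelled. */
struct sample_state {
	uint64_t period_instructions;	/* instructions left from earlier packets */
};

/*
 * Process one range packet holding 'instr_count' instructions.
 * Stores the zero-based offset (within this packet) of every sampled
 * instruction into 'offsets', up to 'max' entries, and returns the number
 * of samples generated. Samples beyond 'max' are counted but not stored.
 */
size_t sample_packet(struct sample_state *st, uint64_t instr_count,
		     uint64_t period, uint64_t *offsets, size_t max)
{
	uint64_t prev = st->period_instructions;
	uint64_t offset;
	size_t n = 0;

	if (!period)
		return 0;

	st->period_instructions += instr_count;

	/*
	 * One-based position in this packet of the instruction that completes
	 * the period started in earlier packets. 'prev' is always smaller than
	 * 'period', so this cannot underflow.
	 */
	offset = period - prev;

	while (st->period_instructions >= period) {
		if (n < max)
			offsets[n] = offset - 1;
		n++;
		offset += period;
		st->period_instructions -= period;
	}
	return n;
}
```

Whatever remains in period_instructions after the loop is the tail that is carried over to the next packet.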
      Suggested-by: Mike Leach <mike.leach@linaro.org>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: Mike Leach <mike.leach@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: coresight ml <coresight@lists.linaro.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20200219021811.20067-4-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c9f5baa1
    • L
      perf cs-etm: Continuously record last branch · f1410028
      Leo Yan committed
      Every time an instruction sample is synthesized, the last branch
      recording is reset. This is fine if the instruction period is big
      enough; for example, with the option '--itrace=i100000' the last
      branch array is reset once per sample of 100000 instructions, and
      enough packets arrive before the next instruction sample to fill the
      last branch array.
      
      On the other hand, with a very small period far fewer packets arrive
      between two consecutive instruction samples, so the frequent resets
      leave the last branch array almost empty for each new instruction
      sample.
      
      To allow the last branches to work properly for any instruction
      period, this patch no longer resets the last branch array for every
      instruction sample and only resets it when flushing the trace data.
      The last branches are reset in only two cases: when trace starts and
      when the trace is discontinuous. In all other cases, last branches
      keep being recorded across consecutive instruction samples.
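A minimal sketch of the resulting policy, assuming a small fixed-size ring buffer: recording a branch and emitting a sample never clear the buffer, and only an explicit reset (trace start or a discontinuity) does. All names here (last_branch_rb, lbr_record(), lbr_sample(), lbr_reset(), LAST_BRANCH_SZ) are hypothetical and only loosely modelled on the description above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LAST_BRANCH_SZ 4	/* hypothetical capacity */

struct branch {
	uint64_t from;
	uint64_t to;
};

struct last_branch_rb {
	struct branch entries[LAST_BRANCH_SZ];
	size_t pos;	/* slot holding the most recent branch */
	size_t nr;	/* number of valid entries */
};

/* Called only at trace start or when the trace is discontinuous. */
void lbr_reset(struct last_branch_rb *rb)
{
	memset(rb, 0, sizeof(*rb));
}

/* Record one branch; the buffer is filled backwards and wraps around. */
void lbr_record(struct last_branch_rb *rb, uint64_t from, uint64_t to)
{
	if (!rb->pos)
		rb->pos = LAST_BRANCH_SZ;
	rb->pos--;

	rb->entries[rb->pos].from = from;
	rb->entries[rb->pos].to = to;

	if (rb->nr < LAST_BRANCH_SZ)
		rb->nr++;
}

/* Emitting a sample reads the buffer but deliberately does not reset it. */
size_t lbr_sample(const struct last_branch_rb *rb)
{
	return rb->nr;
}
```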
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: Mike Leach <mike.leach@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: coresight ml <coresight@lists.linaro.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20200219021811.20067-3-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f1410028
    • L
      perf cs-etm: Swap packets for instruction samples · d0175156
      Leo Yan committed
      With the option '--itrace=iNNN' and Arm CoreSight trace data, the perf
      tool fails to inject instruction samples. The root cause is that
      packets are only swapped for branch samples and last branches, but not
      for instruction samples, so newly arriving packets cannot be handled
      properly when only instruction samples are synthesized.
      
      To fix this issue, this patch refactors the code into a new function,
      cs_etm__packet_swap(), which swaps the packets, and adds the condition
      for instruction samples.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: Mike Leach <mike.leach@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: coresight ml <coresight@lists.linaro.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20200219021811.20067-2-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d0175156
  15. 10 Mar, 2020 1 commit
    • K
      perf tools: Add hw_idx in struct branch_stack · 42bbabed
      Kan Liang committed
      The low level index of raw branch records for the most recent branch can
      be recorded in a sample with the PERF_SAMPLE_BRANCH_HW_INDEX
      branch_sample_type. Extend struct branch_stack to support it.
      
      However, if PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and
      entries[] are output by the kernel. The pointer to entries[] could then
      be wrong, since the output format differs from the new struct
      branch_stack. Add a variable no_hw_idx in struct perf_sample to
      indicate whether hw_idx is output. Add get_branch_entry() to return
      the corresponding pointer to entries[0].
      
      To make the dummy branch sample consistent with the new branch sample,
      add hw_idx in struct dummy_branch_stack for cs-etm and intel-pt.
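The layout problem can be illustrated on a raw buffer of 64-bit words. The sketch below assumes the branch stack is laid out as nr, then an optional hw_idx, then the entries; branch_entries_start() is a hypothetical stand-in for the accessor described above, not the perf function itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return a pointer to the first word of entries[0] in a raw branch stack.
 * Assumed layout: nr, [hw_idx], entries...
 * 'no_hw_idx' must be true when the producer did not emit hw_idx.
 */
const uint64_t *branch_entries_start(const uint64_t *raw, bool no_hw_idx)
{
	const uint64_t *p = raw + 1;	/* skip nr */

	if (!no_hw_idx)
		p++;			/* skip hw_idx */
	return p;
}
```

If the flag does not match the data, the accessor is off by one word: reading an old-format buffer as if it carried hw_idx treats the 'to' address of the first entry as its 'from' address, which is the kind of corruption a consumer would see.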
      
      Apply the new struct branch_stack for synthetic events as well.
      
      Extend test case sample-parsing to support new struct branch_stack.
      
      Committer notes:
      
      Renamed get_branch_entries() to perf_sample__branch_entries() to have
      proper namespacing and pave the way for this to be moved to libperf,
      eventually.
      
      Add 'static' to that inline as it is in a header.
      
      Add 'hw_idx' to 'struct dummy_branch_stack' in cs-etm.c to fix the build
      on arm64.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lore.kernel.org/lkml/20200228163011.19358-2-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      42bbabed
  16. 26 Nov, 2019 1 commit
    • A
      perf maps: Merge 'struct maps' with 'struct map_groups' · 79b6bb73
      Arnaldo Carvalho de Melo committed
      And pick the shortest name: 'struct maps'.
      
      The split existed because we used to have two groups of maps, one for
      functions and one for variables, but that only complicated things,
      sometimes we needed to figure out what was at some address and then had
      to first try it on the functions group and if that failed, fall back to
      the variables one.
      
      That split is long gone, so for quite a while we had only one struct
      maps per struct map_groups, simplify things by combining those structs.
      
      First patch is the minimum needed to merge both, follow up patches will
      rename 'thread->mg' to 'thread->maps', etc.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-hom6639ro7020o708trhxh59@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      79b6bb73
  17. 07 Nov, 2019 1 commit
    • L
      perf cs-etm: Fix definition of macro TO_CS_QUEUE_NR · 9d604aad
      Leo Yan committed
      The definition of macro TO_CS_QUEUE_NR has a typo: it names its
      parameter 'trace_id_chan', which doesn't match the macro body, which
      uses 'trace_chan_id'. So rename the parameter to 'trace_chan_id'.
      
      By luck, function cs_etm__setup_queue() has a local variable named
      'trace_chan_id', so even with the wrongly defined macro the body picks
      up that local variable rather than the macro's parameter
      'trace_id_chan'; this is why the compiler did not complain before.
      
      After renaming the parameter, a compile error appears because
      cs_etm__setup_queue() has no variable 'trace_id_chan'. This patch
      passes the variable 'trace_chan_id' to the macro, which fixes the
      compile error.
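The effect is easy to reproduce in isolation. In the sketch below, the macro names QUEUE_NR_BUGGY and QUEUE_NR_FIXED, the shift by 16, and both wrapper functions are illustrative assumptions; only the parameter/body name mismatch mirrors the description above. The buggy macro ignores its second argument and silently captures whatever 'trace_chan_id' happens to be in scope.

```c
#include <stdint.h>

/* Buggy form: the parameter is 'trace_id_chan' but the body uses 'trace_chan_id'. */
#define QUEUE_NR_BUGGY(queue_nr, trace_id_chan) \
	((queue_nr) << 16 | (trace_chan_id))

/* Fixed form: parameter and body agree. */
#define QUEUE_NR_FIXED(queue_nr, trace_chan_id) \
	((queue_nr) << 16 | (trace_chan_id))

uint32_t buggy_queue_nr(uint32_t queue_nr, uint32_t arg)
{
	uint32_t trace_chan_id = 0x10;	/* unrelated local, captured by the macro body */

	(void)arg;	/* the macro drops its second argument entirely */
	return QUEUE_NR_BUGGY(queue_nr, arg);
}

uint32_t fixed_queue_nr(uint32_t queue_nr, uint32_t arg)
{
	return QUEUE_NR_FIXED(queue_nr, arg);
}
```

Because the buggy macro never expands its second argument, even an undeclared identifier at the call site compiles cleanly, which matches the commit's observation that the compiler did not complain until the parameter was renamed.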
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: coresight ml <coresight@lists.linaro.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20191021074808.25795-1-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9d604aad
  18. 25 Sep, 2019 1 commit
  19. 20 Sep, 2019 2 commits
  20. 01 Sep, 2019 4 commits
  21. 29 Aug, 2019 2 commits
  22. 20 Aug, 2019 1 commit
    • L
      perf cs-etm: Support sample flags 'insn' and 'insnlen' · a4973d8f
      Leo Yan committed
      The synthetic branch and instruction samples do not set
      instruction-related info, so the perf tool fails to display samples
      with flags '-F,+insn,+insnlen'.
      
      The CoreSight trace decoder provides sufficient information to decide
      the instruction size based on the ISA type: A64/A32 instructions are
      32-bit size, but one exception is the T32 instruction size, which might
      be 32-bit or 16-bit.
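The size rule can be sketched as follows. The enum and function names are hypothetical; the T32 test relies on the Arm encoding rule that a halfword whose bits [15:11] are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction. This is a sketch of the rule, not the code added by the patch.

```c
#include <stdint.h>

/* Hypothetical ISA tags for this sketch. */
enum isa_sketch {
	ISA_A64,
	ISA_A32,
	ISA_T32,
};

/*
 * Return the instruction size in bytes. 'first_halfword' is only
 * inspected for T32, where it is the first 16 bits of the instruction.
 */
int instr_size(enum isa_sketch isa, uint16_t first_halfword)
{
	if (isa != ISA_T32)
		return 4;	/* A64 and A32 instructions are always 32-bit */

	/* Bits [15:11] of 0b11101 or above mark a 32-bit T32 encoding. */
	return (first_halfword & 0xF800) >= 0xE800 ? 4 : 2;
}
```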
      
      This patch handles these cases and reads the instruction values from
      the DSO file, and thus supports the flags '-F,+insn,+insnlen'.
      
      Before:
      
        # perf script -F,insn,insnlen,ip,sym
                      0 [unknown] ilen: 0
           ffff97174044 _start ilen: 0
           ffff97174938 _dl_start ilen: 0
           ffff97174938 _dl_start ilen: 0
           ffff97174938 _dl_start ilen: 0
           ffff97174938 _dl_start ilen: 0
           ffff97174938 _dl_start ilen: 0
           ffff97174938 _dl_start ilen: 0
           ffff97174938 _dl_start ilen: 0
           ffff97174938 _dl_start ilen: 0
      
        [...]
      
      After:
      
        # perf script -F,insn,insnlen,ip,sym
                      0 [unknown] ilen: 0
           ffff97174044 _start ilen: 4 insn: 2f 02 00 94
           ffff97174938 _dl_start ilen: 4 insn: c1 ff ff 54
           ffff97174938 _dl_start ilen: 4 insn: c1 ff ff 54
           ffff97174938 _dl_start ilen: 4 insn: c1 ff ff 54
           ffff97174938 _dl_start ilen: 4 insn: c1 ff ff 54
           ffff97174938 _dl_start ilen: 4 insn: c1 ff ff 54
           ffff97174938 _dl_start ilen: 4 insn: c1 ff ff 54
           ffff97174938 _dl_start ilen: 4 insn: c1 ff ff 54
           ffff97174938 _dl_start ilen: 4 insn: c1 ff ff 54
      
        [...]
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Tested-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/20190815082854.18191-1-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a4973d8f
  23. 30 Jul, 2019 2 commits
    • J
      libperf: Move perf_event_attr field from perf's evsel to libperf's perf_evsel · 1fc632ce
      Jiri Olsa committed
      Move the perf_event_attr struct from 'struct evsel' to 'struct perf_evsel'.
      
      Committer notes:
      
      Fixed up these:
      
       tools/perf/arch/arm/util/auxtrace.c
       tools/perf/arch/arm/util/cs-etm.c
       tools/perf/arch/arm64/util/arm-spe.c
       tools/perf/arch/s390/util/auxtrace.c
       tools/perf/util/cs-etm.c
      
      Also
      
        cc1: warnings being treated as errors
        tests/sample-parsing.c: In function 'do_test':
        tests/sample-parsing.c:162: error: missing initializer
        tests/sample-parsing.c:162: error: (near initialization for 'evsel.core.cpus')
      
         	struct evsel evsel = {
         		.needs_swap = false,
        -		.core.attr = {
        -			.sample_type = sample_type,
        -			.read_format = read_format,
        +		.core = {
        +			. attr = {
        +				.sample_type = sample_type,
        +				.read_format = read_format,
        +			},
      
        [perfbuilder@a70e4eeb5549 /]$ gcc --version |& head -1
        gcc (GCC) 4.4.7
      
      Also we don't need to include perf_event.h in
      tools/perf/lib/include/perf/evsel.h, forward declaring 'struct
      perf_event_attr' is enough. And this even fixes the build in some
      systems where things are used somewhere down the include path from
      perf_event.h without defining __always_inline.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20190721112506.12306-43-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      1fc632ce
    • J
      perf evlist: Rename struct perf_evlist to struct evlist · 63503dba
      Jiri Olsa committed
      Rename struct perf_evlist to struct evlist, so we don't have a name
      clash when we add struct perf_evlist in libperf.
      
      Committer notes:
      
      Added fixes to build on arm64, from Jiri and from me
      (tools/perf/util/cs-etm.c)
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20190721112506.12306-6-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      63503dba