1. 02 Aug, 2021: 8 commits
    • perf annotate: Add error log in symbol__annotate() · c4db54be
      Li Huafei committed
      When users run 'perf annotate' on unsupported machines, an error log
      should be printed for user feedback.
      Signed-off-by: Li Huafei <lihuafei1@huawei.com>
      Reviewed-by: James Clark <james.clark@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Dengcheng Zhu <dzhu@wavecomp.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Zhang Jinhao <zhangjinhao2@huawei.com>
      Link: http://lore.kernel.org/lkml/20210726123854.13463-2-lihuafei1@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf env: Normalize aarch64.* and arm64.* to arm64 in normalize_arch() · 4502da0e
      Li Huafei committed
      On my aarch64 big-endian machine, 'perf annotate' does not work.
      
       # perf annotate
        Percent |      Source code & Disassembly of [kernel.kallsyms] for cycles (253 samples, percent: local period)
       --------------------------------------------------------------------------------------------------------------
        Percent |      Source code & Disassembly of [kernel.kallsyms] for cycles (1 samples, percent: local period)
       ------------------------------------------------------------------------------------------------------------
        Percent |      Source code & Disassembly of [kernel.kallsyms] for cycles (47 samples, percent: local period)
       -------------------------------------------------------------------------------------------------------------
       ...
      
      This is because the arch_find() function uses the normalized architecture
      name provided by normalize_arch(), and my machine's architecture name,
      aarch64_be, is not normalized to arm64.  As is already done for other
      architectures such as arm and powerpc, we can fuzzy-match architecture
      names of the form aarch64.* and normalize them.
      
      There also appears to be an arm64_be architecture name, which we also
      normalize to arm64.
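A minimal sketch of the prefix matching described above, written as a standalone C function. The name `normalize_arch_sketch()` is hypothetical and this is not the perf source; it only illustrates why `strncmp()` prefix checks catch `aarch64_be` and `arm64_be`, and why the arm64 check has to run before a generic `arm` prefix check (since `arm64_be` also starts with `arm`).

```c
#include <string.h>

/*
 * Hypothetical sketch of prefix-based architecture normalisation.
 * Only the arm/arm64 rules are shown; other names pass through unchanged.
 */
static const char *normalize_arch_sketch(const char *arch)
{
	/* aarch64, aarch64_be, arm64, arm64_be, ... -> arm64 */
	if (strncmp(arch, "aarch64", 7) == 0 ||
	    strncmp(arch, "arm64", 5) == 0)
		return "arm64";
	/* Must come after the arm64 check: "arm64_be" also starts with "arm" */
	if (strncmp(arch, "arm", 3) == 0)
		return "arm";
	return arch;
}
```

For example, `normalize_arch_sketch("aarch64_be")` yields `"arm64"`, while `"armv7l"` still falls through to `"arm"`.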
      Signed-off-by: Li Huafei <lihuafei1@huawei.com>
      Reviewed-by: James Clark <james.clark@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Dengcheng Zhu <dzhu@wavecomp.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Zhang Jinhao <zhangjinhao2@huawei.com>
      Link: http://lore.kernel.org/lkml/20210726123854.13463-1-lihuafei1@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cs-etm: Pass unformatted flag to decoder · 9182f04a
      James Clark committed
      The TRBE (Trace Buffer Extension) feature provides a separate trace
      buffer for each trace source, so the trace does not need to be
      formatted. The driver was introduced in commit 3fbf7f01
      ("coresight: sink: Add TRBE driver").
      
      The formatted/unformatted mode is encoded in one of the flags of the
      AUX record. The first AUX record encountered for each event is used to
      determine the mode, and this will persist for the remaining trace that
      is either decoded or dumped.
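A sketch of the "first AUX record decides" rule above, assuming a hypothetical per-queue struct. The `SKETCH_AUX_FLAG_*` values are placeholders chosen for illustration, not the real UAPI constants; the point is only that the mode is latched once and later records cannot change it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit layout, assumed for illustration only. */
#define SKETCH_AUX_FLAG_FORMAT_MASK 0xff00u
#define SKETCH_AUX_FLAG_FORMAT_RAW  0x0100u

struct sketch_queue {
	bool format_known;  /* set once the first AUX record has been seen */
	bool formatted;     /* true: formatted frames, false: raw (e.g. TRBE) */
};

/* Called for every AUX record of an event; only the first one decides. */
static void sketch_apply_aux_flags(struct sketch_queue *q, uint64_t aux_flags)
{
	if (q->format_known)
		return;  /* mode persists for the remaining trace */
	q->formatted = (aux_flags & SKETCH_AUX_FLAG_FORMAT_MASK) !=
		       SKETCH_AUX_FLAG_FORMAT_RAW;
	q->format_known = true;
}
```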
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20210721150202.32065-7-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cs-etm: Use existing decoder instead of resetting it · 04aaad26
      James Clark committed
      When dumping trace, the decoder is continually deleted and recreated to
      decode each buffer. To support both formatted and unformatted trace in
      a later commit, the decoder will be configured in advance.
      
      This commit removes the deletion of the decoder and allows the
      formatted/unformatted setting to persist.
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20210721150202.32065-6-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cs-etm: Suppress printing when resetting decoder · b8324f49
      James Clark committed
      The decoder is quite noisy when being reset. In a future commit,
      dump-raw-trace will use a code path that resets the decoder rather than
      creating a new one, so printing has to be suppressed to avoid flooding
      the output.
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20210721150202.32065-5-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cs-etm: Only setup queues when they are modified · ca50db59
      James Clark committed
      Continually creating queues in cs_etm__process_event() is unnecessary.
      They only need to be created when a buffer for a new CPU or thread is
      encountered. This can happen in two places: when building the queues in
      advance in cs_etm__process_auxtrace_info(), or in
      cs_etm__process_auxtrace_event() when data_queued is false and the
      index wasn't available (pipe mode).
      
      This change will allow the 'formatted' decoder setting to be applied
      when iterating over AUX records in a later commit.
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20210721150202.32065-4-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cs-etm: Split setup and timestamp search functions · 9ac8afd5
      James Clark committed
      This refactoring has some benefits:
      
       * Decoding is done to find the timestamp. If we want to print errors
         when maps aren't available, then doing it from cs_etm__setup_queue()
         may cause warnings to be printed.
      
       * The cs_etm__setup_queue() flow is shared between timed and timeless
         modes, so it needs to be guarded by an if statement which can now
         be removed.
      
       * Allows moving the setup queues function earlier.
      
       * If data was piped in, then not all queues would be filled so it
         wouldn't have worked properly anyway. Now it waits for flush so
         data in all queues will be available.
      
      The motivation for this is to decouple setup functions from ones that
      involve decoding. That way we can move the setup function earlier, to
      the point where the formatted/unformatted trace information is available.
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20210721150202.32065-3-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cs-etm: Refactor initialisation of kernel start address · 6f38e115
      James Clark committed
      The kernel start address is already cached in the machine struct once it
      is initialised, so storing it in the cs_etm struct is unnecessary.
      
      Its initialisation also depends on the kernel maps being available.
      Therefore cs_etm__setup_queues() isn't an appropriate place to call it,
      because it could be called before processing starts. It would be better
      to initialise it at the point where it is needed; then we can be sure
      that all the necessary maps are available. Also, by calling
      machine__kernel_start() multiple times, it can eventually be initialised
      even if an earlier attempt failed due to missing maps.
      
      In a later commit cs_etm__setup_queues() will be moved, which is the
      motivation for this change.
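A sketch of the lazy, retry-on-failure initialisation described in this commit. `struct sketch_machine` and `sketch_kernel_start()` are hypothetical stand-ins for the machine struct and `machine__kernel_start()`, not perf code; what they show is that a failed attempt caches nothing, so a later call made once the maps exist can still succeed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the machine struct: maps may appear later. */
struct sketch_machine {
	bool kernel_maps_loaded;
	uint64_t kernel_map_start;  /* meaningful only once maps are loaded */
	bool kernel_start_cached;
	uint64_t kernel_start;
};

/* Returns true and sets *start once the value can be determined. */
static bool sketch_kernel_start(struct sketch_machine *m, uint64_t *start)
{
	if (!m->kernel_start_cached) {
		if (!m->kernel_maps_loaded)
			return false;  /* nothing cached: try again next call */
		m->kernel_start = m->kernel_map_start;
		m->kernel_start_cached = true;
	}
	*start = m->kernel_start;
	return true;
}
```

Calling this at the point of use, rather than once during queue setup, means an early call made before the maps are loaded does not permanently poison the value.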
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: coresight@lists.linaro.org
      Link: https://lore.kernel.org/r/20210721150202.32065-2-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 31 Jul, 2021: 1 commit
    • Revert "perf map: Fix dso->nsinfo refcounting" · 9bac1bd6
      Arnaldo Carvalho de Melo committed
      This makes 'perf top' abort in some cases, and the right fix will
      involve surgery that is too much to do at this stage, so revert for now
      and fix it in the next merge window.
      
      This reverts commit 2d6b74ba.
      
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Krister Johansen <kjlx@templeofstupid.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 28 Jul, 2021: 1 commit
    • perf pmu: Fix alias matching · c07d5c92
      John Garry committed
      Commit c47a5599 ("perf tools: Fix pattern matching for same
      substring in different PMU type") may have fixed some alias matching,
      but it has broken some others.
      
      Firstly, it cannot handle the simple scenario of a PMU name in the form
      pmu_name{digits}; it can only handle pmu_name_{digits}.
      
      Secondly, it cannot handle more complex matching in the case where we
      have multiple tokens. In this scenario, the code failed to realise that
      we may examine multiple substrings in the PMU name.
      
      Fix in two ways:
      
      - Change perf_pmu__valid_suffix() to accept a PMU name without '_' in the
        suffix
      
      - Only pay attention to perf_pmu__valid_suffix() for the final token
      
      Also add const qualifiers as necessary to avoid casting.
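A sketch of the relaxed suffix rule: after the token, the PMU name may end, or continue with digits, or with `_` followed by digits. `sketch_valid_suffix()` is a hypothetical helper, not the patched `perf_pmu__valid_suffix()`, and it covers only the check that, per this commit, is applied to the final token.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Accepts tok, tok{digits} and tok_{digits}; rejects anything else. */
static bool sketch_valid_suffix(const char *pmu_name, const char *tok)
{
	size_t len = strlen(tok);
	const char *p;

	if (strncmp(pmu_name, tok, len) != 0)
		return false;
	p = pmu_name + len;
	if (*p == '\0')
		return true;          /* exact match */
	if (*p == '_')
		p++;                  /* the '_' separator is optional */
	if (!isdigit((unsigned char)*p))
		return false;         /* at least one digit is required */
	while (isdigit((unsigned char)*p))
		p++;
	return *p == '\0';            /* only digits may follow */
}
```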
      
      Fixes: c47a5599 ("perf tools: Fix pattern matching for same substring in different PMU type")
      Signed-off-by: John Garry <john.garry@huawei.com>
      Tested-by: Jin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/1626793819-79090-1-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 27 Jul, 2021: 1 commit
    • perf cs-etm: Split --dump-raw-trace by AUX records · 48e8a7b5
      James Clark committed
      Currently --dump-raw-trace skips queueing and splitting buffers because
      of an early exit condition in cs_etm__process_auxtrace_info(). Once
      that is removed we can print the split data by using the queues
      and searching for split buffers with the same reference as the
      one that is currently being processed.
      
      This keeps the same behaviour of dumping in file order when an AUXTRACE
      event appears, rather than moving trace dump to where AUX records are in
      the file.
      
      There will be a newline and size printout for each fragment. For example,
      this buffer is composed of two AUX records but was printed as one:
      
        0 0 0x8098 [0x30]: PERF_RECORD_AUXTRACE size: 0xa0  offset: 0  ref: 0x491a4dfc52fc0e6e  idx: 0  t
      
        . ... CoreSight ETM Trace data: size 160 bytes
                Idx:0; ID:10;   I_ASYNC : Alignment Synchronisation.
                Idx:12; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
                Idx:17; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000000000000000;
                Idx:80; ID:10;  I_ASYNC : Alignment Synchronisation.
                Idx:92; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
                Idx:97; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFFDE2AD3FD76D4;
      
      But is now printed as two fragments:
      
        0 0 0x8098 [0x30]: PERF_RECORD_AUXTRACE size: 0xa0  offset: 0  ref: 0x491a4dfc52fc0e6e  idx: 0  t
      
        . ... CoreSight ETM Trace data: size 80 bytes
                Idx:0; ID:10;   I_ASYNC : Alignment Synchronisation.
                Idx:12; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
                Idx:17; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000000000000000;
      
        . ... CoreSight ETM Trace data: size 80 bytes
                Idx:80; ID:10;  I_ASYNC : Alignment Synchronisation.
                Idx:92; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
                Idx:97; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFFDE2AD3FD76D4;
      
      Decoding errors that appeared in problematic files are no longer present,
      for example:
      
              Idx:808; ID:1c; I_BAD_SEQUENCE : Invalid Sequence in packet.[I_ASYNC]
              ...
              PKTP_ETMV4I_0016 : 0x0014 (OCSD_ERR_INVALID_PCKT_HDR) [Invalid packet header]; TrcIdx=822
      Signed-off-by: James Clark <james.clark@arm.com>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Tested-by: Leo Yan <leo.yan@linaro.org>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Branislav Rankov <branislav.rankov@arm.com>
      Cc: Denis Nikitin <denik@chromium.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210624164303.28632-3-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 18 Jul, 2021: 3 commits
    • perf probe: Fix add event failure when running 32-bit perf in a 64-bit kernel · 22a66551
      Yang Jihong committed
      The "address" member of "struct probe_trace_point" uses the long data
      type. If the kernel is 64-bit and the perf program is 32-bit, the
      "address" variable is only 32 bits wide.
      
      As a result, the upper 32 bits of the address read from the kernel are
      truncated, and an error occurs during the address comparison in
      kprobe_warn_out_range().
      
      Before:
      
        # perf probe -a schedule
        schedule is out of .text, skip it.
          Error: Failed to add events.
      
      Solution:
      
        Change the data type of the "address" variable to u64, and change the
        corresponding address printing and value assignment.
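A sketch of the truncation and of the fix, assuming an ILP32 perf binary where `long` is 32 bits; the cast to `uint32_t` simulates that storage on any host. The helper names are hypothetical. Printing a u64 portably needs the `PRIx64` macro from `<inttypes.h>` rather than `%lx`, which is one reason the address printing has to change along with the type.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Simulates keeping a 64-bit kernel address in a 32-bit 'long', as a
 * 32-bit (ILP32) perf binary would: the upper 32 bits are dropped. */
static uint32_t sketch_store_in_32bit_long(uint64_t kaddr)
{
	return (uint32_t)kaddr;
}

/* With a u64 the full address survives; PRIx64 formats it correctly
 * on both 32-bit and 64-bit builds. Returns the snprintf() length. */
static int sketch_format_addr(char *buf, size_t len, uint64_t addr)
{
	return snprintf(buf, len, "0x%" PRIx64, addr);
}
```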
      
      After:
      
        # perf.new.new probe -a schedule
        Added new event:
          probe:schedule       (on schedule)
      
        You can now use it in all perf tools, such as:
      
                perf record -e probe:schedule -aR sleep 1
      
        # perf probe -l
          probe:schedule       (on schedule@kernel/sched/core.c)
        # perf record -e probe:schedule -aR sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.156 MB perf.data (1366 samples) ]
        # perf report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'probe:schedule'
        # Event count (approx.): 1366
        #
        # Overhead  Command          Shared Object      Symbol
        # ........  ...............  .................  ............
        #
             6.22%  migration/0      [kernel.kallsyms]  [k] schedule
             6.22%  migration/1      [kernel.kallsyms]  [k] schedule
             6.22%  migration/2      [kernel.kallsyms]  [k] schedule
             6.22%  migration/3      [kernel.kallsyms]  [k] schedule
             6.15%  migration/10     [kernel.kallsyms]  [k] schedule
             6.15%  migration/11     [kernel.kallsyms]  [k] schedule
             6.15%  migration/12     [kernel.kallsyms]  [k] schedule
             6.15%  migration/13     [kernel.kallsyms]  [k] schedule
             6.15%  migration/14     [kernel.kallsyms]  [k] schedule
             6.15%  migration/15     [kernel.kallsyms]  [k] schedule
             6.15%  migration/4      [kernel.kallsyms]  [k] schedule
             6.15%  migration/5      [kernel.kallsyms]  [k] schedule
             6.15%  migration/6      [kernel.kallsyms]  [k] schedule
             6.15%  migration/7      [kernel.kallsyms]  [k] schedule
             6.15%  migration/8      [kernel.kallsyms]  [k] schedule
             6.15%  migration/9      [kernel.kallsyms]  [k] schedule
             0.22%  rcu_sched        [kernel.kallsyms]  [k] schedule
        ...
        #
        # (Cannot load tips.txt file, please install perf!)
        #
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jianlin Lv <jianlin.lv@arm.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Li Huafei <lihuafei1@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Link: http://lore.kernel.org/lkml/20210715063723.11926-1-yangjihong1@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf data: Close all files in close_dir() · d4b3eedc
      Riccardo Mancini committed
      When using 'perf report' in directory mode, the first file is not closed
      on exit, causing a memory leak.
      
      The problem is caused by the iterating variable never reaching 0, so
      index 0 is skipped.
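A sketch of how a loop variable that never reaches 0 leaks the first entry. The loop shape and `struct sketch_file` are assumptions for illustration; the commit only states that the iterating variable never reaches 0, not what the original loop looked like.

```c
#include <stdbool.h>

/* Hypothetical per-file state; 'open' stands in for an fd/path pair. */
struct sketch_file {
	bool open;
};

/* Buggy shape: the pre-decremented index stops at 1, so files[0]
 * is never closed. */
static void sketch_close_dir_buggy(struct sketch_file *files, int nr)
{
	while (--nr >= 1)
		files[nr].open = false;
}

/* Fixed shape: every index down to and including 0 is visited. */
static void sketch_close_dir_fixed(struct sketch_file *files, int nr)
{
	while (--nr >= 0)
		files[nr].open = false;
}
```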
      
      Fixes: 14552063 ("perf data: Add perf_data__(create_dir|close_dir) functions")
      Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zhen Lei <thunder.leizhen@huawei.com>
      Link: http://lore.kernel.org/lkml/20210716141122.858082-1-rickyman7@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf probe-file: Delete namelist in del_events() on the error path · e0fa7ab4
      Riccardo Mancini committed
      ASan reports some memory leaks when running:
      
        # perf test "42: BPF filter"
      
      This second leak is caused by a strlist not being deallocated on error
      inside probe_file__del_events().
      
      This patch adds a goto label before the deallocation and makes the error
      path jump to it.
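A sketch of the "goto label before the deallocation" pattern described above. `struct sketch_list`, `sketch_list_delete()` and `sketch_del_events()` are hypothetical stand-ins for the strlist and `probe_file__del_events()`; the counter exists only so the behaviour can be checked.

```c
#include <stdlib.h>

/* Hypothetical list type standing in for perf's strlist. */
struct sketch_list {
	int nr;
};

static int sketch_frees;  /* counts deallocations, for illustration */

static void sketch_list_delete(struct sketch_list *l)
{
	if (l) {
		sketch_frees++;
		free(l);
	}
}

/* Every exit after the allocation goes through 'out', so the list is
 * released on the error path as well as on success. */
static int sketch_del_events(int fail)
{
	int ret = 0;
	struct sketch_list *namelist = calloc(1, sizeof(*namelist));

	if (!namelist)
		return -1;
	if (fail) {
		ret = -1;
		goto out;  /* a plain 'return' here would leak namelist */
	}
	namelist->nr = 1;  /* normal work would happen here */
out:
	sketch_list_delete(namelist);
	return ret;
}
```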
      Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
      Fixes: e7895e42 ("perf probe: Split del_perf_probe_events()")
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/174963c587ae77fa108af794669998e4ae558338.1626343282.git.rickyman7@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 16 Jul, 2021: 8 commits
  7. 15 Jul, 2021: 1 commit
    • perf cs-etm: Split Coresight decode by aux records · 83d1fc92
      James Clark committed
      Populate the auxtrace queues using AUX records rather than whole
      auxtrace buffers so that the decoder is reset between each aux record.
      
      This is similar to the auxtrace_queues__process_index() ->
      auxtrace_queues__add_indexed_event() flow where
      perf_session__peek_event() is used to read AUXTRACE events out of random
      positions in the file based on the auxtrace index.
      
      But now we loop over all PERF_RECORD_AUX events instead of AUXTRACE
      buffers. For each PERF_RECORD_AUX event, we find the corresponding
      AUXTRACE buffer using the index, and add a fragment of that buffer to
      the auxtrace queues.
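A simplified sketch of turning one AUXTRACE buffer into per-AUX-record fragments. The structs are hypothetical views of the records involved, and the sketch assumes each AUX record lies entirely inside a single buffer; perf's real queueing code handles more cases than this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified views of the records involved. */
struct sketch_aux_record { uint64_t offset, size; };     /* PERF_RECORD_AUX */
struct sketch_buffer     { uint64_t offset, size; };     /* AUXTRACE buffer */
struct sketch_fragment   { uint64_t buf_offset, size; }; /* queued slice */

/*
 * For each AUX record that lies entirely inside the buffer, emit a
 * fragment relative to the buffer start; the decoder would be reset
 * before decoding each one. Returns the number of fragments written.
 */
static size_t sketch_split_buffer(const struct sketch_buffer *buf,
				  const struct sketch_aux_record *recs,
				  size_t nr_recs,
				  struct sketch_fragment *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i < nr_recs && n < max_out; i++) {
		const struct sketch_aux_record *r = &recs[i];

		if (r->size == 0)
			continue;  /* empty AUX records contribute nothing */
		if (r->offset < buf->offset ||
		    r->offset + r->size > buf->offset + buf->size)
			continue;  /* record belongs to another buffer */
		out[n].buf_offset = r->offset - buf->offset;
		out[n].size = r->size;
		n++;
	}
	return n;
}
```

With a 160-byte buffer and two 80-byte AUX records, this yields two fragments at buffer offsets 0 and 80.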
      
      No other changes to decoding were made, apart from populating the
      auxtrace queues. The result of decoding is identical to before, except
      in cases where decoding failed completely, due to not resetting the
      decoder.
      
      This change is needed because AUX records are emitted any time
      tracing is disabled, for example when the process is scheduled out.
      Because ETM was disabled and enabled again, the decoder also needs to be
      reset to force the search for a sync packet. Otherwise there would be
      fatal decoding errors.
      
      Testing
      =======
      
      Testing was done with the following script, to diff the decoding results
      between the patched and un-patched versions of perf:
      
      	#!/bin/bash
      	set -ex
      
      	$1 script -i $3 $4 > split.script
      	$2 script -i $3 $4 > default.script
      
      	diff split.script default.script | head -n 20
      
      And it was run like this, with various itrace options depending on the
      quantity of synthesised events:
      
      	compare.sh ./perf-patched ./perf-default perf-per-cpu-2-threads.data --itrace=i100000ns
      
      No changes in output were observed in the following scenarios:
      
      * Simple per-cpu
      	perf record -e cs_etm/@tmc_etr0/u top
      
      * Per-thread, single thread
      	perf record -e cs_etm/@tmc_etr0/u --per-thread ./threads_C
      
      * Per-thread multiple threads (but only one thread collected data):
      	perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597
      
      * Per-thread multiple threads (both threads collected data):
      	perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597
      
      * Per-cpu explicit threads:
      	perf record -e cs_etm/@tmc_etr0/u --pid 853,854
      
      * System-wide (per-cpu):
      	perf record -e cs_etm/@tmc_etr0/u -a
      
      * No data collected (no aux buffers)
      	Can happen with any command when run for a short period
      
      * Containing truncated records
      	Can happen with any command
      
      * Containing aux records with 0 size
      	Can happen with any command
      
      * Snapshot mode (various files with and without buffer wrap)
      	perf record -e cs_etm/@tmc_etr0/u -a --snapshot
      
      Some differences were observed in the following scenario:
      
      * Snapshot mode (with duplicate buffers)
      	perf record -e cs_etm/@tmc_etr0/u -a --snapshot
      
      Fewer samples are generated in snapshot mode if duplicate buffers
      were gathered because buffers with the same offset are now only added
      once. This gives different, but more correct results and no duplicate
      data is decoded any more.
      Signed-off-by: James Clark <james.clark@arm.com>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Tested-by: Leo Yan <leo.yan@linaro.org>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Branislav Rankov <branislav.rankov@arm.com>
      Cc: Denis Nikitin <denik@chromium.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210624164303.28632-2-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  8. 14 Jul, 2021: 3 commits
    • libperf: Fix build error with LIBPFM4=1 · 50e98924
      Heiko Carstens committed
      Fix build error with LIBPFM4=1:
      
          CC      util/pfm.o
        util/pfm.c: In function ‘parse_libpfm_events_option’:
        util/pfm.c:102:30: error: ‘struct evsel’ has no member named ‘leader’
          102 |                         evsel->leader = grp_leader;
              |                              ^~
      
      Committer notes:
      
      There is this entry in 'make -C tools/perf build-test' to test the build
      with libpfm:
      
        $ grep libpfm tools/perf/tests/make
        make_with_libpfm4   := LIBPFM4=1
        run += make_with_libpfm4
        $
      
      But the test machine lacked libpfm-devel; now it's installed, so further
      cases like this shouldn't happen.
      
      Committer testing:
      
      Before this patch this fails; after applying it:
      
        $ make -C tools/perf build-test
        make: Entering directory '/var/home/acme/git/perf/tools/perf'
        - tarpkg: ./tests/perf-targz-src-pkg .
                         make_static: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1 -j24  DESTDIR=/tmp/tmp.KzFSfvGRQa
        <SNIP>
                   make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
                 make_with_libpfm4_O: make LIBPFM4=1
               make_install_prefix_O: make install prefix=/tmp/krava
                  make_no_auxtrace_O: make NO_AUXTRACE=1
        <SNIP>
        $ rpm -q libpfm-devel
        libpfm-devel-4.11.0-4.fc34.x86_64
        $
      
      FIXME:
      
      This shows a need for 'build-test' to bail out when a build option is
      specified that has no required library devel files installed.
      
      Fixes: fba7c866 ("libperf: Move 'leader' from tools/perf to perf_evsel::leader")
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210713091907.1555560-1-hca@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Merge uncore events by default for hybrid platform · e0a7ef2a
      Jin Yao committed
      On a hybrid platform, by default 'perf stat' aggregates and reports the
      event counts per PMU. For example,
      
        # perf stat -e cycles -a true
      
         Performance counter stats for 'system wide':
      
                 1,400,445      cpu_core/cycles/
                   680,881      cpu_atom/cycles/
      
               0.001770773 seconds time elapsed
      
      But that is not a suitable method for uncore events, since uncore has
      nothing to do with hybrid. So for uncore events, we aggregate the event
      counts from all PMUs and report the counts without the PMU names.
      
      Before:
      
        # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ -a true
      
         Performance counter stats for 'system wide':
      
                     2,058      uncore_arb_0/event=0x81,umask=0x1/
                     2,028      uncore_arb_1/event=0x81,umask=0x1/
                         0      uncore_arb_0/event=0x84,umask=0x1/
                         0      uncore_arb_1/event=0x84,umask=0x1/
      
               0.000614498 seconds time elapsed
      
      After:
      
        # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ -a true
      
         Performance counter stats for 'system wide':
      
                     3,996      arb/event=0x81,umask=0x1/
                         0      arb/event=0x84,umask=0x1/
      
               0.000630046 seconds time elapsed
      
      The '--no-merge' option still works for uncore events:
      
        # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ --no-merge true
      
         Performance counter stats for 'system wide':
      
                     1,952      uncore_arb_0/event=0x81,umask=0x1/
                     1,921      uncore_arb_1/event=0x81,umask=0x1/
                         0      uncore_arb_0/event=0x84,umask=0x1/
                         0      uncore_arb_1/event=0x84,umask=0x1/
      
               0.000575536 seconds time elapsed

      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210707055652.962-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf pmu: Skip invalid hybrid pmu · 49afa7f6
      Jin Yao committed
      On a hybrid platform such as Alderlake, if the atom CPUs are offlined,
      the kernel still exports the sysfs path '/sys/devices/cpu_atom/' for
      the 'cpu_atom' PMU, but the file '/sys/devices/cpu_atom/cpus' is empty,
      which indicates that this PMU is invalid.

      We need to check for and skip such an invalid hybrid PMU.
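      A sketch of the validity check. It assumes a PMU counts as invalid when
      its sysfs 'cpus' file is missing or contains only whitespace. The names
      hybrid_pmu_is_valid() and cpus_stream_has_cpus() are hypothetical and
      are not the functions the patch adds.

```c
#include <ctype.h>
#include <stdio.h>

/* Returns 1 if the stream contains at least one non-whitespace character
 * (e.g. a CPU list such as "16-23"), 0 if it is empty or all whitespace. */
static int cpus_stream_has_cpus(FILE *f)
{
	int c;

	while ((c = fgetc(f)) != EOF)
		if (!isspace(c))
			return 1;
	return 0;
}

/* Hypothetical helper: a hybrid PMU is treated as valid only when its
 * sysfs 'cpus' file exists and lists at least one CPU. */
static int hybrid_pmu_is_valid(const char *cpus_path)
{
	FILE *f = fopen(cpus_path, "r");
	int valid;

	if (!f)
		return 0;
	valid = cpus_stream_has_cpus(f);
	fclose(f);
	return valid;
}
```

      For example, hybrid_pmu_is_valid("/sys/devices/cpu_atom/cpus") would
      return 0 while the atom CPUs are offline.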
      
      Before:
      
        # perf list
        ...
        branch-instructions OR cpu_atom/branch-instructions/ [Kernel PMU event]
        branch-instructions OR cpu_core/branch-instructions/ [Kernel PMU event]
        branch-misses OR cpu_atom/branch-misses/           [Kernel PMU event]
        branch-misses OR cpu_core/branch-misses/           [Kernel PMU event]
        bus-cycles OR cpu_atom/bus-cycles/                 [Kernel PMU event]
        bus-cycles OR cpu_core/bus-cycles/                 [Kernel PMU event]
        ...
      
      The cpu_atom events are still displayed even though the atom CPUs are offline.
      
      After:
      
        # perf list
        ...
        branch-instructions OR cpu_core/branch-instructions/ [Kernel PMU event]
        branch-misses OR cpu_core/branch-misses/           [Kernel PMU event]
        bus-cycles OR cpu_core/bus-cycles/                 [Kernel PMU event]
        ...
      
      Now only cpu_core events are displayed.

      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210708013701.20347-2-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  9. 10 Jul 2021, 5 commits
  10. 07 Jul 2021, 6 commits
    • perf intel-pt: Add a config for max loops without consuming a packet · b4b046ff
      Adrian Hunter committed
      The Intel PT decoder limits the number of unconditional branches (e.g.
      jmps) decoded without consuming any trace packets. Generally, a loop
      needs a conditional branch which generates a TNT packet, whereas a "ret"
      instruction will generate a TIP or TNT packet. So exceeding the limit is
      assumed to be a never-ending loop, which can happen if there has been a
      decoding error putting the decoder at the wrong place in the code.
      
      Until now, the limit of 10000 has been sufficient, but some analysis
      use cases have been reported to exceed it.

      Increase the limit to 100000, and make it configurable via the perf
      config entry 'intel-pt.max-loops'. Also amend the "Never-ending loop"
      message to mention the configuration entry.
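      A simplified sketch of the never-ending-loop guard. struct loop_guard
      and its functions are hypothetical. Treating a configured value of 0 as
      "use the default" is an assumption, and the real decoder tracks much
      more state.

```c
#include <stdint.h>

#define DEFAULT_MAX_LOOPS 100000	/* the new default described above */

/* Hypothetical, simplified decoder state; not perf's intel_pt_decoder. */
struct loop_guard {
	uint64_t max_loops;	/* e.g. taken from intel-pt.max-loops */
	uint64_t no_progress;	/* unconditional branches since last packet */
};

static void loop_guard_init(struct loop_guard *g, uint64_t configured)
{
	g->max_loops = configured ? configured : DEFAULT_MAX_LOOPS;
	g->no_progress = 0;
}

/* Call whenever a trace packet (TNT, TIP, ...) is consumed. */
static void loop_guard_packet_consumed(struct loop_guard *g)
{
	g->no_progress = 0;
}

/* Call for each unconditional branch decoded without consuming a packet.
 * Returns -1 once the limit is exceeded, i.e. a suspected never-ending loop. */
static int loop_guard_unconditional_branch(struct loop_guard *g)
{
	if (++g->no_progress > g->max_loops)
		return -1;
	return 0;
}
```

      The limit itself would be raised with perf config, for example
      'perf config intel-pt.max-loops=200000'.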
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20210701175132.3977-1-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Disable the NMI watchdog message on hybrid · 493be70a
      Jin Yao committed
      If we run a workload that runs only on the big cores, perf stat always
      prints an ugly message suggesting disabling the NMI watchdog, because
      the atom events are not counted.
      
      Before:
      
        # ./perf stat true
      
         Performance counter stats for 'true':
      
                      0.43 msec task-clock                #    0.396 CPUs utilized
                         0      context-switches          #    0.000 /sec
                         0      cpu-migrations            #    0.000 /sec
                        45      page-faults               #  103.918 K/sec
                   639,634      cpu_core/cycles/          #    1.477 G/sec
             <not counted>      cpu_atom/cycles/                                              (0.00%)
                   643,498      cpu_core/instructions/    #    1.486 G/sec
             <not counted>      cpu_atom/instructions/                                        (0.00%)
                   123,715      cpu_core/branches/        #  285.694 M/sec
             <not counted>      cpu_atom/branches/                                            (0.00%)
                     4,094      cpu_core/branch-misses/   #    9.454 M/sec
             <not counted>      cpu_atom/branch-misses/                                       (0.00%)
      
               0.001092407 seconds time elapsed
      
               0.001144000 seconds user
               0.000000000 seconds sys
      
        Some events weren't counted. Try disabling the NMI watchdog:
                echo 0 > /proc/sys/kernel/nmi_watchdog
                perf stat ...
                echo 1 > /proc/sys/kernel/nmi_watchdog
      
        # ./perf stat -e '{cpu_atom/cycles/,msr/tsc/}' true
      
         Performance counter stats for 'true':
      
             <not counted>      cpu_atom/cycles/                                              (0.00%)
             <not counted>      msr/tsc/                                                      (0.00%)
      
               0.001904106 seconds time elapsed
      
               0.001947000 seconds user
               0.000000000 seconds sys
      
        Some events weren't counted. Try disabling the NMI watchdog:
                echo 0 > /proc/sys/kernel/nmi_watchdog
                perf stat ...
                echo 1 > /proc/sys/kernel/nmi_watchdog
        The events in group usually have to be from the same PMU. Try reorganizing the group.
      
      So disable the NMI watchdog message on hybrid platforms; otherwise there
      are too many false positives.
      
      After:
      
        # ./perf stat true
      
         Performance counter stats for 'true':
      
                      0.79 msec task-clock                #    0.419 CPUs utilized
                         0      context-switches          #    0.000 /sec
                         0      cpu-migrations            #    0.000 /sec
                        48      page-faults               #   60.889 K/sec
                   777,692      cpu_core/cycles/          #  986.519 M/sec
             <not counted>      cpu_atom/cycles/                                              (0.00%)
                   669,147      cpu_core/instructions/    #  848.828 M/sec
             <not counted>      cpu_atom/instructions/                                        (0.00%)
                   128,635      cpu_core/branches/        #  163.176 M/sec
             <not counted>      cpu_atom/branches/                                            (0.00%)
                     4,089      cpu_core/branch-misses/   #    5.187 M/sec
             <not counted>      cpu_atom/branch-misses/                                       (0.00%)
      
               0.001880649 seconds time elapsed
      
               0.001935000 seconds user
               0.000000000 seconds sys
      
        # ./perf stat -e '{cpu_atom/cycles/,msr/tsc/}' true
      
         Performance counter stats for 'true':
      
             <not counted>      cpu_atom/cycles/                                              (0.00%)
             <not counted>      msr/tsc/                                                      (0.00%)
      
               0.000963319 seconds time elapsed
      
               0.000999000 seconds user
               0.000000000 seconds sys

      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210610034557.29766-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script python: Fix buffer size to report iregs in perf script · dea8cfcc
      Kajol Jain committed
      Commit 48a1f565 ("perf script python: Add more PMU fields to
      event handler dict") added functionality to report fields like weight,
      iregs, uregs, etc. via perf report. That commit hard-coded the buffer
      size used to print those fields to 512 bytes.
      
      However, on PowerPC, extended regs support was added in:

        068aeea3 ("perf powerpc: Support exposing Performance Monitor Counter SPRs as part of extended regs")
        d735599a ("powerpc/perf: Add extended regs support for power10 platform")

      so iregs can now carry more data, and this fixed buffer size can result
      in data loss in the perf script output.
      
      This patch resolves the issue by making the buffer size dynamic, based
      on the number of registers to be printed. It also changes the
      regs_map() return type from int to void, as the return value is not
      used by set_regs_in_dict(), its only caller.
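      One way to size the buffer dynamically, as a sketch. The
      'name:0xvalue' output format and format_regs() are illustrative and are
      not the code the patch changes. Each entry is measured with
      snprintf(NULL, 0, ...) before allocating, so the buffer grows with the
      number of registers instead of being fixed at 512 bytes.

```c
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format "name:0xvalue" entries, separated by spaces, into a heap buffer
 * sized from the actual registers. The caller frees the result. */
static char *format_regs(const char *const *names, const uint64_t *vals,
			 size_t nr_regs)
{
	size_t size = 1;	/* room for the terminating NUL */
	size_t off = 0;
	char *buf;

	/* First pass: measure each entry without writing it. */
	for (size_t i = 0; i < nr_regs; i++)
		size += (size_t)snprintf(NULL, 0, "%s%s:0x%" PRIx64,
					 i ? " " : "", names[i], vals[i]);

	buf = malloc(size);
	if (!buf)
		return NULL;
	buf[0] = '\0';

	/* Second pass: append each entry; the buffer is exactly large enough. */
	for (size_t i = 0; i < nr_regs; i++) {
		snprintf(buf + off, size - off, "%s%s:0x%" PRIx64,
			 i ? " " : "", names[i], vals[i]);
		off += strlen(buf + off);
	}
	return buf;
}
```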
      
      Fixes: 068aeea3 ("perf powerpc: Support exposing Performance Monitor Counter SPRs as part of extended regs")
      Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
      Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lore.kernel.org/lkml/20210628062341.155839-1-kjain@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf top: Fix overflow in elf_sec__is_text() · 83952286
      Riccardo Mancini committed
      ASan reports a heap-buffer-overflow in elf_sec__is_text when using perf-top.
      
      The bug is caused by the fact that secstrs is built from runtime_ss, while
      shdr is built from syms_ss if shdr.sh_type != SHT_NOBITS. Therefore, the
      two can refer to different ELF files, and the section name offset taken
      from shdr may fall outside secstrs.
      
      This patch renames secstrs to secstrs_run and adds secstrs_sym, so that
      the correct secstrs is chosen depending on shdr.sh_type.
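      A simplified sketch of choosing the string table. It uses minimal
      stand-ins for the ELF section header; the SHT_* values come from the ELF
      specification. section_name() and section_is_text() are hypothetical.
      The strstr() for "text" mirrors the strstr() frame in the ASan trace.
      The bounds check is an extra safeguard that the patch does not
      describe.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Section types, values from the ELF specification. */
#define SHT_PROGBITS 1
#define SHT_NOBITS   8

/* Simplified stand-in for GElf_Shdr: only the fields used here. */
struct sec_hdr {
	uint32_t sh_name;	/* offset of the name in a section string table */
	uint32_t sh_type;
};

/* A NUL-terminated ELF string table and its size in bytes. */
struct strtab {
	const char *data;
	size_t size;
};

/* shdr came from the symbols file unless it is SHT_NOBITS, so its name
 * must be looked up in that same file's string table. */
static const char *section_name(const struct sec_hdr *shdr,
				const struct strtab *secstrs_run,
				const struct strtab *secstrs_sym)
{
	const struct strtab *tab =
		shdr->sh_type != SHT_NOBITS ? secstrs_sym : secstrs_run;

	if (shdr->sh_name >= tab->size)	/* extra check, not from the patch */
		return NULL;
	return tab->data + shdr->sh_name;
}

static int section_is_text(const struct sec_hdr *shdr,
			   const struct strtab *secstrs_run,
			   const struct strtab *secstrs_sym)
{
	const char *name = section_name(shdr, secstrs_run, secstrs_sym);

	return name != NULL && strstr(name, "text") != NULL;
}
```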
      
        $ ASAN_OPTIONS=abort_on_error=1:disable_coredump=0:unmap_shadow_on_exit=1 ./perf top
        =================================================================
        ==363148==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61300009add6 at pc 0x00000049875c bp 0x7f4f56446440 sp 0x7f4f56445bf0
        READ of size 1 at 0x61300009add6 thread T6
          #0 0x49875b in StrstrCheck(void*, char*, char const*, char const*) (/home/user/linux/tools/perf/perf+0x49875b)
          #1 0x4d13a2 in strstr (/home/user/linux/tools/perf/perf+0x4d13a2)
          #2 0xacae36 in elf_sec__is_text /home/user/linux/tools/perf/util/symbol-elf.c:176:9
          #3 0xac3ec9 in elf_sec__filter /home/user/linux/tools/perf/util/symbol-elf.c:187:9
          #4 0xac2c3d in dso__load_sym /home/user/linux/tools/perf/util/symbol-elf.c:1254:20
          #5 0x883981 in dso__load /home/user/linux/tools/perf/util/symbol.c:1897:9
          #6 0x8e6248 in map__load /home/user/linux/tools/perf/util/map.c:332:7
          #7 0x8e66e5 in map__find_symbol /home/user/linux/tools/perf/util/map.c:366:6
          #8 0x7f8278 in machine__resolve /home/user/linux/tools/perf/util/event.c:707:13
          #9 0x5f3d1a in perf_event__process_sample /home/user/linux/tools/perf/builtin-top.c:773:6
          #10 0x5f30e4 in deliver_event /home/user/linux/tools/perf/builtin-top.c:1197:3
          #11 0x908a72 in do_flush /home/user/linux/tools/perf/util/ordered-events.c:244:9
          #12 0x905fae in __ordered_events__flush /home/user/linux/tools/perf/util/ordered-events.c:323:8
          #13 0x9058db in ordered_events__flush /home/user/linux/tools/perf/util/ordered-events.c:341:9
          #14 0x5f19b1 in process_thread /home/user/linux/tools/perf/builtin-top.c:1109:7
          #15 0x7f4f6a21a298 in start_thread /usr/src/debug/glibc-2.33-16.fc34.x86_64/nptl/pthread_create.c:481:8
          #16 0x7f4f697d0352 in clone ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      
      0x61300009add6 is located 10 bytes to the right of 332-byte region [0x61300009ac80,0x61300009adcc)
      allocated by thread T6 here:
      
          #0 0x4f3f7f in malloc (/home/user/linux/tools/perf/perf+0x4f3f7f)
          #1 0x7f4f6a0a88d9  (/lib64/libelf.so.1+0xa8d9)
      
      Thread T6 created by T0 here:
      
          #0 0x464856 in pthread_create (/home/user/linux/tools/perf/perf+0x464856)
          #1 0x5f06e0 in __cmd_top /home/user/linux/tools/perf/builtin-top.c:1309:6
          #2 0x5ef19f in cmd_top /home/user/linux/tools/perf/builtin-top.c:1762:11
          #3 0x7b28c0 in run_builtin /home/user/linux/tools/perf/perf.c:313:11
          #4 0x7b119f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8
          #5 0x7b2423 in run_argv /home/user/linux/tools/perf/perf.c:409:2
          #6 0x7b0c19 in main /home/user/linux/tools/perf/perf.c:539:3
          #7 0x7f4f696f7b74 in __libc_start_main /usr/src/debug/glibc-2.33-16.fc34.x86_64/csu/../csu/libc-start.c:332:16
      
        SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/user/linux/tools/perf/perf+0x49875b) in StrstrCheck(void*, char*, char const*, char const*)
        Shadow bytes around the buggy address:
          0x0c268000b560: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
          0x0c268000b570: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
          0x0c268000b580: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
          0x0c268000b590: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          0x0c268000b5a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        =>0x0c268000b5b0: 00 00 00 00 00 00 00 00 00 04[fa]fa fa fa fa fa
          0x0c268000b5c0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
          0x0c268000b5d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          0x0c268000b5e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          0x0c268000b5f0: 07 fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
          0x0c268000b600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
        Shadow byte legend (one shadow byte represents 8 application bytes):
          Addressable:           00
          Partially addressable: 01 02 03 04 05 06 07
          Heap left redzone:       fa
          Freed heap region:       fd
          Stack left redzone:      f1
          Stack mid redzone:       f2
          Stack right redzone:     f3
          Stack after return:      f5
          Stack use after scope:   f8
          Global redzone:          f9
          Global init order:       f6
          Poisoned by user:        f7
          Container overflow:      fc
          Array cookie:            ac
          Intra object redzone:    bb
          ASan internal:           fe
          Left alloca redzone:     ca
          Right alloca redzone:    cb
          Shadow gap:              cc
        ==363148==ABORTING

      Suggested-by: Jiri Slaby <jirislaby@kernel.org>
      Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Fabian Hemmer <copy@copy.sh>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jiri Slaby <jirislaby@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Remi Bernon <rbernon@codeweavers.com>
      Link: http://lore.kernel.org/lkml/20210621222108.196219-1-rickyman7@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf symbol-elf: Decode dynsym even if symtab exists · 87704345
      Masami Hiramatsu committed
      In Fedora 34, libc-2.33.so has both .dynsym and .symtab sections, and
      most (but not all) symbols have moved to .dynsym. In this case, perf
      decodes only the symbols in .symtab, so perf probe cannot list the
      functions in the library.
      
      To fix this issue, decode both .symtab and .dynsym sections.
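      A sketch of the idea, reduced to symbol names. Both tables are walked
      and names already collected are skipped. Deduplicating by name, the
      fixed-capacity output array and collect_symbols() are assumptions for
      illustration; perf itself works on full ELF symbol entries.

```c
#include <stddef.h>
#include <string.h>

/* Append names from 'src' to 'out' (capacity 'cap'), skipping names that
 * are already present. Returns the new count. Quadratic, for illustration. */
static size_t add_unique(const char **out, size_t count, size_t cap,
			 const char *const *src, size_t n)
{
	for (size_t i = 0; i < n && count < cap; i++) {
		size_t j;

		for (j = 0; j < count; j++)
			if (strcmp(out[j], src[i]) == 0)
				break;
		if (j == count)
			out[count++] = src[i];
	}
	return count;
}

/* Decode both tables instead of stopping at .symtab when it exists. */
static size_t collect_symbols(const char **out, size_t cap,
			      const char *const *symtab, size_t nsymtab,
			      const char *const *dynsym, size_t ndynsym)
{
	size_t count = add_unique(out, 0, cap, symtab, nsymtab);

	return add_unique(out, count, cap, dynsym, ndynsym);
}
```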
      
      Without this fix:
        -----
        $ ./perf probe -x /usr/lib64/libc-2.33.so -F
        @plt
        @plt
        calloc@plt
        free@plt
        malloc@plt
        memalign@plt
        realloc@plt
        -----
      
      With this fix:
      
        -----
        $ ./perf probe -x /usr/lib64/libc-2.33.so -F
        @plt
        @plt
        a64l
        abort
        abs
        accept
        accept4
        access
        acct
        addmntent
        -----

      Reported-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Acked-by: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Stefan Liebler <stli@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/162532652681.393143.10163733179955267999.stgit@devnote2
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf probe: Fix debuginfo__new() to enable build-id based debuginfo · eb4717f7
      Masami Hiramatsu committed
      Fix debuginfo__new() to set the build-id to dso before
      dso__read_binary_type_filename() so that it can find
      DSO_BINARY_TYPE__BUILDID_DEBUGINFO debuginfo correctly.
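      A sketch of why the ordering matters. It assumes the conventional
      /usr/lib/debug/.build-id/<first two hex digits>/<rest>.debug layout for
      build-id debuginfo files. struct dso_sketch and
      buildid_debuginfo_path() are hypothetical stand-ins for struct dso and
      dso__read_binary_type_filename(). Without a build-id, there is no path
      to construct.

```c
#include <stdio.h>
#include <string.h>

/* Simplified, hypothetical stand-in for struct dso. */
struct dso_sketch {
	char build_id[41];	/* hex string, empty until set */
};

/* Build the build-id debuginfo path. Fails if the build-id has not been
 * set first, which is the ordering problem the patch fixes. */
static int buildid_debuginfo_path(const struct dso_sketch *dso,
				  char *buf, size_t len)
{
	size_t idlen = strlen(dso->build_id);
	int n;

	if (idlen < 3)
		return -1;	/* build-id not set (or too short) */
	n = snprintf(buf, len, "/usr/lib/debug/.build-id/%.2s/%s.debug",
		     dso->build_id, dso->build_id + 2);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```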
      
      However, this may not change the result, because elfutils (libdwfl) has
      its own debuginfo finder. With or without this patch, perf probe
      correctly finds the debuginfo file.

      This is just a failsafe to keep the code sane (if you use
      dso__read_binary_type_filename(), you must set the build-id on the dso).
      Reported-by: Thomas Richter <tmricht@linux.ibm.com>
      Acked-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: Masami Hiramatsu <mhriamat@kernel.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Stefan Liebler <stli@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/162532651863.393143.11692691321219235810.stgit@devnote2
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  11. 06 Jul 2021, 1 commit
  12. 02 Jul 2021, 2 commits
    • perf session: Add missing evlist__delete when deleting a session · cf96b8e4
      Riccardo Mancini committed
      ASan reports a memory leak caused by the evlist not being deleted on exit
      in perf-report, perf-script and perf-data.
      The problem is caused by session->evlist not being deleted; it is
      allocated in perf_session__read_header, which is called from
      perf_session__new if perf_data is in read mode.
      In write mode, session->evlist is filled in by the caller.
      This patch solves the problem by calling evlist__delete in
      perf_session__delete if perf_data is in read mode.
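      A sketch of the ownership rule the fix applies. It uses hypothetical
      stand-ins for struct perf_session and struct evlist. The session frees
      the evlist only when it allocated the evlist itself, i.e. in read mode.
      The evlists_freed counter exists only to make the behaviour observable.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for perf's structures. */
struct evlist_sketch {
	int nr_entries;
};

struct session_sketch {
	bool read_mode;			/* perf_data opened for reading */
	struct evlist_sketch *evlist;
};

static int evlists_freed;		/* demonstration only */

static void evlist_sketch_delete(struct evlist_sketch *evlist)
{
	if (!evlist)
		return;
	free(evlist);
	evlists_freed++;
}

/* In read mode the session allocated the evlist while reading the header,
 * so it must free it; in write mode the caller owns the evlist. */
static void session_sketch_delete(struct session_sketch *session)
{
	if (session->read_mode)
		evlist_sketch_delete(session->evlist);
	free(session);
}
```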
      
      Changes in v2:
       - call evlist__delete from within perf_session__delete
      
      v1: https://lore.kernel.org/lkml/20210621234317.235545-1-rickyman7@gmail.com/
      
      ASan report follows:
      
      $ ./perf script report flamegraph
      =================================================================
      ==227640==ERROR: LeakSanitizer: detected memory leaks
      
      <SNIP unrelated>
      
      Indirect leak of 2704 byte(s) in 1 object(s) allocated from:
          #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137)
          #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9
          #2 0x7f999e in evlist__new /home/user/linux/tools/perf/util/evlist.c:77:26
          #3 0x8ad938 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3797:20
          #4 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6
          #5 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10
          #6 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12
          #7 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11
          #8 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8
          #9 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2
          #10 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3
          #11 0x7f5260654b74  (/lib64/libc.so.6+0x27b74)
      
      Indirect leak of 568 byte(s) in 1 object(s) allocated from:
          #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137)
          #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9
          #2 0x80ce88 in evsel__new_idx /home/user/linux/tools/perf/util/evsel.c:268:24
          #3 0x8aed93 in evsel__new /home/user/linux/tools/perf/util/evsel.h:210:9
          #4 0x8ae07e in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3853:11
          #5 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6
          #6 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10
          #7 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12
          #8 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11
          #9 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8
          #10 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2
          #11 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3
          #12 0x7f5260654b74  (/lib64/libc.so.6+0x27b74)
      
      Indirect leak of 264 byte(s) in 1 object(s) allocated from:
          #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137)
          #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9
          #2 0xbe3e70 in xyarray__new /home/user/linux/tools/lib/perf/xyarray.c:10:23
          #3 0xbd7754 in perf_evsel__alloc_id /home/user/linux/tools/lib/perf/evsel.c:361:21
          #4 0x8ae201 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3871:7
          #5 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6
          #6 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10
          #7 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12
          #8 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11
          #9 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8
          #10 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2
          #11 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3
          #12 0x7f5260654b74  (/lib64/libc.so.6+0x27b74)
      
      Indirect leak of 32 byte(s) in 1 object(s) allocated from:
          #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137)
          #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9
          #2 0xbd77e0 in perf_evsel__alloc_id /home/user/linux/tools/lib/perf/evsel.c:365:14
          #3 0x8ae201 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3871:7
          #4 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6
          #5 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10
          #6 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12
          #7 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11
          #8 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8
          #9 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2
          #10 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3
          #11 0x7f5260654b74  (/lib64/libc.so.6+0x27b74)
      
      Indirect leak of 7 byte(s) in 1 object(s) allocated from:
          #0 0x4b8207 in strdup (/home/user/linux/tools/perf/perf+0x4b8207)
          #1 0x8b4459 in evlist__set_event_name /home/user/linux/tools/perf/util/header.c:2292:16
          #2 0x89d862 in process_event_desc /home/user/linux/tools/perf/util/header.c:2313:3
          #3 0x8af319 in perf_file_section__process /home/user/linux/tools/perf/util/header.c:3651:9
          #4 0x8aa6e9 in perf_header__process_sections /home/user/linux/tools/perf/util/header.c:3427:9
          #5 0x8ae3e7 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3886:2
          #6 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6
          #7 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10
          #8 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12
          #9 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11
          #10 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8
          #11 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2
          #12 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3
          #13 0x7f5260654b74  (/lib64/libc.so.6+0x27b74)
      
      SUMMARY: AddressSanitizer: 3728 byte(s) leaked in 7 allocation(s).

      Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210624231926.212208-1-rickyman7@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf dlfilter: Add object_code() to perf_dlfilter_fns · ec4c00fe
      Adrian Hunter committed
      Add a function, for use by dlfilters, to read object code.

      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210627131818.810-11-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>