1. 19 Feb 2021, 5 commits
  2. 18 Feb 2021, 11 commits
  3. 17 Feb 2021, 1 commit
    • perf symbols: Resolve symbols against debug file first · 6833e0b8
      Jiri Slaby committed
      With LTO, there are symbols like these:
      
      /usr/lib/debug/usr/lib64/libantlr4-runtime.so.4.8-4.8-1.4.x86_64.debug
       10305: 0000000000955fa4     0 NOTYPE  LOCAL  DEFAULT   29 Predicate.cpp.2bc410e7
      
      This comes from a runtime/debug split done the standard way:
      
        objcopy --only-keep-debug $runtime $debug
        objcopy --add-gnu-debuglink=$debugfn -R .comment -R .GCC.command.line --strip-all $runtime
      
      perf currently cannot resolve such symbols (relics of LTO), as section
      29 exists only in the debug file (29 is .debug_info), and perf resolves
      symbols only against the runtime file. As a result, all symbols from
      such a library are unresolved:
      
           0.38%  main2    libantlr4-runtime.so.4.8  [.] 0x00000000000671e0
      
      So try resolving against the debug file first, and only if that fails
      (the section has NOBITS set), try the runtime file. This works because,
      per its documentation, "objcopy --only-keep-debug" preserves all
      sections but clears the data of some of them (the runtime ones) and
      marks them as NOBITS.
      
      The correct result is now:
           0.38%  main2    libantlr4-runtime.so.4.8  [.] antlr4::IntStream::~IntStream
      
      Note that these LTO symbols are properly skipped anyway as they belong
      neither to *text* nor to *data* (is_label && !elf_sec__filter(&shdr,
      secstrs) is true).
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210217122125.26416-1-jslaby@suse.cz
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
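The lookup order described in this commit can be sketched in C. This is a minimal illustration, not perf's implementation: `pick_symbol_section()` is a hypothetical name, and it assumes the caller has already fetched the section header that the symbol's `st_shndx` selects in each file, using the `Elf64_Shdr` type and `SHT_NOBITS` constant from glibc's `<elf.h>`.

```c
#include <elf.h>
#include <stddef.h>

/*
 * Hypothetical helper (not perf's actual function): given the section
 * header that a symbol's st_shndx selects in the separate debug file and
 * in the runtime file, return the one that actually holds section data.
 * Either pointer may be NULL if that file lacks the section.
 */
static const Elf64_Shdr *pick_symbol_section(const Elf64_Shdr *debug_shdr,
                                             const Elf64_Shdr *runtime_shdr)
{
	/* Try the debug file first: "objcopy --only-keep-debug" keeps every
	 * section header, so an index such as 29 (.debug_info) is valid there. */
	if (debug_shdr != NULL && debug_shdr->sh_type != SHT_NOBITS)
		return debug_shdr;

	/* The debug file only has a NOBITS placeholder (a runtime section
	 * whose data objcopy cleared), so fall back to the runtime file. */
	if (runtime_shdr != NULL && runtime_shdr->sh_type != SHT_NOBITS)
		return runtime_shdr;

	return NULL; /* neither file can resolve the symbol */
}
```

With this order, a symbol in .debug_info (SHT_PROGBITS in the debug file) resolves against the debug file, while a symbol in .text (NOBITS in the debug file) still resolves against the runtime file.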
  4. 16 Feb 2021, 5 commits
    • 37b3fa0e
    • Merge branch 'perf/urgent' into perf/core · c1bd8a2b
      Arnaldo Carvalho de Melo committed
      To get some fixes that didn't make it into 5.11.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf arm-spe: Set sample's data source field · a89dbc9b
      Leo Yan committed
      The sample structure contains the field 'data_src', which describes
      the attributes of a data operation, e.g. whether it is a load or a
      store, the cache level, and whether it was a snoop or a remote access.
      The 'data_src' field is eventually parsed by the perf mem/c2c tools to
      display human-readable strings.
      
      This patch fills the 'data_src' field in the synthesized samples based
      on the record type. The perf tool can currently display statistics for
      L1/L2/L3 caches but does not support a 'last level cache'; to fit the
      current implementation, the 'data_src' field reports the last level
      cache as L3.
      
      Before this commit, perf mem report looked like this:
        # Samples: 75K of event 'l1d-miss'
        # Total weight : 75951
        # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
        #
        # Overhead  Samples  Local Weight  Memory access  Symbol                  Shared Object  Data Symbol             Data Object  Snoop  TLB access
        # ........  .......  ............  .............  ......................  .............  ......................  ...........  .....  ..........
        #
            81.56%    61945  0             N/A            [.] 0x00000000000009d8  serial_c       [.] 0000000000000000    [unknown]    N/A    N/A
            18.44%    14003  0             N/A            [.] 0x0000000000000828  serial_c       [.] 0000000000000000    [unknown]    N/A    N/A
      
      Now on a system with Arm SPE, addresses and access types are displayed:
      
        # Samples: 75K of event 'l1d-miss'
        # Total weight : 75951
        # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
        #
        # Overhead  Samples  Local Weight  Memory access  Symbol                  Shared Object  Data Symbol             Data Object  Snoop  TLB access
        # ........  .......  ............  .............  ......................  .............  ......................  ...........  .....  ..........
        #
             0.43%      324  0             L1 miss        [.] 0x00000000000009d8  serial_c       [.] 0x0000ffff80794e00  anon         N/A    Walker hit
             0.42%      322  0             L1 miss        [.] 0x00000000000009d8  serial_c       [.] 0x0000ffff80794580  anon         N/A    Walker hit
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: James Clark <james.clark@arm.com>
      Tested-by: James Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Andre Przywara <andre.przywara@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wei Li <liwei391@huawei.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Link: https://lore.kernel.org/r/20210211133856.2137-6-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
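The type-to-level mapping described in this commit can be sketched in C. This is a simplified illustration under stated assumptions, not the patch itself: the `SPE_*` record flags, `struct data_src` and `synth_data_source()` are hypothetical stand-ins. The real field is `union perf_mem_data_src` from `<linux/perf_event.h>`, which packs these attributes into bit fields with different names.

```c
#include <stdint.h>

/* Hypothetical, simplified SPE record; real perf uses different names. */
enum spe_op { SPE_OP_LOAD, SPE_OP_STORE };

#define SPE_L1D_ACCESS (1u << 0)
#define SPE_L1D_MISS   (1u << 1)
#define SPE_LLC_ACCESS (1u << 2)
#define SPE_LLC_MISS   (1u << 3)
#define SPE_REMOTE     (1u << 4)

struct spe_record {
	enum spe_op op;
	uint32_t type; /* combination of SPE_* flags */
};

/* Hypothetical stand-in for the data source attributes. */
enum mem_op { MEM_OP_LOAD, MEM_OP_STORE };
enum mem_lvl { MEM_LVL_NA, MEM_LVL_L1, MEM_LVL_L3 };

struct data_src {
	enum mem_op op;
	enum mem_lvl lvl;
	int miss;   /* 1 = miss at lvl, 0 = hit */
	int remote; /* 1 = remote access */
};

static struct data_src synth_data_source(const struct spe_record *rec)
{
	struct data_src src = { .lvl = MEM_LVL_NA };

	src.op = rec->op == SPE_OP_LOAD ? MEM_OP_LOAD : MEM_OP_STORE;

	if (rec->type & (SPE_LLC_ACCESS | SPE_LLC_MISS)) {
		/* perf mem/c2c only know L1/L2/L3, so report the LLC as L3. */
		src.lvl = MEM_LVL_L3;
		src.miss = !!(rec->type & SPE_LLC_MISS);
	} else if (rec->type & (SPE_L1D_ACCESS | SPE_L1D_MISS)) {
		src.lvl = MEM_LVL_L1;
		src.miss = !!(rec->type & SPE_L1D_MISS);
	}

	src.remote = !!(rec->type & SPE_REMOTE);
	return src;
}
```

In this sketch the last level cache is checked before L1, so a record flagged as both an L1D miss and an LLC access is reported as an L3 hit; that ordering is an assumption made for illustration.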
    • perf arm-spe: Synthesize memory event · e55ed342
      Leo Yan committed
      The memory event delivers two benefits:
      
      - First, the memory event gives a global view of memory accesses.
        Rather than organizing events in a scattered mode (e.g. a separate
        event for the L1 cache, another for the last level cache, etc.),
        where each event can only display a single memory type, the memory
        event includes all memory accesses, so it can display data accesses
        across memory levels in the same view;
      
      - Second, sample generation can introduce a big overhead, and perf
        report may take a long time. By specifying the itrace option
        '--itrace=M', other events are filtered out and only memory events
        are output, which can significantly reduce the overhead caused by
        generating samples.
      
      This patch enables the memory event for Arm SPE.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: James Clark <james.clark@arm.com>
      Tested-by: James Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Andre Przywara <andre.przywara@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wei Li <liwei391@huawei.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Link: https://lore.kernel.org/r/20210211133856.2137-5-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
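The overhead argument in the second point can be illustrated with a toy model in C that counts how many samples are synthesized per SPE record. Everything here is hypothetical (`struct synth_opts`, the `REC_*` flags, `samples_for_record()`). It does not reflect perf's actual itrace parsing, only the idea that per-type events yield one sample per matching type, while a single memory event yields one sample per memory record.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record flags, for illustration only. */
#define REC_L1D_MISS (1u << 0)
#define REC_LLC_MISS (1u << 1)
#define REC_TLB_MISS (1u << 2)
#define REC_MEM      (1u << 3) /* record is a load or a store */

/* Hypothetical synthesis options. */
struct synth_opts {
	bool per_type_events; /* separate l1d-miss, llc-miss, tlb-miss events */
	bool memory_event;    /* one combined memory event, as with --itrace=M */
};

/* Return how many samples would be synthesized for one record. */
static size_t samples_for_record(uint32_t flags, const struct synth_opts *o)
{
	size_t n = 0;

	if (o->per_type_events) {
		/* Scattered mode: one sample per matching event type. */
		n += !!(flags & REC_L1D_MISS);
		n += !!(flags & REC_LLC_MISS);
		n += !!(flags & REC_TLB_MISS);
	}
	if (o->memory_event && (flags & REC_MEM))
		n += 1; /* one sample covers every level via data_src */
	return n;
}
```

For a load that misses in L1D, the LLC and the TLB, enabling every event type yields four samples in this model, whereas the memory event alone yields one.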
    • perf arm-spe: Fill address info for samples · 54f7815e
      Leo Yan committed
      To properly handle memory and branch samples, this patch splits sample
      generation into two functions: arm_spe__synth_mem_sample() synthesizes
      memory and TLB samples, and arm_spe__synth_branch_sample() synthesizes
      branch samples.
      
      The Arm SPE backend decoder passes the virtual and physical addresses
      through packets; arm_spe__synth_mem_sample() stores this address info
      into the synthesized samples.
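The split between the two synthesis functions can be sketched as follows. The structures and functions are cut-down, hypothetical stand-ins for perf's `struct perf_sample`, the decoder record, `arm_spe__synth_mem_sample()` and `arm_spe__synth_branch_sample()`. In particular, using the branch target as the branch sample's `addr` is an assumption made for illustration.

```c
#include <stdint.h>

/* Hypothetical, cut-down decoder record. */
struct spe_record {
	uint64_t from_ip;   /* PC of the sampled instruction */
	uint64_t to_ip;     /* branch target */
	uint64_t virt_addr; /* data virtual address */
	uint64_t phys_addr; /* data physical address */
};

/* Hypothetical, cut-down sample. */
struct sample {
	uint64_t ip;
	uint64_t addr;
	uint64_t phys_addr;
};

/* Memory/TLB samples carry the data addresses decoded from the packets. */
static struct sample synth_mem_sample(const struct spe_record *rec)
{
	struct sample s = {
		.ip = rec->from_ip,
		.addr = rec->virt_addr,
		.phys_addr = rec->phys_addr,
	};
	return s;
}

/* Branch samples record the branch target instead of a data address. */
static struct sample synth_branch_sample(const struct spe_record *rec)
{
	struct sample s = {
		.ip = rec->from_ip,
		.addr = rec->to_ip,
	};
	return s;
}
```

Keeping the two paths separate means the data addresses are only attached to samples for which they are meaningful.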
      
      Committer notes:
      
      Fixed this:
      
        36    46.77 fedora:27                     : FAIL clang version 5.0.2 (tags/RELEASE_502/final)
      
          util/arm-spe.c:269:34: error: missing field 'pid' initializer [-Werror,-Wmissing-field-initializers]
                  struct perf_sample sample = { 0 };
                                                  ^
          util/arm-spe.c:288:34: error: missing field 'pid' initializer [-Werror,-Wmissing-field-initializers]
                  struct perf_sample sample = { 0 };
      
      This was fixed by using = { .ip = 0, }; instead.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: James Clark <james.clark@arm.com>
      Tested-by: James Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Andre Przywara <andre.przywara@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wei Li <liwei391@huawei.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Link: https://lore.kernel.org/r/20210211133856.2137-4-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
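The committer's build fix relies on a general C rule: with both `= { 0 }` and a designated initializer such as `= { .ip = 0, }`, every member without an explicit initializer is zero-initialized, so the two forms produce the same object. The designated form is what the committer used to silence the clang 5.0 -Wmissing-field-initializers error quoted in this commit. A minimal sketch, with a hypothetical `struct sample` standing in for `struct perf_sample`:

```c
#include <stdint.h>

/* Hypothetical stand-in for struct perf_sample. */
struct sample {
	uint64_t ip;
	uint32_t pid, tid;
	uint64_t addr;
};

static struct sample make_sample(uint64_t ip)
{
	/*
	 * '= { 0 };' would also zero every member, but the clang 5.0 build
	 * quoted above rejected it under -Werror,-Wmissing-field-initializers
	 * ("missing field 'pid' initializer"). The designated form below was
	 * the committer's fix; pid, tid and addr are still zero-initialized.
	 */
	struct sample s = { .ip = 0, };

	s.ip = ip;
	return s;
}
```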
  5. 15 Feb 2021, 7 commits
  6. 14 Feb 2021, 11 commits