1. 19 Feb, 2021 7 commits
    • perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processing · fbefe9c2
      Authored by Kan Liang
      For X86, the var2_w field of PERF_SAMPLE_WEIGHT_STRUCT stands for the
      instruction latency. perf currently assigns var2_w to data->ins_lat
      in the generic code. This works for now because X86 is the only
      architecture that supports PERF_SAMPLE_WEIGHT_STRUCT, but it may
      cause problems once other architectures support the sample type. For
      example, var2_w may be used to capture something else on PowerPC.
      
      Create two architecture-specific functions to parse and synthesize the
      weight-related samples. Move the X86-specific code into the X86 versions
      of these functions. Other architectures can implement their own
      functions separately later.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/1612540912-6562-1-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
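The split described in this commit can be illustrated with a minimal sketch. It assumes the little-endian layout of the 64-bit weight value (a 32-bit var1_dw in the low bits, then 16-bit var2_w and var3_w). The function names and the dispatch table are hypothetical and are not the names used in the patch; only var2_w mapping to the instruction latency on X86 comes from the commit message.

```python
import struct


def split_weight(full: int) -> dict:
    """Split a 64-bit weight into var1_dw / var2_w / var3_w (little-endian layout assumed)."""
    var1_dw, var2_w, var3_w = struct.unpack("<IHH", struct.pack("<Q", full))
    return {"var1_dw": var1_dw, "var2_w": var2_w, "var3_w": var3_w}


def parse_weight_generic(fields: dict) -> dict:
    """Hypothetical default handler: only the 32-bit weight has a generic meaning."""
    return {"weight": fields["var1_dw"]}


def parse_weight_x86(fields: dict) -> dict:
    """Hypothetical X86 handler: var2_w carries the instruction latency."""
    sample = parse_weight_generic(fields)
    sample["ins_lat"] = fields["var2_w"]
    return sample


# Hypothetical per-architecture dispatch; other architectures would add entries.
ARCH_PARSERS = {"x86": parse_weight_x86}


def parse_sample_weight(full: int, arch: str) -> dict:
    """Parse a raw weight value with the handler for the given architecture."""
    handler = ARCH_PARSERS.get(arch, parse_weight_generic)
    return handler(split_weight(full))
```

With this shape, an architecture that gives var2_w a different meaning only needs its own handler; the generic path never interprets that field.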
    • perf intel-pt: Add PSB events · c840cbfe
      Authored by Adrian Hunter
      Emitting a PSB+ can cause the CPU a slight delay. When doing timing analysis
      of code with Intel PT, it is useful to know if a timing bubble was caused
      by Intel PT or not. Add reporting of PSB events via perf script. PSB
      events are printed with the existing itrace 'p' option which also prints
      power and frequency changes. The PSB event contains the trace offset at
      which the PSB occurs, to allow easy reference back to the PSB+ packets.
      
      The PSB event timestamp is always the timestamp from the PSB+ TSC
      packet, and the ip is always the address from the PSB+ FUP packet.
      
      The code changes are non-trivial because the decoder must walk to the
      PSB+ FUP address before outputting the PSB event.
      
      Example:
      
        $ perf record -e intel_pt/cyc,psb_period=0/u uname
        Linux
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.046 MB perf.data ]
        $ perf script --itrace=p --ns
           perf 17981 [006] 25617.510820383:  psb:  psb offs: 0                               0 [unknown] ([unknown])
           perf 17981 [006] 25617.510820383:  cbr:  cbr: 42 freq: 4219 MHz (156%)             0 [unknown] ([unknown])
          uname 17981 [006] 25617.510889753:  psb:  psb offs: 0xb50                7f78c12a212e __GI___tunables_init+0xee (/usr/lib/x86_64-linux-gnu/ld-2.31.so)
          uname 17981 [006] 25617.510899162:  psb:  psb offs: 0x12d0               7f78c128af1c dl_main+0x93c (/usr/lib/x86_64-linux-gnu/ld-2.31.so)
          uname 17981 [006] 25617.510939242:  psb:  psb offs: 0x1a50               7f78c128eefc _dl_map_object_from_fd+0x13c (/usr/lib/x86_64-linux-gnu/ld-2.31.so)
          uname 17981 [006] 25617.510981274:  psb:  psb offs: 0x21c8               7f78c1296307 _dl_relocate_object+0x927 (/usr/lib/x86_64-linux-gnu/ld-2.31.so)
          uname 17981 [006] 25617.510993034:  psb:  psb offs: 0x2948               7f78c12940e4 _dl_lookup_symbol_x+0x14 (/usr/lib/x86_64-linux-gnu/ld-2.31.so)
          uname 17981 [006] 25617.511003871:  psb:  psb offs: 0x30c8               7f78c12937b3 do_lookup_x+0x2f3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so)
          uname 17981 [006] 25617.511019854:  psb:  psb offs: 0x3850               7f78c1295eed _dl_relocate_object+0x50d (/usr/lib/x86_64-linux-gnu/ld-2.31.so)
          uname 17981 [006] 25617.511029015:  psb:  psb offs: 0x4390               7f78c12a855a strcmp+0xf6a (/usr/lib/x86_64-linux-gnu/ld-2.31.so)
          uname 17981 [006] 25617.511064876:  psb:  psb offs: 0x4b10                          0 [unknown] ([unknown])
          uname 17981 [006] 25617.511080762:  psb:  psb offs: 0x5290               7f78c11db53d _dl_addr+0x13d (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
          uname 17981 [006] 25617.511086035:  psb:  psb offs: 0x5a08               7f78c11db538 _dl_addr+0x138 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
          uname 17981 [006] 25617.511091381:  psb:  psb offs: 0x6190               7f78c11db534 _dl_addr+0x134 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
          uname 17981 [006] 25617.511096681:  psb:  psb offs: 0x6910               7f78c11db4c3 _dl_addr+0xc3 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
          uname 17981 [006] 25617.511119520:  psb:  psb offs: 0x7090               7f78c10ada5e _nl_intern_locale_data+0x12e (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
          uname 17981 [006] 25617.511126584:  psb:  psb offs: 0x7818               7f78c10ada50 _nl_intern_locale_data+0x120 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
          uname 17981 [006] 25617.511132775:  psb:  psb offs: 0x8358               7f78c10c20c0 getenv+0xa0 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
          uname 17981 [006] 25617.511134598:  psb:  psb offs: 0x8ad0               7f78c10ada09 _nl_intern_locale_data+0xd9 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
          uname 17981 [006] 25617.511135685:  psb:  psb offs: 0x9258               7f78c10ada50 _nl_intern_locale_data+0x120 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
          uname 17981 [006] 25617.511138322:  psb:  psb offs: 0x99d0               7f78c11fffd9 __strncmp_avx2+0x39 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
          uname 17981 [006] 25617.511158907:  psb:  psb offs: 0xa150                          0 [unknown] ([unknown])
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20210205175350.23817-5-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
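For post-processing, the PSB lines in the example output above can be pulled apart with a small script. This is a sketch only: the regular expression is inferred from the sample output in this commit message, not from any documented perf script format guarantee, and the helper name is hypothetical.

```python
import re

# Pattern inferred from the example output: comm, pid, [cpu], time, "psb offs:", ip.
PSB_LINE = re.compile(
    r"^\s*(?P<comm>\S+)\s+(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<time>\d+\.\d+):\s+psb:\s+psb offs:\s+(?P<offs>\S+)\s+(?P<ip>[0-9a-f]+)\s"
)

# Three lines copied from the example above (one of them is a cbr event).
SAMPLE = """\
   perf 17981 [006] 25617.510820383:  psb:  psb offs: 0                               0 [unknown] ([unknown])
   perf 17981 [006] 25617.510820383:  cbr:  cbr: 42 freq: 4219 MHz (156%)             0 [unknown] ([unknown])
  uname 17981 [006] 25617.510889753:  psb:  psb offs: 0xb50                7f78c12a212e __GI___tunables_init+0xee (/usr/lib/x86_64-linux-gnu/ld-2.31.so)
"""


def parse_psb_events(text: str) -> list:
    """Return one dict per PSB line; non-PSB lines (e.g. cbr) are skipped."""
    events = []
    for line in text.splitlines():
        m = PSB_LINE.match(line)
        if m is None:
            continue
        events.append({
            "comm": m["comm"],
            "pid": int(m["pid"]),
            "cpu": int(m["cpu"]),
            "time": m["time"],            # kept as text to avoid float rounding
            "offs": int(m["offs"], 16),   # trace offset of the PSB
            "ip": int(m["ip"], 16),       # address from the PSB+ FUP packet
        })
    return events
```

The offs value can then be used to look up the corresponding PSB+ packets in the trace, as the commit message describes.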
    • perf intel-pt: Fix IPC with CYC threshold · 6af4b600
      Authored by Adrian Hunter
      The code assumed every CYC-eligible packet has a CYC packet, which is not
      the case when CYC thresholds are used. Fix by checking if a CYC packet is
      actually present in that case.
      
      Fixes: 5b1dc0fd ("perf intel-pt: Add support for samples to contain IPC ratio")
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20210205175350.23817-4-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf intel-pt: Fix premature IPC · 20aa3970
      Authored by Adrian Hunter
      The code assumed a change in cycle count means accurate IPC. That is not
      correct, for example when sampling both branches and instructions, or at
      a FUP packet (which is not CYC-eligible) address. Fix by using an explicit
      flag to indicate when IPC can be sampled.
      
      Fixes: 5b1dc0fd ("perf intel-pt: Add support for samples to contain IPC ratio")
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Link: https://lore.kernel.org/r/20210205175350.23817-3-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf intel-pt: Fix missing CYC processing in PSB · 03fb0f85
      Authored by Adrian Hunter
      Add missing CYC packet processing when walking through PSB+. This
      improves the accuracy of timestamps that follow PSB+, until the next
      MTC.
      
      Fixes: 3d498078 ("perf tools: Add new Intel PT packet definitions")
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20210205175350.23817-2-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf unwind: Set userdata for all __report_module() paths · 4e148144
      Authored by Dave Rigby
      When locating the DWARF module for a given address, __find_debuginfo()
      requires a 'struct dso' passed via the userdata argument.
      
      However, this field is only set in __report_module() if the module is
      found via dwfl_addrmodule(), not if it is found later via
      dwfl_report_elf().
      
      Set userdata irrespective of how the DWARF module was found, as long as
      we found a module.
      
      Fixes: bf53fc6b ("perf unwind: Fix separate debug info files when using elfutils' libdw's unwinder")
      Signed-off-by: Dave Rigby <d.rigby@me.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211801
      Acked-by: Jan Kratochvil <jan.kratochvil@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/linux-perf-users/20210218165654.36604-1-d.rigby@me.com/
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf record: Fix continue profiling after draining the buffer · e16c2ce7
      Authored by Yang Jihong
      Commit da231338 ("perf record: Use an eventfd to wakeup when
      done") uses eventfd() to solve a rare race between setting and
      checking 'done', and adds done_fd to pollfd.  When draining the buffer,
      revents of done_fd is 0 and the evlist__filter_pollfd() function returns
      a non-zero value.  As a result, perf record does not stop profiling.
      
      The following simple scenario triggers this condition:
      
        # sleep 10 &
        # perf record -p $!
      
      After the sleep process exits, perf record should stop profiling and exit.
      However, perf record keeps running.
      
      If the pollfd revents contain only POLLERR or POLLHUP, perf record
      treats the buffer as draining and needs to stop profiling.  Use
      fdarray_flag__nonfilterable() to mark the done eventfd as
      nonfilterable, so that evlist__filter_pollfd() neither filters nor
      checks the done eventfd.
      
      Fixes: da231338 ("perf record: Use an eventfd to wakeup when done")
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: zhangjinhao2@huawei.com
      Link: http://lore.kernel.org/lkml/20210205065001.23252-1-yangjihong1@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
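The failure mode and the fix can be modelled in a few lines. This is a simplified model written for illustration, not the libapi code: the Entry class, the filter_pollfd() helper and the NONFILTERABLE constant are hypothetical stand-ins, and the poll constants use their usual Linux values.

```python
from dataclasses import dataclass

# Usual Linux values, defined locally so the sketch runs anywhere.
POLLIN, POLLERR, POLLHUP = 0x001, 0x008, 0x010

NONFILTERABLE = 1  # hypothetical stand-in for the nonfilterable flag


@dataclass
class Entry:
    fd: int
    events: int
    revents: int = 0
    flags: int = 0


def filter_pollfd(entries, revents_mask: int) -> int:
    """Drop entries whose revents match the mask; return how many are still live.

    Entries flagged NONFILTERABLE are not counted, so they cannot keep
    the caller's loop alive on their own.
    """
    remaining = 0
    for e in entries:
        if not e.events:
            continue  # already filtered out earlier
        if e.revents & revents_mask:
            e.events = e.revents = 0
            continue
        if not (e.flags & NONFILTERABLE):
            remaining += 1
    return remaining
```

In this model, a ring-buffer fd that reports POLLHUP is filtered out, while a done eventfd with revents of 0 is still counted unless it carries the flag. That extra count is what keeps the loop running in the scenario above.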
  2. 18 Feb, 2021 11 commits
  3. 17 Feb, 2021 1 commit
    • perf symbols: Resolve symbols against debug file first · 6833e0b8
      Authored by Jiri Slaby
      With LTO, there are symbols like these:
      
      /usr/lib/debug/usr/lib64/libantlr4-runtime.so.4.8-4.8-1.4.x86_64.debug
       10305: 0000000000955fa4     0 NOTYPE  LOCAL  DEFAULT   29 Predicate.cpp.2bc410e7
      
      This comes from a runtime/debug split done in the standard way:
      
        objcopy --only-keep-debug $runtime $debug
        objcopy --add-gnu-debuglink=$debugfn -R .comment -R .GCC.command.line --strip-all $runtime
      
      perf currently cannot resolve such symbols (relicts of LTO), as section
      29 exists only in the debug file (29 is .debug_info), and perf resolves
      symbols only against the runtime file. This results in all symbols from
      such a library being unresolved:
      
           0.38%  main2    libantlr4-runtime.so.4.8  [.] 0x00000000000671e0
      
      So try resolving against the debug file first, and only if that fails
      (the section has NOBITS set), try the runtime file. We can do this
      because "objcopy --only-keep-debug", per its documentation, preserves all
      sections but clears the data of some of them (the runtime ones) and
      marks them as NOBITS.
      
      The correct result is now:
           0.38%  main2    libantlr4-runtime.so.4.8  [.] antlr4::IntStream::~IntStream
      
      Note that these LTO symbols are properly skipped anyway as they belong
      neither to *text* nor to *data* (is_label && !elf_sec__filter(&shdr,
      secstrs) is true).
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210217122125.26416-1-jslaby@suse.cz
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
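The lookup order described in this commit can be sketched as follows. This is a simplified model under stated assumptions: section tables are plain dictionaries keyed by section index, both files are assumed to share the same indices, and pick_section_source() is a hypothetical helper, not a perf function. SHT_NOBITS is the ELF section type value 8.

```python
SHT_PROGBITS = 1
SHT_NOBITS = 8  # ELF section type: occupies no space in the file


def pick_section_source(shndx: int, debug_sections: dict, runtime_sections: dict):
    """Prefer the debug file; fall back to the runtime file for NOBITS sections.

    Returns (source_name, section) or (None, None) if neither file has it.
    """
    sec = debug_sections.get(shndx)
    if sec is not None and sec["type"] != SHT_NOBITS:
        return "debug", sec
    sec = runtime_sections.get(shndx)
    if sec is not None:
        return "runtime", sec
    return None, None
```

A section such as .debug_info (index 29 in the example above) exists with data only in the debug file, so it resolves there. A runtime section that the debug file marks NOBITS falls through to the runtime file.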
  4. 16 Feb, 2021 5 commits
    • 37b3fa0e
    • Merge branch 'perf/urgent' into perf/core · c1bd8a2b
      Authored by Arnaldo Carvalho de Melo
      To get some fixes that didn't make it into 5.11.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf arm-spe: Set sample's data source field · a89dbc9b
      Authored by Leo Yan
      The sample structure contains the field 'data_src', which describes
      the attributes of the data operation, e.g. whether the operation is a
      load or a store, the cache level, whether it is a snoop or a remote
      access, etc.  In the end, 'data_src' is parsed by the perf mem/c2c
      tools to display human readable strings.
      
      This patch fills the 'data_src' field in the synthesized samples
      based on the different types.  Currently the perf tool can display
      statistics for L1/L2/L3 caches but does not support the 'last level
      cache'.  To fit the current implementation, the 'data_src' field
      uses L3 cache for the last level cache.
      
      Before this commit, perf mem report looks like this:
        # Samples: 75K of event 'l1d-miss'
        # Total weight : 75951
        # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
        #
        # Overhead  Samples  Local Weight  Memory access  Symbol                  Shared Object  Data Symbol             Data Object  Snoop  TLB access
        # ........  .......  ............  .............  ......................  .............  ......................  ...........  .....  ..........
        #
            81.56%    61945  0             N/A            [.] 0x00000000000009d8  serial_c       [.] 0000000000000000    [unknown]    N/A    N/A
            18.44%    14003  0             N/A            [.] 0x0000000000000828  serial_c       [.] 0000000000000000    [unknown]    N/A    N/A
      
      Now on a system with Arm SPE, addresses and access types are displayed:
      
        # Samples: 75K of event 'l1d-miss'
        # Total weight : 75951
        # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
        #
        # Overhead  Samples  Local Weight  Memory access  Symbol                  Shared Object  Data Symbol             Data Object  Snoop  TLB access
        # ........  .......  ............  .............  ......................  .............  ......................  ...........  .....  ..........
        #
             0.43%      324  0             L1 miss        [.] 0x00000000000009d8  serial_c       [.] 0x0000ffff80794e00  anon         N/A    Walker hit
             0.42%      322  0             L1 miss        [.] 0x00000000000009d8  serial_c       [.] 0x0000ffff80794580  anon         N/A    Walker hit
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: James Clark <james.clark@arm.com>
      Tested-by: James Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Andre Przywara <andre.przywara@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wei Li <liwei391@huawei.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Link: https://lore.kernel.org/r/20210211133856.2137-6-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
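The mapping this commit describes, including reporting the last level cache as L3, can be sketched roughly as below. Everything here is illustrative: the SpeEvent flags, the synth_data_src() helper and the dictionary output are hypothetical and do not reproduce the actual 'data_src' bit encoding or the Arm SPE decoder's type names.

```python
from enum import Flag, auto


class SpeEvent(Flag):
    """Hypothetical record types; not the decoder's real constants."""
    L1D_ACCESS = auto()
    L1D_MISS = auto()
    LLC_ACCESS = auto()
    LLC_MISS = auto()


def synth_data_src(event: SpeEvent, is_store: bool) -> dict:
    """Describe a sample's data source; last level cache is reported as L3."""
    op = "store" if is_store else "load"
    if event & (SpeEvent.LLC_ACCESS | SpeEvent.LLC_MISS):
        level = "L3"  # perf has no 'last level cache' level, so reuse L3
        hit = not (event & SpeEvent.LLC_MISS)
    elif event & (SpeEvent.L1D_ACCESS | SpeEvent.L1D_MISS):
        level = "L1"
        hit = not (event & SpeEvent.L1D_MISS)
    else:
        level, hit = "N/A", None
    return {"op": op, "level": level, "hit": hit}
```

A load tagged as an L1D miss would be described as an L1 miss, which matches the "L1 miss" entries in the report above.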
    • perf arm-spe: Synthesize memory event · e55ed342
      Authored by Leo Yan
      The memory event delivers two benefits:
      
      - It gives a global view of memory accesses. Organizing events in
        scattered mode (e.g. a separate event for L1 cache, last level
        cache, etc.) can only display one event per memory type, whereas
        memory events include all memory accesses, so data accesses across
        memory levels can be displayed in the same view;
      
      - Sample generation might introduce a big overhead and a long wait
        for perf reporting. The itrace option '--itrace=M' filters out other
        events and outputs only memory events, which can significantly
        reduce the overhead caused by generating samples.
      
      This patch enables the memory event for Arm SPE.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: James Clark <james.clark@arm.com>
      Tested-by: James Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Andre Przywara <andre.przywara@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wei Li <liwei391@huawei.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Link: https://lore.kernel.org/r/20210211133856.2137-5-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf arm-spe: Fill address info for samples · 54f7815e
      Authored by Leo Yan
      To properly handle memory and branch samples, this patch splits
      sample generation into two functions: arm_spe__synth_mem_sample()
      synthesizes memory and TLB samples; arm_spe__synth_branch_sample()
      synthesizes branch samples.
      
      The Arm SPE backend decoder already passes virtual and physical
      addresses through packets; the address info is stored into the
      synthesized samples in the function arm_spe__synth_mem_sample().
      
      Committer notes:
      
      Fixed this:
      
        36    46.77 fedora:27                     : FAIL clang version 5.0.2 (tags/RELEASE_502/final)
      
          util/arm-spe.c:269:34: error: missing field 'pid' initializer [-Werror,-Wmissing-field-initializers]
                  struct perf_sample sample = { 0 };
                                                  ^
          util/arm-spe.c:288:34: error: missing field 'pid' initializer [-Werror,-Wmissing-field-initializers]
                  struct perf_sample sample = { 0 };
      
      By using = { .ip = 0, };
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: James Clark <james.clark@arm.com>
      Tested-by: James Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Grant <al.grant@arm.com>
      Cc: Andre Przywara <andre.przywara@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wei Li <liwei391@huawei.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Link: https://lore.kernel.org/r/20210211133856.2137-4-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 15 Feb, 2021 7 commits
  6. 14 Feb, 2021 9 commits