- 19 2月, 2021 7 次提交
-
-
由 Kan Liang 提交于
For X86, the var2_w field of PERF_SAMPLE_WEIGHT_STRUCT stands for the instruction latency. Current perf forces the var2_w to the data->ins_lat in the generic code. It works well for now because X86 is the only architecture that supports the PERF_SAMPLE_WEIGHT_STRUCT, but it may bring problems once other architectures support the sample type. For example, the var2_w may be used to capture something else on PowerPC. Create two architecture specific functions to parse and synthesize the weight related samples. Move the X86 specific codes to the X86 version functions. Other architectures can implement their own functions later separately. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/1612540912-6562-1-git-send-email-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Adrian Hunter 提交于
Emitting a PSB+ can cause a CPU a slight delay. When doing timing analysis of code with Intel PT, it is useful to know if a timing bubble was caused by Intel PT or not. Add reporting of PSB events via perf script. PSB events are printed with the existing itrace 'p' option which also prints power and frequency changes. The PSB event contains the trace offset at which the PSB occurs, to allow easy reference back to the PSB+ packets. The PSB event timestamp is always the timestamp from the PSB+ TSC packet, and the ip is always the address from the PSB+ FUP packet. The code changes are non-trivial because the decoder must walk to the PSB+ FUP address before outputting the PSB event. Example: $ perf record -e intel_pt/cyc,psb_period=0/u uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.046 MB perf.data ] $ perf script --itrace=p --ns perf 17981 [006] 25617.510820383: psb: psb offs: 0 0 [unknown] ([unknown]) perf 17981 [006] 25617.510820383: cbr: cbr: 42 freq: 4219 MHz (156%) 0 [unknown] ([unknown]) uname 17981 [006] 25617.510889753: psb: psb offs: 0xb50 7f78c12a212e __GI___tunables_init+0xee (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.510899162: psb: psb offs: 0x12d0 7f78c128af1c dl_main+0x93c (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.510939242: psb: psb offs: 0x1a50 7f78c128eefc _dl_map_object_from_fd+0x13c (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.510981274: psb: psb offs: 0x21c8 7f78c1296307 _dl_relocate_object+0x927 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.510993034: psb: psb offs: 0x2948 7f78c12940e4 _dl_lookup_symbol_x+0x14 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.511003871: psb: psb offs: 0x30c8 7f78c12937b3 do_lookup_x+0x2f3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.511019854: psb: psb offs: 0x3850 7f78c1295eed _dl_relocate_object+0x50d (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.511029015: psb: psb offs: 0x4390 7f78c12a855a strcmp+0xf6a (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.511064876: psb: psb offs: 0x4b10 0 [unknown] ([unknown]) uname 17981 [006] 25617.511080762: psb: psb offs: 0x5290 7f78c11db53d _dl_addr+0x13d (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511086035: psb: psb offs: 0x5a08 7f78c11db538 _dl_addr+0x138 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511091381: psb: psb offs: 0x6190 7f78c11db534 _dl_addr+0x134 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511096681: psb: psb offs: 0x6910 7f78c11db4c3 _dl_addr+0xc3 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511119520: psb: psb offs: 0x7090 7f78c10ada5e _nl_intern_locale_data+0x12e (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511126584: psb: psb offs: 0x7818 7f78c10ada50 _nl_intern_locale_data+0x120 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511132775: psb: psb offs: 0x8358 7f78c10c20c0 getenv+0xa0 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511134598: psb: psb offs: 0x8ad0 7f78c10ada09 _nl_intern_locale_data+0xd9 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511135685: psb: psb offs: 0x9258 7f78c10ada50 _nl_intern_locale_data+0x120 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511138322: psb: psb offs: 0x99d0 7f78c11fffd9 __strncmp_avx2+0x39 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511158907: psb: psb offs: 0xa150 0 [unknown] ([unknown]) Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com> Reviewed-by: NAndi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210205175350.23817-5-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Adrian Hunter 提交于
The code assumed every CYC-eligible packet has a CYC packet, which is not the case when CYC thresholds are used. Fix by checking if a CYC packet is actually present in that case. Fixes: 5b1dc0fd ("perf intel-pt: Add support for samples to contain IPC ratio") Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com> Reviewed-by: NAndi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210205175350.23817-4-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Adrian Hunter 提交于
The code assumed a change in cycle count means accurate IPC. That is not correct, for example when sampling both branches and instructions, or at a FUP packet (which is not CYC-eligible) address. Fix by using an explicit flag to indicate when IPC can be sampled. Fixes: 5b1dc0fd ("perf intel-pt: Add support for samples to contain IPC ratio") Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com> Reviewed-by: NAndi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/20210205175350.23817-3-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Adrian Hunter 提交于
Add missing CYC packet processing when walking through PSB+. This improves the accuracy of timestamps that follow PSB+, until the next MTC. Fixes: 3d498078 ("perf tools: Add new Intel PT packet definitions") Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com> Reviewed-by: NAndi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210205175350.23817-2-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Dave Rigby 提交于
When locating the DWARF module for a given address, __find_debuginfo() requires a 'struct dso' passed via the userdata argument. However, this field is only set in __report_module() if the module is found in via dwfl_addrmodule(), not if it is found later via dwfl_report_elf(). Set userdata irrespective of how the DWARF module was found, as long as we found a module. Fixes: bf53fc6b ("perf unwind: Fix separate debug info files when using elfutils' libdw's unwinder") Signed-off-by: NDave Rigby <d.rigby@me.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211801Acked-by: NJan Kratochvil <jan.kratochvil@redhat.com> Acked-by: NJiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/linux-perf-users/20210218165654.36604-1-d.rigby@me.com/Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Yang Jihong 提交于
Commit da231338 ("perf record: Use an eventfd to wakeup when done") uses eventfd() to solve a rare race where the setting and checking of 'done' which add done_fd to pollfd. When draining buffer, revents of done_fd is 0 and evlist__filter_pollfd function returns a non-zero value. As a result, perf record does not stop profiling. The following simple scenarios can trigger this condition: # sleep 10 & # perf record -p $! After the sleep process exits, perf record should stop profiling and exit. However, perf record keeps running. If pollfd revents contains only POLLERR or POLLHUP, perf record indicates that buffer is draining and need to stop profiling. Use fdarray_flag__nonfilterable() to set done eventfd to nonfilterable objects, so that evlist__filter_pollfd() does not filter and check done eventfd. Fixes: da231338 ("perf record: Use an eventfd to wakeup when done") Signed-off-by: NYang Jihong <yangjihong1@huawei.com> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Tested-by: NJiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: zhangjinhao2@huawei.com Link: http://lore.kernel.org/lkml/20210205065001.23252-1-yangjihong1@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 18 2月, 2021 3 次提交
-
-
由 Jiapeng Chong 提交于
Fix the following coccicheck warnings: ./tools/perf/util/header.c:3809:18-20: WARNING !A || A && B is equivalent to !A || B. Reported-by: NAbaci Robot <abaci@linux.alibaba.com> Signed-off-by: NJiapeng Chong <jiapeng.chong@linux.alibaba.com> Acked-by: NJiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/1612497255-87189-1-git-send-email-jiapeng.chong@linux.alibaba.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Yang Li 提交于
Eliminate the following coccicheck warning: ./tools/perf/util/metricgroup.c:382:3-4: Unneeded semicolon Reported-by: NAbaci Robot <abaci@linux.alibaba.com> Signed-off-by: NYang Li <yang.lee@linux.alibaba.com> Acked-by: NNamhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: yang li <yang.lee@linux.alibaba.com> Link: http://lore.kernel.org/lkml/1612165277-95878-1-git-send-email-yang.lee@linux.alibaba.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Fabian Hemmer 提交于
Detect symbols generated by the OCaml compiler based on their prefix. Demangle OCaml symbols, returning a newly allocated string (like the existing Java demangling functionality). Move a helper function (hex) from tests/code-reading.c to util/string.c To test: echo 'Printf.printf "%d\n" (Random.int 42)' > test.ml perf record ocamlopt.opt test.ml perf report -d ocamlopt.opt Signed-off-by: NFabian Hemmer <copy@copy.sh> Acked-by: NNamhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> LPU-Reference: 20210203211537.b25ytjb6dq5jfbwx@nyu Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 17 2月, 2021 1 次提交
-
-
由 Jiri Slaby 提交于
With LTO, there are symbols like these: /usr/lib/debug/usr/lib64/libantlr4-runtime.so.4.8-4.8-1.4.x86_64.debug 10305: 0000000000955fa4 0 NOTYPE LOCAL DEFAULT 29 Predicate.cpp.2bc410e7 This comes from a runtime/debug split done by the standard way: objcopy --only-keep-debug $runtime $debug objcopy --add-gnu-debuglink=$debugfn -R .comment -R .GCC.command.line --strip-all $runtime perf currently cannot resolve such symbols (relicts of LTO), as section 29 exists only in the debug file (29 is .debug_info). And perf resolves symbols only against runtime file. This results in all symbols from such a library being unresolved: 0.38% main2 libantlr4-runtime.so.4.8 [.] 0x00000000000671e0 So try resolving against the debug file first. And only if it fails (the section has NOBITS set), try runtime file. We can do this, as "objcopy --only-keep-debug" per documentation preserves all sections, but clears data of some of them (the runtime ones) and marks them as NOBITS. The correct result is now: 0.38% main2 libantlr4-runtime.so.4.8 [.] antlr4::IntStream::~IntStream Note that these LTO symbols are properly skipped anyway as they belong neither to *text* nor to *data* (is_label && !elf_sec__filter(&shdr, secstrs) is true). Signed-off-by: NJiri Slaby <jslaby@suse.cz> Acked-by: NNamhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210217122125.26416-1-jslaby@suse.czSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 16 2月, 2021 3 次提交
-
-
由 Leo Yan 提交于
The sample structure contains the field 'data_src' which is used to tell the data operation attributions, e.g. operation type is loading or storing, cache level, it's snooping or remote accessing, etc. At the end, the 'data_src' will be parsed by perf mem/c2c tools to display human readable strings. This patch is to fill the 'data_src' field in the synthesized samples base on different types. Currently perf tool can display statistics for L1/L2/L3 caches but it doesn't support the 'last level cache'. To fit to current implementation, 'data_src' field uses L3 cache for last level cache. Before this commit, perf mem report looks like this: # Samples: 75K of event 'l1d-miss' # Total weight : 75951 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked # # Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access # ........ ....... ............ ............. ...................... ............. ...................... ........... ..... .......... # 81.56% 61945 0 N/A [.] 0x00000000000009d8 serial_c [.] 0000000000000000 [unknown] N/A N/A 18.44% 14003 0 N/A [.] 0x0000000000000828 serial_c [.] 0000000000000000 [unknown] N/A N/A Now on a system with Arm SPE, addresses and access types are displayed: # Samples: 75K of event 'l1d-miss' # Total weight : 75951 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked # # Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access # ........ ....... ............ ............. ...................... ............. ...................... ........... ..... .......... # 0.43% 324 0 L1 miss [.] 0x00000000000009d8 serial_c [.] 0x0000ffff80794e00 anon N/A Walker hit 0.42% 322 0 L1 miss [.] 0x00000000000009d8 serial_c [.] 0x0000ffff80794580 anon N/A Walker hit Signed-off-by: NLeo Yan <leo.yan@linaro.org> Reviewed-by: NJames Clark <james.clark@arm.com> Tested-by: NJames Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: NJames Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20210211133856.2137-6-james.clark@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Leo Yan 提交于
The memory event can deliver two benefits: - The first benefit is the memory event can give out global view for memory accessing, rather than organizing events with scatter mode (e.g. uses separate event for L1 cache, last level cache, etc) which which can only display a event for single memory type, memory events include all memory accessing so it can display the data accessing cross memory levels in the same view; - The second benefit is the sample generation might introduce a big overhead and need to wait for long time for Perf reporting, we can specify itrace option '--itrace=M' to filter out other events and only output memory events, this can significantly reduce the overhead caused by generating samples. This patch is to enable memory event for Arm SPE. Signed-off-by: NLeo Yan <leo.yan@linaro.org> Reviewed-by: NJames Clark <james.clark@arm.com> Tested-by: NJames Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: NJames Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20210211133856.2137-5-james.clark@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Leo Yan 提交于
To properly handle memory and branch samples, this patch divides into two functions for generating samples: arm_spe__synth_mem_sample() is for synthesizing memory and TLB samples; arm_spe__synth_branch_sample() is to synthesize branch samples. Arm SPE backend decoder has passed virtual and physical address through packets, the address info is stored into the synthesize samples in the function arm_spe__synth_mem_sample(). Committer notes: Fixed this: 36 46.77 fedora:27 : FAIL clang version 5.0.2 (tags/RELEASE_502/final) util/arm-spe.c:269:34: error: missing field 'pid' initializer [-Werror,-Wmissing-field-initializers] struct perf_sample sample = { 0 }; ^ util/arm-spe.c:288:34: error: missing field 'pid' initializer [-Werror,-Wmissing-field-initializers] struct perf_sample sample = { 0 }; By using = { .ip = 0, }; Signed-off-by: NLeo Yan <leo.yan@linaro.org> Reviewed-by: NJames Clark <james.clark@arm.com> Tested-by: NJames Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: NJames Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20210211133856.2137-4-james.clark@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 13 2月, 2021 8 次提交
-
-
由 Jianlin Lv 提交于
Perf failed to add a kretprobe event with debuginfo of vmlinux which is compiled by gcc with -fpatchable-function-entry option enabled. The same issue with kernel module. Issue: # perf probe -v 'kernel_clone%return $retval' ...... Writing event: r:probe/kernel_clone__return _text+599624 $retval Failed to write event: Invalid argument Error: Failed to add events. Reason: Invalid argument (Code: -22) # cat /sys/kernel/debug/tracing/error_log [156.75] trace_kprobe: error: Retprobe address must be an function entry Command: r:probe/kernel_clone__return _text+599624 $retval ^ # llvm-dwarfdump vmlinux |grep -A 10 -w 0x00df2c2b 0x00df2c2b: DW_TAG_subprogram DW_AT_external (true) DW_AT_name ("kernel_clone") DW_AT_decl_file ("/home/code/linux-next/kernel/fork.c") DW_AT_decl_line (2423) DW_AT_decl_column (0x07) DW_AT_prototyped (true) DW_AT_type (0x00dcd492 "pid_t") DW_AT_low_pc (0xffff800010092648) DW_AT_high_pc (0xffff800010092b9c) DW_AT_frame_base (DW_OP_call_frame_cfa) # cat /proc/kallsyms |grep kernel_clone ffff800010092640 T kernel_clone # readelf -s vmlinux |grep -i kernel_clone 183173: ffff800010092640 1372 FUNC GLOBAL DEFAULT 2 kernel_clone # objdump -d vmlinux |grep -A 10 -w \<kernel_clone\>: ffff800010092640 <kernel_clone>: ffff800010092640: d503201f nop ffff800010092644: d503201f nop ffff800010092648: d503233f paciasp ffff80001009264c: a9b87bfd stp x29, x30, [sp, #-128]! ffff800010092650: 910003fd mov x29, sp ffff800010092654: a90153f3 stp x19, x20, [sp, #16] The entry address of kernel_clone converted by debuginfo is _text+599624 (0x92648), which is consistent with the value of DW_AT_low_pc attribute. But the symbolic address of kernel_clone from /proc/kallsyms is ffff800010092640. This issue is found on arm64, -fpatchable-function-entry=2 is enabled when CONFIG_DYNAMIC_FTRACE_WITH_REGS=y; Just as objdump displayed the assembler contents of kernel_clone, GCC generate 2 NOPs at the beginning of each function. kprobe_on_func_entry detects that (_text+599624) is not the entry address of the function, which leads to the failure of adding kretprobe event. kprobe_on_func_entry ->_kprobe_addr ->kallsyms_lookup_size_offset ->arch_kprobe_on_func_entry // FALSE The cause of the issue is that the first instruction in the compile unit indicated by DW_AT_low_pc does not include NOPs. This issue exists in all gcc versions that support -fpatchable-function-entry option. I have reported it to the GCC community: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98776 Currently arm64 and PA-RISC may enable fpatchable-function-entry option. The kernel compiled with clang does not have this issue. FIX: This GCC issue only cause the registration failure of the kretprobe event which doesn't need debuginfo. So, stop using debuginfo for retprobe. map will be used to query the probe function address. Signed-off-by: NJianlin Lv <Jianlin.Lv@arm.com> Acked-by: NMasami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: clang-built-linux@googlegroups.com Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Link: http://lore.kernel.org/lkml/20210210062646.2377995-1-Jianlin.Lv@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Nicholas Fraser 提交于
The first time dso__load() was called on a PE file it always returned -1 error. This caused the first call to map__find_symbol() to always fail on a PE file so the first sample from each PE file always had symbol <unknown>. Subsequent samples succeed however because the DSO is already loaded. This fixes dso__load() to return 0 when successfully loading a DSO with libbfd. Fixes: eac9a434 ("perf symbols: Try reading the symbol table with libbfd") Signed-off-by: NNicholas Fraser <nfraser@codeweavers.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Huw Davies <huw@codeweavers.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Remi Bernon <rbernon@codeweavers.com> Cc: Song Liu <songliubraving@fb.com> Cc: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Ulrich Czekalla <uczekalla@codeweavers.com> Link: http://lore.kernel.org/lkml/1671b43b-09c3-1911-dbf8-7f030242fbf7@codeweavers.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Nicholas Fraser 提交于
dso__load_bfd_symbols() attempts to load a DSO at its original path, then closes it and loads the file in the debug cache. This is incorrect. It should ignore the original file and work with only the debug cache. The original file may have changed or may not even exist, for example if the debug cache has been transferred to another machine via "perf archive". This fix makes it only load the file in the debug cache. Further notes from Nicholas: dso__load_bfd_symbols() is called in a loop from dso__load() for a variety of paths. These are generated by the various DSO_BINARY_TYPEs in the binary_type_symtab list at the top of util/symbol.c. In each case the debugfile passed to dso__load_bfd_symbols() is the path to try. One of those iterations (the first one I believe) passes the original path as the debugfile. If the file still exists at the original path, this is the one that ends up being used in case the debugcache was deleted or the PE file doesn't have a build-id. A later iteration (BUILD_ID_CACHE) passes debugfile as the file in the debugcache if it has a build-id. Even if the file was previously loaded at its original path, (if I understand correctly) this load will override it so the debugcache file ends up being used. Committer notes: So if it fails to find in the cache, it will eventually hope for the best and look at the path in the local filesystem, which in many cases is enough. At some point we need to switch from this "hope for the best" approach to one that warns the user that there is no guarantee, if no buildid is present, that just by looking at the pathname the symbolisation will work. Signed-off-by: NNicholas Fraser <nfraser@codeweavers.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Huw Davies <huw@codeweavers.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Remi Bernon <rbernon@codeweavers.com> Cc: Song Liu <songliubraving@fb.com> Cc: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Ulrich Czekalla <uczekalla@codeweavers.com> Link: http://lore.kernel.org/lkml/e58e1237-94ab-e1c9-a7b9-473531906954@codeweavers.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Leo Yan 提交于
This patch is to store operation type in packet structure. Signed-off-by: NLeo Yan <leo.yan@linaro.org> Reviewed-by: NJames Clark <james.clark@arm.com> Tested-by: NJames Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: NJames Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20210211133856.2137-3-james.clark@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Leo Yan 提交于
This patch is to store virtual and physical memory addresses in packet, which will be used for memory samples. Signed-off-by: NLeo Yan <leo.yan@linaro.org> Reviewed-by: NJames Clark <james.clark@arm.com> Tested-by: NJames Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210211133856.2137-2-james.clark@arm.comSigned-off-by: NJames Clark <james.clark@arm.com> Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Leo Yan 提交于
This patch is to enable sample type PERF_SAMPLE_DATA_SRC for Arm SPE in the perf data, when output the tracing data, it tells tools that it contains data source in the memory event. Signed-off-by: NLeo Yan <leo.yan@linaro.org> Reviewed-by: NJames Clark <james.clark@arm.com> Tested-by: NJames Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210211133856.2137-1-james.clark@arm.comSigned-off-by: NJames Clark <james.clark@arm.com> Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Ian Rogers 提交于
Minor cleanup. Signed-off-by: NIan Rogers <irogers@google.com> Acked-by: NJiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210211183914.4093187-1-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Ian Rogers 提交于
Migrated to libperf in: 4b247fa7 ("libperf: Adopt xyarray class from perf") Signed-off-by: NIan Rogers <irogers@google.com> Acked-by: NJiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210212043803.365993-1-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 12 2月, 2021 2 次提交
-
-
由 Dmitry Safonov 提交于
GCC (GCC) 8.4.0 20200304 fails to build perf with: : util/symbol.c: In function 'dso__load_bfd_symbols': : util/symbol.c:1626:16: error: comparison of integer expressions of different signednes : for (i = 0; i < symbols_count; ++i) { : ^ : util/symbol.c:1632:16: error: comparison of integer expressions of different signednes : while (i + 1 < symbols_count && : ^ : util/symbol.c:1637:13: error: comparison of integer expressions of different signednes : if (i + 1 < symbols_count && : ^ : cc1: all warnings being treated as errors It's unlikely that the symtable will be that big, but the fix is an oneliner and as perf has CORE_CFLAGS += -Wextra, which makes build to fail together with CORE_CFLAGS += -Werror Fixes: eac9a434 ("perf symbols: Try reading the symbol table with libbfd") Signed-off-by: NDmitry Safonov <dima@arista.com> Acked-by: NNamhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Jacek Caban <jacek@codeweavers.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Remi Bernon <rbernon@codeweavers.com> Link: http://lore.kernel.org/lkml/20210209145148.178702-1-dima@arista.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Martin Liška 提交于
Considering the following testcase: int foo(int a, int b) { for (unsigned i = 0; i < 1000000000; i++) a += b; return a; } int main() { foo (3, 4); return 0; } 'perf annotate' displays: 86.52 │40055e: → ja 40056c <foo(int, int)+0x26> 13.37 │400560: mov -0x18(%rbp),%eax │400563: add %eax,-0x14(%rbp) │400566: addl $0x1,-0x4(%rbp) 0.11 │40056a: → jmp 400557 <foo(int, int)+0x11> │40056c: mov -0x14(%rbp),%eax │40056f: pop %rbp and the 'ja 40056c' does not link to the location in the function. It's caused by fact that comma is wrongly parsed, it's part of function signature. With my patch I see: 86.52 │ ┌──ja 26 13.37 │ │ mov -0x18(%rbp),%eax │ │ add %eax,-0x14(%rbp) │ │ addl $0x1,-0x4(%rbp) 0.11 │ │↑ jmp 11 │26:└─→mov -0x14(%rbp),%eax and 'o' output prints: 86.52 │4005┌── ↓ ja 40056c <foo(int, int)+0x26> 13.37 │4005│0: mov -0x18(%rbp),%eax │4005│3: add %eax,-0x14(%rbp) │4005│6: addl $0x1,-0x4(%rbp) 0.11 │4005│a: ↑ jmp 400557 <foo(int, int)+0x11> │4005└─→ mov -0x14(%rbp),%eax On the contrary, compiling the very same file with gcc -x c, the parsing is fine because function arguments are not displayed: jmp 400543 <foo+0x1d> Committer testing: Before: $ cat cpp_args_annotate.c int foo(int a, int b) { for (unsigned i = 0; i < 1000000000; i++) a += b; return a; } int main() { foo (3, 4); return 0; } $ gcc --version |& head -1 gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9) $ gcc -g cpp_args_annotate.c -o cpp_args_annotate $ perf record ./cpp_args_annotate [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.275 MB perf.data (7188 samples) ] $ perf annotate --stdio2 foo Samples: 7K of event 'cycles:u', 4000 Hz, Event count (approx.): 7468429289, [percent: local period] foo() /home/acme/c/cpp_args_annotate Percent 0000000000401106 <foo>: foo(): int foo(int a, int b) { push %rbp mov %rsp,%rbp mov %edi,-0x14(%rbp) mov %esi,-0x18(%rbp) for (unsigned i = 0; i < 1000000000; i++) movl $0x0,-0x4(%rbp) ↓ jmp 1d a += b; 13.45 13: mov -0x18(%rbp),%eax add %eax,-0x14(%rbp) for (unsigned i = 0; i < 1000000000; i++) addl $0x1,-0x4(%rbp) 0.09 1d: cmpl $0x3b9ac9ff,-0x4(%rbp) 86.46 ↑ jbe 13 return a; mov -0x14(%rbp),%eax } pop %rbp ← retq $ I.e. works for C, now lets switch to C++: $ g++ -g cpp_args_annotate.c -o cpp_args_annotate $ perf record ./cpp_args_annotate [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.268 MB perf.data (6976 samples) ] $ perf annotate --stdio2 foo Samples: 6K of event 'cycles:u', 4000 Hz, Event count (approx.): 7380681761, [percent: local period] foo() /home/acme/c/cpp_args_annotate Percent 0000000000401106 <foo(int, int)>: foo(int, int): int foo(int a, int b) { push %rbp mov %rsp,%rbp mov %edi,-0x14(%rbp) mov %esi,-0x18(%rbp) for (unsigned i = 0; i < 1000000000; i++) movl $0x0,-0x4(%rbp) cmpl $0x3b9ac9ff,-0x4(%rbp) 86.53 → ja 40112c <foo(int, int)+0x26> a += b; 13.32 mov -0x18(%rbp),%eax 0.00 add %eax,-0x14(%rbp) for (unsigned i = 0; i < 1000000000; i++) addl $0x1,-0x4(%rbp) 0.15 → jmp 401117 <foo(int, int)+0x11> return a; mov -0x14(%rbp),%eax } pop %rbp ← retq $ Reproduced. Now with this patch: Reusing the C++ built binary, as we can see here: $ readelf -wi cpp_args_annotate | grep producer <c> DW_AT_producer : (indirect string, offset: 0x2e): GNU C++14 10.2.1 20201125 (Red Hat 10.2.1-9) -mtune=generic -march=x86-64 -g $ And furthermore: $ file cpp_args_annotate cpp_args_annotate: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4fe3cab260204765605ec630d0dc7a7e93c361a9, for GNU/Linux 3.2.0, with debug_info, not stripped $ perf buildid-list -i cpp_args_annotate 4fe3cab260204765605ec630d0dc7a7e93c361a9 $ perf buildid-list | grep cpp_args_annotate 4fe3cab260204765605ec630d0dc7a7e93c361a9 /home/acme/c/cpp_args_annotate $ It now works: $ perf annotate --stdio2 foo Samples: 6K of event 'cycles:u', 4000 Hz, Event count (approx.): 7380681761, [percent: local period] foo() /home/acme/c/cpp_args_annotate Percent 0000000000401106 <foo(int, int)>: foo(int, int): int foo(int a, int b) { push %rbp mov %rsp,%rbp mov %edi,-0x14(%rbp) mov %esi,-0x18(%rbp) for (unsigned i = 0; i < 1000000000; i++) movl $0x0,-0x4(%rbp) 11: cmpl $0x3b9ac9ff,-0x4(%rbp) 86.53 ↓ ja 26 a += b; 13.32 mov -0x18(%rbp),%eax 0.00 add %eax,-0x14(%rbp) for (unsigned i = 0; i < 1000000000; i++) addl $0x1,-0x4(%rbp) 0.15 ↑ jmp 11 return a; 26: mov -0x14(%rbp),%eax } pop %rbp ← retq $ Signed-off-by: NMartin Liška <mliska@suse.cz> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Slaby <jslaby@suse.cz> Link: http://lore.kernel.org/lkml/13e1a405-edf9-e4c2-4327-a9b454353730@suse.czSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 11 2月, 2021 1 次提交
-
-
由 Kees Cook 提交于
As started by commit 05a5f51c ("Documentation: Replace lkml.org links with lore"), replace lkml.org links with lore to better use a single source that's more likely to stay available long-term. Signed-off-by: NKees Kook <keescook@chromium.org> Acked-by: NNamhyung Kim <namhyung@kernel.org> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Kees Kook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Link: http://lore.kernel.org/lkml/20210210234220.2401035-1-keescook@chromium.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 09 2月, 2021 10 次提交
-
-
由 Jin Yao 提交于
'perf script' supports '-S' or '--symbol' options to only list the records for these symbols. A symbol is typically a name or hex address. If it's hex address, it is the start address of one symbol. While it would be useful if we can filter trace records by any hex address (not only the start address of symbol). So now we support filtering trace records by more conditions, such as: - symbol name - start address of symbol - any hexadecimal address - address range The comparison order is defined as: 1. symbol name comparison 2. symbol start address comparison. 3. any hexadecimal address comparison. 4. address range comparison. The idea is if we can get a valid address from -S list, we add the address to addr_list for address comparison otherwise we still leave it to sym_list for symbol comparison. Some examples: root@kbl-ppc:~# ./perf script -S ffffffff9a477308 perf 8562 [000] 347303.578858: 1 cycles: ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms]) perf 8562 [000] 347303.578860: 1 cycles: ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms]) perf 8562 [000] 347303.578861: 11 cycles: ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms]) perf 8562 [001] 347303.578903: 1 cycles: ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms]) perf 8562 [001] 347303.578905: 1 cycles: ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms]) perf 8562 [001] 347303.578906: 15 cycles: ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms]) perf 8562 [002] 347303.578952: 1 cycles: ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms]) perf 8562 [002] 347303.578953: 1 cycles: ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms]) Filter the traced records by hex address ffffffff9a477308. root@kbl-ppc:~# ./perf script -S ffffffff9a4dd4ce,ffffffff9a4d2de9,ffffffff9a6bf9f4 perf 8562 [001] 347303.578911: 311706 cycles: ffffffff9a6bf9f4 __kmalloc_node+0x204 ([kernel.kallsyms]) perf 8562 [002] 347303.578960: 354477 cycles: ffffffff9a4d2de9 sched_setaffinity+0x49 ([kernel.kallsyms]) perf 8562 [003] 347303.579015: 450958 cycles: ffffffff9a4dd4ce dequeue_task_fair+0x1ae ([kernel.kallsyms]) Filter the traced records by hex address ffffffff9a4dd4ce, ffffffff9a4d2de9, ffffffff9a6bf9f4. root@kbl-ppc:~# ./perf script -S ffffffff9a477309 --addr-range 16 perf 8562 [000] 347303.578863: 291 cycles: ffffffff9a47730a native_write_msr+0xa ([kernel.kallsyms]) perf 8562 [001] 347303.578907: 411 cycles: ffffffff9a47730a native_write_msr+0xa ([kernel.kallsyms]) perf 8562 [002] 347303.578956: 462 cycles: ffffffff9a47730f native_write_msr+0xf ([kernel.kallsyms]) perf 8562 [003] 347303.579010: 497 cycles: ffffffff9a47730f native_write_msr+0xf ([kernel.kallsyms]) perf 8562 [004] 347303.579059: 429 cycles: ffffffff9a47730f native_write_msr+0xf ([kernel.kallsyms]) perf 8562 [005] 347303.579109: 408 cycles: ffffffff9a47730a native_write_msr+0xa ([kernel.kallsyms]) perf 8562 [006] 347303.579159: 460 cycles: ffffffff9a47730f native_write_msr+0xf ([kernel.kallsyms]) perf 8562 [007] 347303.579213: 436 cycles: ffffffff9a47730f native_write_msr+0xf ([kernel.kallsyms]) Filter the traced records from address range [ffffffff9a477309, ffffffff9a477309 + 15]. root@kbl-ppc:~# ./perf script -S "ffffffff9b163046,rcu_nmi_exit" perf 8562 [004] 347303.579060: 12013 cycles: ffffffff9b163046 exc_nmi+0x166 ([kernel.kallsyms]) perf 8562 [007] 347303.579214: 12138 cycles: ffffffff9b165944 rcu_nmi_exit+0x34 ([kernel.kallsyms]) Filter by address + symbol Signed-off-by: NJin Yao <yao.jin@linux.intel.com> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210207080935.31784-2-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Jin Yao 提交于
This is to let intlist support addresses as its payload. One potential problem is it can't support negative number. But so far, there is no such kind of use case. Signed-off-by: NJin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210207080935.31784-1-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Paul Cercueil 提交于
ftw() has been obsolete for about 12 years now. Committer notes: Further notes provided by the patch author: "NOTE: Not runtime-tested, I have no idea what I need to do in perf to test this. But at least it compiles now with my uClibc-based toolchain." I looked at the nftw()/ftw() man page and for the use made with cgroups in 'perf stat' the end result is equivalent. Fixes: bb1c15b6 ("perf stat: Support regex pattern in --for-each-cgroup") Signed-off-by: NPaul Cercueil <paul@crapouillou.net> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: od@zcrc.me Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20210208181157.1324550-1-paul@crapouillou.netSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Kan Liang 提交于
The TMA method level 2 metrics is supported from the Intel Sapphire Rapids server, which expose four L2 Topdown metrics events to user space. There are eight L2 events in total. The other four L2 Topdown metrics events are calculated from the corresponding L1 and the exposed L2 events. Now, the --topdown prints the complete top-down metrics that supported by the CPU. For the Intel Sapphire Rapids server, there are 4 L1 events and 8 L2 events displyed in one line. Add a new option, --td-level, to display the top-down statistics that equal to or lower than the input level. The L2 event is marked only when both its L1 parent event and itself crosse the threshold. Here is an example: $ perf stat --topdown --td-level=2 --no-metric-only sleep 1 Topdown accuracy may decrease when measuring long periods. Please print the result regularly, e.g. -I1000 Performance counter stats for 'sleep 1': 16,734,390 slots 2,100,001 topdown-retiring # 12.6% retiring 2,034,376 topdown-bad-spec # 12.3% bad speculation 4,003,128 topdown-fe-bound # 24.1% frontend bound 328,125 topdown-heavy-ops # 2.0% heavy operations # 10.6% light operations 1,968,751 topdown-br-mispredict # 11.9% branch mispredict # 0.4% machine clears 2,953,127 topdown-fetch-lat # 17.8% fetch latency # 6.3% fetch bandwidth 5,906,255 topdown-mem-bound # 35.6% memory bound # 15.4% core bound Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1612296553-21962-9-git-send-email-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Kan Liang 提交于
The instruction latency information can be recorded on some platforms, e.g., the Intel Sapphire Rapids server. With both memory latency (weight) and the new instruction latency information, users can easily locate the expensive load instructions, and also understand the time spent in different stages. The users can optimize their applications in different pipeline stages. The 'weight' field is shared among different architectures. Reusing the 'weight' field may impacts other architectures. Add a new field to store the instruction latency. Like the 'weight' support, introduce a 'ins_lat' for the global instruction latency, and a 'local_ins_lat' for the local instruction latency version. Add new sort functions, INSTR Latency and Local INSTR Latency, accordingly. Add local_ins_lat to the default_mem_sort_order[]. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1612296553-21962-7-git-send-email-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Kan Liang 提交于
The new sample type, PERF_SAMPLE_WEIGHT_STRUCT, is an alternative of the PERF_SAMPLE_WEIGHT sample type. Users can apply either the PERF_SAMPLE_WEIGHT sample type or the PERF_SAMPLE_WEIGHT_STRUCT sample type to retrieve the sample weight, but they cannot apply both sample types simultaneously. The new sample type shares the same space as the PERF_SAMPLE_WEIGHT sample type. The lower 32 bits are exactly the same for both sample type. The higher 32 bits may be different for different architecture. Add arch specific arch_evsel__set_sample_weight() to set the new sample type for X86. Only store the lower 32 bits for the sample->weight if the new sample type is applied. In practice, no memory access could last than 4G cycles. No data will be lost. If the kernel doesn't support the new sample type. Fall back to the PERF_SAMPLE_WEIGHT sample type. There is no impact for other architectures. Committer notes: Fixup related to PERF_SAMPLE_CODE_PAGE_SIZE, present in acme/perf/core but not upstream yet. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1612296553-21962-6-git-send-email-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Kan Liang 提交于
'perf c2c' is also a memory profiling tool. Apply the two new data source fields to 'perf c2c' as well. Extend 'perf c2c' to display the number of loads which blocked by data or address conflict. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1612296553-21962-5-git-send-email-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Kan Liang 提交于
Two new data source fields, to indicate the block reasons of a load instruction, are introduced on the Intel Sapphire Rapids server. The fields can be used by the memory profiling. Add a new sort function, SORT_MEM_BLOCKED, for the two fields. For the previous platforms or the block reason is unknown, print "N/A" for the block reason. Add blocked as a default mem sort key for perf report and perf mem report. Committer testing: So in machines without this capability we get a "N/A" filling the new "Blocked" column: $ perf mem record ls arch certs CREDITS Documentation include ipc Kconfig lib MAINTAINERS mm samples security usr block COPYING crypto drivers fs init Kbuild kernel LICENSES Makefile net README scripts sound tools virt [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.008 MB perf.data (17 samples) ] $ $ perf mem report --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 6 of event 'cpu/mem-loads,ldlat=30/Pu' # Total weight : 1381 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked # # Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access Locked Blocked # ........ ....... ............ .................... ....................... ............. ...................... ............ ..... ............ ...... ....... # 32.87% 1 454 Local RAM or RAM hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91cef3078 libc-2.31.so Hit L1 or L2 hit No N/A 25.56% 1 353 LFB or LFB hit [.] strcmp ld-2.31.so [.] 0x00005586973855ca ls None L1 or L2 hit No N/A 22.59% 1 312 LFB or LFB hit [.] _dl_cache_libcmp ld-2.31.so [.] 0x00007fe91d0e3b18 ld.so.cache None L1 or L2 hit No N/A 8.47% 1 117 LFB or LFB hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91ceee570 libc-2.31.so None L1 or L2 hit No N/A 6.88% 1 95 LFB or LFB hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91ceed490 libc-2.31.so None L1 or L2 hit No N/A 3.62% 1 50 LFB or LFB hit [.] _dl_cache_libcmp ld-2.31.so [.] 0x00007fe91d0ebe60 ld.so.cache None L1 or L2 hit No N/A # Samples: 11 of event 'cpu/mem-stores/Pu' # Total weight : 11 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked # # Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access Locked Blocked # ........ ....... ............ ............. ....................... ............. ...................... ........... ..... .......... ...... ....... # 9.09% 1 0 L1 hit [.] __strcoll_l libc-2.31.so [.] 0x00007fffe5648fc8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] _dl_lookup_symbol_x ld-2.31.so [.] 0x00007fffe56490b8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] _dl_name_match_p ld-2.31.so [.] 0x00007fffe56487d8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] _dl_start ld-2.31.so [.] start_time+0x0 ld-2.31.so N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] _dl_sysdep_start ld-2.31.so [.] 0x00007fffe56494b8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5648ff8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5649064 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5649130 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] _rtld_global+0xaf8 ld-2.31.so N/A N/A N/A N/A 9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] _rtld_global+0xc28 ld-2.31.so N/A N/A N/A N/A 9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] 0x00007fffe56495b8 [stack] N/A N/A N/A N/A # (Tip: Show user configuration overrides: perf config --user --list) $ Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1612296553-21962-4-git-send-email-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Kan Liang 提交于
On the Intel Sapphire Rapids server, an auxiliary event has to be enabled simultaneously with the load latency event to retrieve complete Memory Info. Add X86 specific perf_mem_events__name() to handle the auxiliary event. - Users are only interested in the samples of the mem-loads event. Sample read the auxiliary event. - The auxiliary event must be in front of the load latency event in a group. Assume the second event to sample if the auxiliary event is the leader. - Add a weak is_mem_loads_aux_event() to check the auxiliary event for X86. For other ARCHs, it always return false. Parse the unique event name, mem-loads-aux, for the auxiliary event. Committer notes: According to 61b985e3 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids"), ENODATA is only returned by sys_perf_event_open() when used with these auxiliary events, with this in evsel__open_strerror(): case ENODATA: return scnprintf(msg, size, "Cannot collect data source with the load latency event alone. " "Please add an auxiliary event in front of the load latency event."); This is Ok at this point in time, but fragile long term, I pointed this out in the e-mail thread, requesting a follow up patch to check if ENODATA is really for this specific case. Fixed up sizeof(MEM_LOADS_AUX_NAME) bug pointed out by Namhyung. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210205152648.GC920417@kernel.org Link: http://lore.kernel.org/lkml/1612296553-21962-3-git-send-email-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Jianlin Lv 提交于
if dwarf_offdie() returns NULL, the continue statement forces the next iteration of the loop without updating the 'off' variable. It will cause an endless loop in the process of traversing the compile unit. So add exception protection for looping CUs. Signed-off-by: NJianlin Lv <Jianlin.Lv@arm.com> Acked-by: NMasami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: jianlin.lv@arm.com Link: http://lore.kernel.org/lkml/20210203145702.1219509-1-Jianlin.Lv@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 04 2月, 2021 5 次提交
-
-
由 Ian Rogers 提交于
Avoid a naming conflict with for_each_event with similar code in parse-events.c, rename to for_each_event_tps. Signed-off-by: NIan Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210203052659.2975736-1-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Yonatan Goldschmidt 提交于
This patch fixes "perf inject --jit" to properly operate on namespaced/containerized processes: * jitdump files are generated by the process, thus they should be looked up in its mount NS. * DSOs of injected MMAP events will later be looked up in the process mount NS, so write them into its NS. * PIDs & TIDs from jitdump events need to be translated to the PID as seen by "perf record" before written into MMAP events. For a process in a different PID NS, the TID & PID given in the jitdump event are actually ignored; I use the TID & PID of the thread which mmap()ed the jitdump file. This is simplified and won't do for forks of the initial process, if they continue using the same jitdump file. Future patches might improve it. This was tested by recording a NodeJS process running with "--perf-prof", inside a Docker container, and by recording another NodeJS process running in the same namespaces as perf itself, to make sure it's not broken for non-containerized processes. Signed-off-by: NYonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Acked-by: NJiri Olsa <jolsa@redhat.com> Acked-by: NNamhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20201105015604.1726943-1-yonatan.goldschmidt@granulate.ioSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Yonatan Goldschmidt 提交于
Provides an accurate mean to determine if the owner thread is in a different PID namespace. Signed-off-by: NYonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Acked-by: NJiri Olsa <jolsa@redhat.com> Acked-by: NNamhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20201105015418.1725218-1-yonatan.goldschmidt@granulate.ioSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Namhyung Kim 提交于
Like in __event__synthesize_thread(), I think it's better to use scandir() instead of the readdir() loop. In case some malicious task continues to create new threads, the readdir() loop will run over and over to collect tids. The scandir() also has the problem but the window is much smaller since it doesn't do much work during the iteration. Also add filter_task() function as we only care the tasks. Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Acked-by: NJiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210202090118.2008551-4-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Namhyung Kim 提交于
To synthesize information to resolve sample IPs, it needs to scan task and mmap info from the /proc filesystem. For each process, it opens (and reads) status and maps file respectively. But as kernel threads don't have memory maps so we can skip the maps file. To find kernel threads, check "VmPeak:" line in /proc/<PID>/status file. It's about the peak virtual memory usage so only user-level tasks have that. Note that it's possible to miss the line due to partial reads. So we should double-check if it's a really kernel thread when there's no VmPeak line. Thus check "Threads:" line (which follows the VmPeak line whether or not it exists) to be sure it's read enough data - just in case of deeply nested pid namespaces or large number of supplementary groups are involved. This is for user process: $ head -40 /proc/1/status Name: systemd Umask: 0000 State: S (sleeping) Tgid: 1 Ngid: 0 Pid: 1 PPid: 0 TracerPid: 0 Uid: 0 0 0 0 Gid: 0 0 0 0 FDSize: 256 Groups: NStgid: 1 NSpid: 1 NSpgid: 1 NSsid: 1 VmPeak: 234192 kB <-- here VmSize: 169964 kB VmLck: 0 kB VmPin: 0 kB VmHWM: 29528 kB VmRSS: 6104 kB RssAnon: 2756 kB RssFile: 3348 kB RssShmem: 0 kB VmData: 19776 kB VmStk: 1036 kB VmExe: 784 kB VmLib: 9532 kB VmPTE: 116 kB VmSwap: 2400 kB HugetlbPages: 0 kB CoreDumping: 0 THP_enabled: 1 Threads: 1 <-- and here SigQ: 1/62808 SigPnd: 0000000000000000 ShdPnd: 0000000000000000 SigBlk: 7be3c0fe28014a03 SigIgn: 0000000000001000 And this is for kernel thread: $ head -20 /proc/2/status Name: kthreadd Umask: 0000 State: S (sleeping) Tgid: 2 Ngid: 0 Pid: 2 PPid: 0 TracerPid: 0 Uid: 0 0 0 0 Gid: 0 0 0 0 FDSize: 64 Groups: NStgid: 2 NSpid: 2 NSpgid: 0 NSsid: 0 Threads: 1 <-- here SigQ: 1/62808 SigPnd: 0000000000000000 ShdPnd: 0000000000000000 Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Acked-by: NJiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210202090118.2008551-3-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-