- 26 3月, 2021 1 次提交
-
-
由 Athira Rajeev 提交于
The pipeline stage cycles details can be recorded on powerpc from the contents of Performance Monitor Unit (PMU) registers. On ISA v3.1 platform, sampling registers exposes the cycles spent in different pipeline stages. Patch adds perf tools support to present two of the cycle counter information along with memory latency (weight). Re-use the field 'ins_lat' for storing the first pipeline stage cycle. This is stored in 'var2_w' field of 'perf_sample_weight'. Add a new field 'p_stage_cyc' to store the second pipeline stage cycle which is stored in 'var3_w' field of perf_sample_weight. Add new sort function 'Pipeline Stage Cycle' and include this in default_mem_sort_order[]. This new sort function may be used to denote some other pipeline stage in another architecture. So add this to list of sort entries that can have dynamic header string. Signed-off-by: NAthira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: NMadhavan Srinivasan <maddy@linux.ibm.com> Acked-by: NJiri Olsa <jolsa@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: https://lore.kernel.org/r/1616425047-1666-5-git-send-email-atrajeev@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 09 2月, 2021 2 次提交
-
-
由 Kan Liang 提交于
The instruction latency information can be recorded on some platforms, e.g., the Intel Sapphire Rapids server. With both memory latency (weight) and the new instruction latency information, users can easily locate the expensive load instructions, and also understand the time spent in different stages. The users can optimize their applications in different pipeline stages. The 'weight' field is shared among different architectures. Reusing the 'weight' field may impacts other architectures. Add a new field to store the instruction latency. Like the 'weight' support, introduce a 'ins_lat' for the global instruction latency, and a 'local_ins_lat' for the local instruction latency version. Add new sort functions, INSTR Latency and Local INSTR Latency, accordingly. Add local_ins_lat to the default_mem_sort_order[]. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1612296553-21962-7-git-send-email-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Kan Liang 提交于
Two new data source fields, to indicate the block reasons of a load instruction, are introduced on the Intel Sapphire Rapids server. The fields can be used by the memory profiling. Add a new sort function, SORT_MEM_BLOCKED, for the two fields. For the previous platforms or the block reason is unknown, print "N/A" for the block reason. Add blocked as a default mem sort key for perf report and perf mem report. Committer testing: So in machines without this capability we get a "N/A" filling the new "Blocked" column: $ perf mem record ls arch certs CREDITS Documentation include ipc Kconfig lib MAINTAINERS mm samples security usr block COPYING crypto drivers fs init Kbuild kernel LICENSES Makefile net README scripts sound tools virt [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.008 MB perf.data (17 samples) ] $ $ perf mem report --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 6 of event 'cpu/mem-loads,ldlat=30/Pu' # Total weight : 1381 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked # # Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access Locked Blocked # ........ ....... ............ .................... ....................... ............. ...................... ............ ..... ............ ...... ....... # 32.87% 1 454 Local RAM or RAM hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91cef3078 libc-2.31.so Hit L1 or L2 hit No N/A 25.56% 1 353 LFB or LFB hit [.] strcmp ld-2.31.so [.] 0x00005586973855ca ls None L1 or L2 hit No N/A 22.59% 1 312 LFB or LFB hit [.] _dl_cache_libcmp ld-2.31.so [.] 0x00007fe91d0e3b18 ld.so.cache None L1 or L2 hit No N/A 8.47% 1 117 LFB or LFB hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91ceee570 libc-2.31.so None L1 or L2 hit No N/A 6.88% 1 95 LFB or LFB hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91ceed490 libc-2.31.so None L1 or L2 hit No N/A 3.62% 1 50 LFB or LFB hit [.] _dl_cache_libcmp ld-2.31.so [.] 0x00007fe91d0ebe60 ld.so.cache None L1 or L2 hit No N/A # Samples: 11 of event 'cpu/mem-stores/Pu' # Total weight : 11 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked # # Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access Locked Blocked # ........ ....... ............ ............. ....................... ............. ...................... ........... ..... .......... ...... ....... # 9.09% 1 0 L1 hit [.] __strcoll_l libc-2.31.so [.] 0x00007fffe5648fc8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] _dl_lookup_symbol_x ld-2.31.so [.] 0x00007fffe56490b8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] _dl_name_match_p ld-2.31.so [.] 0x00007fffe56487d8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] _dl_start ld-2.31.so [.] start_time+0x0 ld-2.31.so N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] _dl_sysdep_start ld-2.31.so [.] 0x00007fffe56494b8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5648ff8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5649064 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5649130 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] _rtld_global+0xaf8 ld-2.31.so N/A N/A N/A N/A 9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] _rtld_global+0xc28 ld-2.31.so N/A N/A N/A N/A 9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] 0x00007fffe56495b8 [stack] N/A N/A N/A N/A # (Tip: Show user configuration overrides: perf config --user --list) $ Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1612296553-21962-4-git-send-email-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 21 1月, 2021 1 次提交
-
-
由 Stephane Eranian 提交于
Add a new sort dimension "code_page_size" for common sort. With this option applied, perf can sort and report by sample's code page size. For example: # perf report --stdio --sort=comm,symbol,code_page_size # To display the perf.data header info, please use # --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 3K of event 'mem-loads:uP' # Event count (approx.): 1470769 # # Overhead Command Symbol Code Page Size IPC [IPC Coverage] # ........ ....... ............................ .............. .................... # 69.56% dtlb [.] GetTickCount 4K - - 17.93% dtlb [.] Calibrate 4K - - 11.40% dtlb [.] __gettimeofday 4K - - # Signed-off-by: NStephane Eranian <eranian@google.com> Acked-by: NJiri Olsa <jolsa@redhat.com> Acked-by: NNamhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210105195752.43489-6-kan.liang@linux.intel.comSigned-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 20 12月, 2020 1 次提交
-
-
由 Kan Liang 提交于
Add a new sort option "data_page_size" for --mem-mode sort. With this option applied, perf can sort and report by sample's data page size. Here is an example: perf report --stdio --mem-mode --sort=comm,symbol,phys_daddr,data_page_size # To display the perf.data header info, please use # --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 9K of event 'mem-loads:uP' # Total weight : 9028 # Sort order : comm,symbol,phys_daddr,data_page_size # # Overhead Command Symbol Data Physical # Address # Data Page Size # ........ ....... ............................ # ...................... ...................... # 11.19% dtlb [.] touch_buffer [.] 0x00000003fec82ea8 4K 8.61% dtlb [.] GetTickCount [.] 0x00000003c4f2c8a8 4K 4.52% dtlb [.] GetTickCount [.] 0x00000003fec82f58 4K 4.33% dtlb [.] __gettimeofday [.] 0x00000003fec82f48 4K 4.32% dtlb [.] GetTickCount [.] 0x00000003fec82f78 4K 4.28% dtlb [.] GetTickCount [.] 0x00000003fec82f50 4K 4.23% dtlb [.] GetTickCount [.] 0x00000003fec82f70 4K 4.11% dtlb [.] GetTickCount [.] 0x00000003fec82f68 4K 4.00% dtlb [.] Calibrate [.] 0x00000003fec82f98 4K 3.91% dtlb [.] Calibrate [.] 0x00000003fec82f90 4K 3.43% dtlb [.] touch_buffer [.] 0x00000003fec82e98 4K 3.42% dtlb [.] touch_buffer [.] 0x00000003fec82e90 4K 0.09% dtlb [.] DoDependentLoads [.] 0x000000036ea084c0 2M 0.08% dtlb [.] DoDependentLoads [.] 0x000000032b010b80 2M Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Acked-by: NJiri Olsa <jolsa@redhat.com> Acked-by: NNamhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Link: http://lore.kernel.org/lkml/20201216185805.9981-3-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 01 12月, 2020 1 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/, go on completing this split. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 28 5月, 2020 2 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 06 5月, 2020 3 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
As those are 'struct evsel' methods, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
As they are 'struct evsel' methods or related routines, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 18 4月, 2020 1 次提交
-
-
由 Kan Liang 提交于
Perf checks the duplicate entries in a callchain before adding an entry. However the check is very slow especially with deeper call stack. Almost ~50% elapsed time of perf report is spent on the check when the call stack is always depth of 32. The hist_entry__cmp() is used to compare the new entry with the old entries. It will go through all the available sorts in the sort_list, and call the specific cmp of each sort, which is very slow. Actually, for most cases, there are no duplicate entries in callchain. The symbols are usually different. It's much faster to do a quick check for symbols first. Only do the full cmp when the symbols are exactly the same. The quick check is only to check symbols, not dso. Export _sort__sym_cmp. $ perf record --call-graph lbr ./tchain_edit_64 Without the patch $time perf report --stdio real 0m21.142s user 0m21.110s sys 0m0.033s With the patch $time perf report --stdio real 0m10.977s user 0m10.948s sys 0m0.027s Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Acked-by: NJiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-18-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 03 4月, 2020 1 次提交
-
-
由 Namhyung Kim 提交于
The cgroup sort key is to show cgroup membership of each task. Currently it shows full path in the cgroupfs (not relative to the root of cgroup namespace) since it'd be more intuitive IMHO. Otherwise root cgroup in different namespaces will all show same name - "/". The cgroup sort key should come before cgroup_id otherwise sort_dimension__add() will match it to cgroup_id as it only matches with the given substring. For example it will look like following. Note that record patch adding --all-cgroups patch will come later. $ perf record -a --namespace --all-cgroups cgtest [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.208 MB perf.data (4090 samples) ] $ perf report -s cgroup_id,cgroup,pid ... # Overhead cgroup id (dev/inode) Cgroup Pid:Command # ........ ..................... .......... ............... # 93.96% 0/0x0 / 0:swapper 1.25% 3/0xeffffffb / 278:looper0 0.86% 3/0xf000015f /sub/cgrp1 280:cgtest 0.37% 3/0xf0000160 /sub/cgrp2 281:cgtest 0.34% 3/0xf0000163 /sub/cgrp3 282:cgtest 0.22% 3/0xeffffffb /sub 278:looper0 0.20% 3/0xeffffffb / 280:cgtest 0.15% 3/0xf0000163 /sub/cgrp3 285:looper3 Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-6-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 10 3月, 2020 1 次提交
-
-
由 Kan Liang 提交于
The low level index of raw branch records for the most recent branch can be recorded in a sample with PERF_SAMPLE_BRANCH_HW_INDEX branch_sample_type. Extend struct branch_stack to support it. However, if the PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and entries[] will be output by kernel. The pointer of entries[] could be wrong, since the output format is different with new struct branch_stack. Add a variable no_hw_idx in struct perf_sample to indicate whether the hw_idx is output. Add get_branch_entry() to return corresponding pointer of entries[0]. To make dummy branch sample consistent as new branch sample, add hw_idx in struct dummy_branch_stack for cs-etm and intel-pt. Apply the new struct branch_stack for synthetic events as well. Extend test case sample-parsing to support new struct branch_stack. Committer notes: Renamed get_branch_entries() to perf_sample__branch_entries() to have proper namespacing and pave the way for this to be moved to libperf, eventually. Add 'static' to that inline as it is in a header. Add 'hw_idx' to 'struct dummy_branch_stack' in cs-etm.c to fix the build on arm64. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200228163011.19358-2-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 26 11月, 2019 2 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
One more step on the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-61rra2wg392rhvdgw421wzpt@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
One more step on the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-foo95pyyp3bhocbt7yd8qrvq@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 12 11月, 2019 2 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
And fill it whenever we setup a a 'struct map_symbol', now we need to use it, next cset. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-fzwfcnddenz1o7uj1fzw3g46@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
So that we pass that substructure around and with it consolidate lots of functions that receive a (map, symbol) pair and now can receive just a 'struct map_symbol' pointer. This further paves the way to add 'struct map_groups' to 'struct map_symbol' so that we can have all we need for annotation so that we can ditch 'struct map'->groups, i.e. have the map_groups pointer in a more central place, avoiding the pointer in the 'struct map' that have tons of instances. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-fs90ttd9q12l7989fo7pw81q@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 07 11月, 2019 4 次提交
-
-
由 Jin Yao 提交于
This patch provides helper routines to support new columns for block info output. The new columns are: Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object v5: --- 1. Move more block related functions from builtin-report.c to block-info.c 2. Set ms (map+sym) in block hist_entry. Because this info is needed for reporting the block range (i.e. source line) Committer notes: Remove unused set_fmt() function, some build were not completing with: util/block-info.c:396:20: error: unused function 'set_fmt' [-Werror,-Wunused-function] static inline void set_fmt(struct block_fmt *block_fmt, ^ 1 error generated. Signed-off-by: NJin Yao <yao.jin@linux.intel.com> Reviewed-by: NJiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-5-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Jin Yao 提交于
We can get the per sample cycles by hist__account_cycles(). It's also useful to know the total cycles of all samples in order to get the cycles coverage for a single program block in further. For example: coverage = per block sampled cycles / total sampled cycles This patch creates a new argument 'total_cycles' in hist__account_cycles(), which will be added with the cycles of each sample. Signed-off-by: NJin Yao <yao.jin@linux.intel.com> Reviewed-by: NJiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-4-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Jin Yao 提交于
We have already implemented some block-info related functions. Now it's time to do some cleanup, refactoring and move the functions and structures to new block-info.h/block-info.c. v4: --- Move code for skipping column length calculation to patch: 'perf diff: Don't use hack to skip column length calculation' v3: --- 1. Rename the patch title 2. Rename from block.h/block.c to block-info.h/block-info.c 3. Move more common part to block-info, such as block_info__process_sym. 4. Remove the nasty hack for skipping calculation of column length Signed-off-by: NJin Yao <yao.jin@linux.intel.com> Reviewed-by: NJiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-3-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Jin Yao 提交于
Previously we use a nasty hack to skip the hists__calc_col_len for block since this function is not very suitable for block column length calculation. This patch removes the hack code and add a check at the entry of hists__calc_col_len to skip for block case. Signed-off-by: NJin Yao <yao.jin@linux.intel.com> Reviewed-by: NJiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-2-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 05 11月, 2019 1 次提交
-
-
由 Jiri Olsa 提交于
The final sort might get confused when the comparison is done over bigger numbers than int like for -s time. Check the following report for longer workloads: $ perf report -s time -F time,overhead --stdio Fix hist_entry__sort() to properly return int64_t and not possible cut int. Fixes: 043ca389 ("perf tools: Use hpp formats to sort final output") Signed-off-by: NJiri Olsa <jolsa@kernel.org> Reviewed-by: NAndi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org # v3.16+ Link: http://lore.kernel.org/lkml/20191104232711.16055-1-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 01 9月, 2019 4 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
The mem_info struct goes to mem-events.h and branch_info goes to branch.h, where they belong, this way we can remove several headers from symbols.h and trim the include dependency tree more. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-aupw71xnravcsu2xoabfmhpc@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
Now that sort.h isn't included by any other header, we can check where it is really needed, i.e. we can remove it and be sure that it isn't being obtained indirectly. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-tom8k0lbsxd9joprr8zpu6w1@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
So that we can reduce the header dependency tree further, in the process noticed that lots of places were getting even things like build-id routines and 'struct perf_tool' definition indirectly, so fix all those too. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-ti0btma9ow5ndrytyoqdk62j@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
All we need there is a forward declaration for 'union perf_event', so remove it from there and add missing header directives in places using things from this indirect include. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-7ftk0ztstqub1tirjj8o8xbl@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 30 8月, 2019 1 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
And fix the fallout, adding it to places that must have it since they use its definitions. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-1s3jel4i26chq2g0lydoz7i3@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 29 8月, 2019 1 次提交
-
-
由 Namhyung Kim 提交于
The event group feature links relevant hist entries among events so that they can be displayed together. During the link process, each hist entry in non-leader events is connected to a hist entry in the leader event. This is done in order of events specified in the command line so it assumes that events are linked in the order. But 'perf top' can break the assumption since it does the link process multiple times. For example, a hist entry can be in the third event only at first so it's linked after the leader. Some time later, second event has a hist entry for it and it'll be linked after the entry of the third event. This makes the code compilicated to deal with such unordered entries. This patch simply unlink all the entries after it's printed so that they can assume the correct order after the repeated link process. Also it'd be easy to deal with decaying old entries IMHO. Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Reported-by: NArnaldo Carvalho de Melo <acme@redhat.com> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190827231555.121411-2-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 26 8月, 2019 1 次提交
-
-
由 Andi Kleen 提交于
If the user specified --ns, the column to print the sort time stamp wasn't wide enough to actually print the full nanoseconds. Widen the time key column width when --ns is specified. Before: % perf record -a sleep 1 % perf report --sort time,overhead,symbol --stdio --ns ... 2.39% 187851.10000 [k] smp_call_function_single - - 1.53% 187851.10000 [k] intel_idle - - 0.59% 187851.10000 [.] __wcscmp_ifunc - - 0.33% 187851.10000 [.] 0000000000000000 - - 0.28% 187851.10000 [k] cpuidle_enter_state - - After: % perf report --sort time,overhead,symbol --stdio --ns ... 2.39% 187851.100000000 [k] smp_call_function_single - - 1.53% 187851.100000000 [k] intel_idle - - 0.59% 187851.100000000 [.] __wcscmp_ifunc - - 0.33% 187851.100000000 [.] 0000000000000000 - - 0.28% 187851.100000000 [k] cpuidle_enter_state - - Signed-off-by: NAndi Kleen <ak@linux.intel.com> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20190823210338.12360-2-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 13 8月, 2019 2 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
When he have an event group we have multiple struct hist instances, one per evsel, and in each of these hists we may have hist_entries that point to the same thing being observed, say a symbol, i.e. if we're looking at instructions and cycles, then we'll have one hist_entry in the "instructions" evsel and another in the "cycles" evsel. We need to link those to then show one column for each. When we're looking at some other pair of events, say instructions and cache misses, we may have just the "instructions" hist entry and not one for "cache misses", as instructions not necessarily generate cache misses, as the logic expects one hist_entry per evsel, we end up adding "dummy" hist_entries. This is enough for 'perf report', that does this matching operation (hists__match()) just once after processing all events, but for 'perf top', we do this at each refresh, so we may finally find events matching and then we need to trow away the dummies and link with the real events. So if we find a match, traverse the link of matches and trow away dummies for that hists. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-dwvtjqqifsbsczeb35q6mqkk@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
When we have multiple events in a group we link hist_entries in the non-leader evsel hists to the one in the leader that points to the same sorting criteria, in hists__match(). For 'perf report' we do this just once and then print the results, but for 'perf top' we need to look if this was already done in the previous refresh of the screen, so check for that and don't try to link again. This is part of having 'perf top' using the hists browser for showing multiple events in multiple columns. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-iwvb37rgb7upswhruwpcdnhw@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 30 7月, 2019 4 次提交
-
-
由 Jiri Olsa 提交于
Move the nr_members member from perf's evsel to libperf's perf_evsel. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-60-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Jiri Olsa 提交于
Move the perf_event_attr struct fron 'struct evsel' to 'struct perf_evsel'. Committer notes: Fixed up these: tools/perf/arch/arm/util/auxtrace.c tools/perf/arch/arm/util/cs-etm.c tools/perf/arch/arm64/util/arm-spe.c tools/perf/arch/s390/util/auxtrace.c tools/perf/util/cs-etm.c Also cc1: warnings being treated as errors tests/sample-parsing.c: In function 'do_test': tests/sample-parsing.c:162: error: missing initializer tests/sample-parsing.c:162: error: (near initialization for 'evsel.core.cpus') struct evsel evsel = { .needs_swap = false, - .core.attr = { - .sample_type = sample_type, - .read_format = read_format, + .core = { + . attr = { + .sample_type = sample_type, + .read_format = read_format, + }, [perfbuilder@a70e4eeb5549 /]$ gcc --version |& head -1 gcc (GCC) 4.4.7 Also we don't need to include perf_event.h in tools/perf/lib/include/perf/evsel.h, forward declaring 'struct perf_event_attr' is enough. And this even fixes the build in some systems where things are used somewhere down the include path from perf_event.h without defining __always_inline. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-43-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Jiri Olsa 提交于
Rename struct perf_evlist to struct evlist, so we don't have a name clash when we add struct perf_evlist in libperf. Committer notes: Added fixes to build on arm64, from Jiri and from me (tools/perf/util/cs-etm.c) Signed-off-by: NJiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-6-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Jiri Olsa 提交于
Rename struct perf_evsel to struct evsel, so we don't have a name clash when we add struct perf_evsel in libperf. Committer notes: Added fixes for arm64, provided by Jiri. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-5-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 09 7月, 2019 3 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
To allow for destructors to check if they're operating on a object still in a list, and to avoid going from use after free list entries into still valid, or even also other already removed from list entries. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-deh17ub44atyox3j90e6rksu@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
In places where the equivalent was already being done, i.e.: free(a); a = NULL; And in placs where struct members are being freed so that if we have some erroneous reference to its struct, then accesses to freed members will result in segfaults, which we can detect faster than use after free to areas that may still have something seemingly valid. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-jatyoofo5boc1bsvoig6bb6i@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
Eroding a bit more the tools/perf/util/util.h hodpodge header. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-natazosyn9rwjka25tvcnyi0@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 03 7月, 2019 1 次提交
-
-
由 Jin Yao 提交于
$ perf record -b ./div $ perf record -b ./div Following is the default perf diff output $ perf diff # Event 'cycles' # # Baseline Delta Abs Shared Object Symbol # ........ ......... ................ .................................. # 48.75% +0.33% div [.] main 8.21% -0.20% div [.] compute_flag 19.02% -0.12% libc-2.23.so [.] __random_r 16.17% -0.09% libc-2.23.so [.] __random 2.27% -0.03% div [.] rand@plt +0.02% [i915] [k] gen8_irq_handler 5.52% +0.02% libc-2.23.so [.] rand This patch creates a new computation selection 'cycles'. $ perf diff -c cycles # Event 'cycles' # # Baseline [Program Block Range] Cycles Diff Shared Object Symbol # ........ ....................................... ......................................... # 48.75% [div.c:42 -> div.c:45] 147 div [.] main 48.75% [div.c:31 -> div.c:40] 4 div [.] main 48.75% [div.c:40 -> div.c:40] 0 div [.] main 48.75% [div.c:42 -> div.c:42] 0 div [.] main 48.75% [div.c:42 -> div.c:44] 0 div [.] main 19.02% [random_r.c:357 -> random_r.c:360] 0 libc-2.23.so [.] __random_r 19.02% [random_r.c:357 -> random_r.c:373] 0 libc-2.23.so [.] __random_r 19.02% [random_r.c:357 -> random_r.c:376] 0 libc-2.23.so [.] __random_r 19.02% [random_r.c:357 -> random_r.c:380] 0 libc-2.23.so [.] __random_r 19.02% [random_r.c:357 -> random_r.c:392] 0 libc-2.23.so [.] __random_r 16.17% [random.c:288 -> random.c:291] 0 libc-2.23.so [.] __random 16.17% [random.c:288 -> random.c:291] 0 libc-2.23.so [.] __random 16.17% [random.c:288 -> random.c:295] 0 libc-2.23.so [.] __random 16.17% [random.c:288 -> random.c:297] 0 libc-2.23.so [.] __random 16.17% [random.c:291 -> random.c:291] 0 libc-2.23.so [.] __random 16.17% [random.c:293 -> random.c:293] 0 libc-2.23.so [.] __random 8.21% [div.c:22 -> div.c:22] 148 div [.] compute_flag 8.21% [div.c:22 -> div.c:25] 0 div [.] compute_flag 8.21% [div.c:27 -> div.c:28] 0 div [.] compute_flag 5.52% [rand.c:26 -> rand.c:27] 0 libc-2.23.so [.] rand 5.52% [rand.c:26 -> rand.c:28] 0 libc-2.23.so [.] rand 2.27% [rand@plt+0 -> rand@plt+0] 0 div [.] rand@plt 0.01% [entry_64.S:694 -> entry_64.S:694] 16 [vmlinux] [k] native_irq_return_iret 0.00% [fair.c:7676 -> fair.c:7665] 162 [vmlinux] [k] update_blocked_averages "[Program Block Range]" indicates the range of program basic block (start -> end). If we can find the source line it prints the source line otherwise it prints the symbol+offset instead. v4: --- Use source lines or symbol+offset to indicate the basic block. It should be easier to understand. v3: --- Cast 'struct hist_entry' to 'struct block_hist' in hist_entry__block_fprintf. Use symbol_conf.report_block to check if executing hist_entry__block_fprintf. v2: --- Keep standard perf diff format and display the 'Baseline' and 'Shared Object'. The output is sorted by "Baseline" and the basic blocks in the same function are sorted by cycles diff. Signed-off-by: NJin Yao <yao.jin@linux.intel.com> Reviewed-by: NJiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1561713784-30533-7-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-