- 11 8月, 2015 1 次提交
-
-
由 Andi Kleen 提交于
In some cases it's useful to characterize samples by file. This is useful to get a higher level categorization, for example to map cost to subsystems. Add a srcfile sort key to perf report. It builds on top of the existing srcline support. Commiter notes: E.g.: # perf record -F 10000 usleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.016 MB perf.data (13 samples) ] [root@zoo ~]# perf report -s srcfile --stdio # Total Lost Samples: 0 # # Samples: 13 of event 'cycles' # Event count (approx.): 869878 # # Overhead Source File # ........ ........... 60.99% . 20.62% paravirt.h 14.23% rmap.c 4.04% signal.c 0.11% msr.h # The first line is collecting all the files for which srcfiles couldn't somehow get resolved to: # perf report -s srcfile,dso --stdio # Total Lost Samples: 0 # # Samples: 13 of event 'cycles' # Event count (approx.): 869878 # # Overhead Source File Shared Object # ........ ........... ................ 40.97% . ld-2.20.so 20.62% paravirt.h [kernel.vmlinux] 20.02% . libc-2.20.so 14.23% rmap.c [kernel.vmlinux] 4.04% signal.c [kernel.vmlinux] 0.11% msr.h [kernel.vmlinux] # XXX: Investigate why that is not resolving on Fedora 21, Andi says he hasn't seen this on Fedora 22. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1438988064-21834-1-git-send-email-andi@firstfloor.org [ Added column length update, from 0e65bdb3f90f ('perf hists: Update the column width for the "srcline" sort key') ] Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 10 8月, 2015 1 次提交
-
-
由 Andi Kleen 提交于
For perf report/script srcline currently only the base file name of the source file is printed. This is a good default because it usually fits on the screen. But in some cases we want to know the full file name, for example to aggregate hits per file. In the later case we need more than the base file name to resolve file naming collisions: for example the kernel source has ~70 files named "core.c" It's also useful as input to post processing tools which want to point to the right file. Add a flag to allow full file name output. Add an option to perf report/script to enable this option. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1438986245-15191-1-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 07 8月, 2015 1 次提交
-
-
由 Andi Kleen 提交于
cycles is a new branch_info field available on some CPUs that indicates the time deltas between branches in the LBR. Add a sort key and output code for the cycles to allow to display the basic block cycles individually in perf report. We also pass in the cycles for weight when LBRs are processed, which allows to get global and local weight, to get an estimate of the total cost. And also print the cycles information for perf report -D. I also added printing for the previously missing LBR flags (mispredict etc.) Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1437233094-12844-2-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 12 5月, 2015 1 次提交
-
-
由 Namhyung Kim 提交于
The 'perf record -s' and 'perf report -T' should be used together to see per-thread event counts. Document the relation of these commands. Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1431184784-30525-1-git-send-email-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 06 5月, 2015 1 次提交
-
-
由 Adrian Hunter 提交于
Add AUX area tracing option 'x' to synthesize events for transactions. This will be used by Intel PT to synthesize an event record for each TSX start, commit or abort. Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1430404667-10593-6-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 05 5月, 2015 1 次提交
-
-
由 Adrian Hunter 提交于
Unwittingly the itrace options for perf report ended up below the Overhead Calculation section. Move it back with the other options. Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1430404667-10593-2-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 04 5月, 2015 1 次提交
-
-
由 Adrian Hunter 提交于
Add support for decoding an AUX area assuming it contains instruction tracing data. Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1429903807-20559-4-git-send-email-adrian.hunter@intel.com [ Do not use -Z as an alternative to --itrace ] [ Fixed initialization of itrace_synth_opts struct fields on older gcc versions ] Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 29 4月, 2015 1 次提交
-
-
由 Namhyung Kim 提交于
As the --children option changes the output of perf report (and perf top) it sometimes confuses users. Add more words and examples to help understanding of the option's behavior - and how to disable it ;-). Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Reviewed-by: NIngo Molnar <mingo@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Taeung Song <treeze.taeung@gmail.com> Link: http://lkml.kernel.org/r/1429684425-14987-1-git-send-email-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 25 3月, 2015 1 次提交
-
-
由 David Ahern 提交于
The 'record' and 'top' tools already allow a user to specify a CSV of pids and/or tids of tasks to collect data. Add those options to the 'report' and 'script' analysis commands to only consider samples related to the given pids/tids. This is also inline with the existing comm option. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1427212361-7066-1-git-send-email-dsahern@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 02 12月, 2014 2 次提交
-
-
由 Andi Kleen 提交于
Add a --branch-history option to perf report that changes all the settings necessary for using the branches in callstacks. This is just a short cut to make this nicer to use, it does not enable any functionality by itself. v2: Change sort order. Rename option to --branch-history to be less confusing. v3: Updates v4: Fix conflict with newer perf base v5: Port to latest tip v6: Add more comments. Remove CCKEY_ADDRESS setting. Remove unnecessary branch_mode setting. Use a boolean. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1415844328-4884-5-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Andi Kleen 提交于
Currently branch stacks can be only shown as edge histograms for individual branches. I never found this display particularly useful. This implements an alternative mode that creates histograms over complete branch traces, instead of individual branches, similar to how normal callgraphs are handled. This is done by putting it in front of the normal callgraph and then using the normal callgraph histogram infrastructure to unify them. This way in complex functions we can understand the control flow that lead to a particular sample, and may even see some control flow in the caller for short functions. Example (simplified, of course for such simple code this is usually not needed), please run this after the whole patchkit is in, as at this point in the patch order there is no --branch-history, that will be added in a patch after this one: tcall.c: volatile a = 10000, b = 100000, c; __attribute__((noinline)) f2() { c = a / b; } __attribute__((noinline)) f1() { f2(); f2(); } main() { int i; for (i = 0; i < 1000000; i++) f1(); } % perf record -b -g ./tsrc/tcall [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ] % perf report --no-children --branch-history ... 54.91% tcall.c:6 [.] f2 tcall | |--65.53%-- f2 tcall.c:5 | | | |--70.83%-- f1 tcall.c:11 | | f1 tcall.c:10 | | main tcall.c:18 | | main tcall.c:18 | | main tcall.c:17 | | main tcall.c:17 | | f1 tcall.c:13 | | f1 tcall.c:13 | | f2 tcall.c:7 | | f2 tcall.c:5 | | f1 tcall.c:12 | | f1 tcall.c:12 | | f2 tcall.c:7 | | f2 tcall.c:5 | | f1 tcall.c:11 | | | --29.17%-- f1 tcall.c:12 | f1 tcall.c:12 | f2 tcall.c:7 | f2 tcall.c:5 | f1 tcall.c:11 | f1 tcall.c:10 | main tcall.c:18 | main tcall.c:18 | main tcall.c:17 | main tcall.c:17 | f1 tcall.c:13 | f1 tcall.c:13 | f2 tcall.c:7 | f2 tcall.c:5 | f1 tcall.c:12 The default output is unchanged. This is only implemented in perf report, no change to record or anywhere else. This adds the basic code to report: - add a new "branch" option to the -g option parser to enable this mode - when the flag is set include the LBR into the callstack in machine.c. The rest of the history code is unchanged and doesn't know the difference between LBR entry and normal call entry. - detect overlaps with the callchain - remove small loop duplicates in the LBR Current limitations: - The LBR flags (mispredict etc.) are not shown in the history and LBR entries have no special marker. - It would be nice if annotate marked the LBR entries somehow (e.g. with arrows) v2: Various fixes. v3: Merge further patches into this one. Fix white space. v4: Improve manpage. Address review feedback. v5: Rename functions. Better error message without -g. Fix crash without -b. v6: Rebase v7: Rebase. Use NO_ENTRY in memset. v8: Port to latest tip. Move add_callchain_ip to separate patch. Skip initial entries in callchain. Minor cleanups. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 18 9月, 2014 1 次提交
-
-
由 Avi Kivity 提交于
Some Linux symbols (for example __vt_event_wait) are interpreted by the demangler as C++ mangled names, which of course they aren't. Disable kernel symbol demangling by default to avoid this, and allow enabling it with a new option --demangle-kernel for those who wish it. Reported-by: NJiri Olsa <jolsa@redhat.com> Signed-off-by: NAvi Kivity <avi@cloudius-systems.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1410581705-26968-1-git-send-email-avi@cloudius-systems.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 12 8月, 2014 1 次提交
-
-
由 Namhyung Kim 提交于
Add -w/--column-widths option like perf report does so that users are able to see symbols even with some very long C++ library/functions. It can be a list separated by comma for each column. $ perf top -w 0,20,30 The value of 0 means there's no limit. Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1406785662-5534-6-git-send-email-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 09 6月, 2014 2 次提交
-
-
由 Don Zickus 提交于
In perf's 'mem-mode', one can get access to a whole bunch of details specific to a particular sample instruction. A bunch of those details relate to the data address. One interesting thing you can do with data addresses is to convert them into a unique cacheline they belong too. Organizing these data cachelines into similar groups and sorting them can reveal cache contention. This patch creates an alogorithm based on various sample details that can help group entries together into data cachelines and allows 'perf report' to sort on it. The algorithm relies on having proper mmap2 support in the kernel to help determine if the memory map the data address belongs to is private to a pid or globally shared. The alogortithm is as follows: o group cpumodes together o group entries with discovered maps together o sort on major, minor, inode and inode generation numbers o if userspace anon, then sort on pid o sort on cachelines based on data addresses The 'dcacheline' sort option in 'perf report' only works in 'mem-mode'. Sample output: # # Samples: 206 of event 'cpu/mem-loads/pp' # Total weight : 2534 # Sort order : dcacheline,pid # # Overhead Samples Data Cacheline Command: Pid # ........ ............ ...................................................................... .................. # 13.22% 1 [k] 0xffff88042f08ebc0 swapper: 0 9.27% 1 [k] 0xffff88082e8cea80 swapper: 0 3.59% 2 [k] 0xffffffff819ba180 swapper: 0 0.32% 1 [k] arch_trigger_all_cpu_backtrace_handler_na.23901+0xffffffffffffffe0 swapper: 0 0.32% 1 [k] timekeeper_seq+0xfffffffffffffff8 swapper: 0 Note: Added a '+1' to symlen size in hists__calc_col_len to prevent the next column from prematurely tabbing over and mis-aligning. Not sure what the problem is. Signed-off-by: NDon Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/r/1401208087-181977-8-git-send-email-dzickus@redhat.comSigned-off-by: NJiri Olsa <jolsa@kernel.org>
-
由 Don Zickus 提交于
Add mem-mode sorting types and mem-mode itself to perf-report documentation. Signed-off-by: NDon Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/r/1400526833-141779-5-git-send-email-dzickus@redhat.comSigned-off-by: NJiri Olsa <jolsa@kernel.org>
-
- 01 6月, 2014 1 次提交
-
-
由 Namhyung Kim 提交于
The --children option is for showing accumulated overhead (period) value as well as self overhead. Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Tested-by: NArun Sharma <asharma@fb.com> Tested-by: NRodrigo Campos <rodrigo@sdfg.com.ar> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/1401335910-16832-16-git-send-email-namhyung@kernel.orgSigned-off-by: NJiri Olsa <jolsa@kernel.org>
-
- 21 5月, 2014 2 次提交
-
-
由 Namhyung Kim 提交于
The -F/--fields option is to allow user setup output field in any order. It can receive any sort keys and following (hpp) fields: overhead, overhead_sys, overhead_us, sample and period If guest profiling is enabled, overhead_guest_{sys,us} will be available too. The output fields also affect sort order unless you give -s/--sort option. And any keys specified on -s option, will also be added to the output field list automatically. $ perf report -F sym,sample,overhead ... # Symbol Samples Overhead # .......................... ............ ........ # [.] __cxa_atexit 2 2.50% [.] __libc_csu_init 4 5.00% [.] __new_exitfn 3 3.75% [.] _dl_check_map_versions 1 1.25% [.] _dl_name_match_p 4 5.00% [.] _dl_setup_hash 1 1.25% [.] _dl_sysdep_start 1 1.25% [.] _init 5 6.25% [.] _setjmp 6 7.50% [.] a 8 10.00% [.] b 8 10.00% [.] brk 1 1.25% [.] c 8 10.00% Note that, the example output above is captured after applying next patch which fixes sort/comparing behavior. Requested-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Acked-by: NIngo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1400480762-22852-12-git-send-email-namhyung@kernel.orgSigned-off-by: NJiri Olsa <jolsa@kernel.org>
-
由 Namhyung Kim 提交于
Add overhead{,_sys,_us,_guest_sys,_guest_us}, sample and period sort keys so that they can be selected with --sort/-s option. $ perf report -s period,comm --stdio ... # Overhead Period Command # ........ ............ ............... # 47.06% 152 swapper 13.93% 45 qemu-system-arm 12.38% 40 synergys 3.72% 12 firefox 2.48% 8 xchat Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Acked-by: NIngo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1400480762-22852-9-git-send-email-namhyung@kernel.orgSigned-off-by: NJiri Olsa <jolsa@kernel.org>
-
- 16 4月, 2014 1 次提交
-
-
由 Namhyung Kim 提交于
The --percentage option is for controlling overhead percentage displayed. It can only receive either of "relative" or "absolute". "relative" means it's relative to filtered entries only so that the sum of shown entries will be always 100%. "absolute" means it retains the original value before and after the filter is applied. $ perf report -s comm # Overhead Command # ........ ............ # 74.19% cc1 7.61% gcc 6.11% as 4.35% sh 4.14% make 1.13% fixdep ... $ perf report -s comm -c cc1,gcc --percentage absolute # Overhead Command # ........ ............ # 74.19% cc1 7.61% gcc $ perf report -s comm -c cc1,gcc --percentage relative # Overhead Command # ........ ............ # 90.69% cc1 9.31% gcc Note that it has zero effect if no filter was applied. Suggested-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1397145720-8063-3-git-send-email-namhyung@kernel.orgSigned-off-by: NJiri Olsa <jolsa@redhat.com>
-
- 11 12月, 2013 1 次提交
-
-
由 Jiri Olsa 提交于
Currently the perf.data header is always displayed for stdio output, which is no always useful. Disabling header information by default and adding following options to control header output: --header - display header information (old default) --header-only - display header information only w/o further processing, forces stdio output Signed-off-by: NJiri Olsa <jolsa@redhat.com> Acked-by: NDavid Ahern <dsahern@gmail.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1386583370-1699-2-git-send-email-jolsa@redhat.com [ Added single line explaining talking about the new --header* options, to address David Ahern comment; better man page entry for the new options, from Namhyung Kim ] Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 22 10月, 2013 1 次提交
-
-
由 Waiman Long 提交于
When callgraph data was included in the perf data file, it may take a long time to scan all those data and merge them together especially if the stored callchains are long and the perf data file itself is large, like a Gbyte or so. The callchain stack is currently limited to PERF_MAX_STACK_DEPTH (127). This is a large value. Usually the callgraph data that developers are most interested in are the first few levels, the rests are usually not looked at. This patch adds a new --max-stack option to perf-report to limit the depth of callchain stack data to look at to reduce the time it takes for perf-report to finish its processing. It trades the presence of trailing stack information with faster speed. The following table shows the elapsed time of doing perf-report on a perf.data file of size 985,531,828 bytes. --max_stack Elapsed Time Output data size ----------- ------------ ---------------- not set 88.0s 124,422,651 64 87.5s 116,303,213 32 87.2s 112,023,804 16 86.6s 94,326,380 8 59.9s 33,697,248 4 40.7s 10,116,637 -g none 27.1s 2,555,810 Signed-off-by: NWaiman Long <Waiman.Long@hp.com> Acked-by: NDavid Ahern <dsahern@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Scott J Norton <scott.norton@hp.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382107129-2010-4-git-send-email-Waiman.Long@hp.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 04 10月, 2013 2 次提交
-
-
由 Andi Kleen 提交于
Add support for recording and displaying the transaction flags. They are essentially a new sort key. Also display them in a nice way to the user. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@redhat.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1379688044-14173-6-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
Extend the perf branch sorting code to support sorting by in_tx or abort_tx qualifiers. Also print out those qualifiers. This also fixes up some of the existing sort key documentation. We do not support no_tx here, because it's simply not showing the in_tx flag. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@redhat.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1379688044-14173-4-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 22 7月, 2013 1 次提交
-
-
由 Andi Kleen 提交于
With programs with very large functions it can be useful to distinguish the callgraph nodes on more than just function names. So for example if you have multiple calls to the same function, it ends up being separate nodes in the chain. This patch adds a new key field to the callgraph options, that allows comparing nodes on functions (as today, default) and addresses. Longer term it would be nice to also handle src lines, but that would need more changes and address is a reasonable proxy for it today. I right now reference the global params, as there was no simple way to register a params pointer. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-0uskktybf0e7wrnoi5e9b9it@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 13 7月, 2013 1 次提交
-
-
由 Greg Price 提交于
For example, in an application with an expensive function implemented with deeply nested recursive calls, the default call-graph presentation is dominated by the different callchains within that function. By ignoring these callees, we can collect the callchains leading into the function and compactly identify what to blame for expensive calls. For example, in this report the callers of garbage_collect() are scattered across the tree: $ perf report -d ruby 2>- | grep -m10 ^[^#]*[a-z] 22.03% ruby [.] gc_mark --- gc_mark |--59.40%-- mark_keyvalue | st_foreach | gc_mark_children | |--99.75%-- rb_gc_mark | | rb_vm_mark | | gc_mark_children | | gc_marks | | |--99.00%-- garbage_collect If we ignore the callees of garbage_collect(), its callers are coalesced: $ perf report --ignore-callees garbage_collect -d ruby 2>- | grep -m10 ^[^#]*[a-z] 72.92% ruby [.] garbage_collect --- garbage_collect vm_xmalloc |--47.08%-- ruby_xmalloc | st_insert2 | rb_hash_aset | |--98.45%-- features_index_add | | rb_provide_feature | | rb_require_safe | | vm_call_method Signed-off-by: NGreg Price <price@mit.edu> Tested-by: NJiri Olsa <jolsa@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20130623031720.GW22203@biohazard-cafe.mit.edu Link: http://lkml.kernel.org/r/20130708115746.GO22203@biohazard-cafe.mit.edu Cc: Fengguang Wu <fengguang.wu@intel.com> [ remove spaces at beginning of line, reported by Fengguang Wu ] Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 28 5月, 2013 1 次提交
-
-
由 Namhyung Kim 提交于
The --percent-limit option is for not showing small overhead entries in the output. Maybe we want to set a certain default value like 0.1. Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Acked-by: NPekka Enberg <penberg@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1368497347-9628-7-git-send-email-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 01 4月, 2013 1 次提交
-
-
由 Andi Kleen 提交于
perf record has a new option -W that enables weightened sampling. Add sorting support in top/report for the average weight per sample and the total weight sum. This allows to both compare relative cost per event and the total cost over the measurement period. Add the necessary glue to perf report, record and the library. v2: Merge with new hist refactoring. v3: Fix manpage. Remove value check. Rename global_weight to weight and weight to local_weight. v4: Readd sort keys to manpage v5: Move weight to end v6: Move weight to template v7: Rename weight key. Original patch from Andi modified by Stephane Eranian <eranian@google.com> to include ONLY the weight supporting code and apply to pristine 3.8.0-rc4. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1359040242-8269-6-git-send-email-eranian@google.com [ committer note: changed to cope with fc5871ed and the hists_link perf test entry ] Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 27 3月, 2013 1 次提交
-
-
由 Namhyung Kim 提交于
It's sometimes useful to see undemangled raw symbol name for example other tools using the perf output to do manipulation of binaries. Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Suggested-by: NWilliam Cohen <wcohen@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: William Cohen <wcohen@redhat.com> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=55571 Link: http://lkml.kernel.org/r/1364203098-17741-1-git-send-email-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 01 2月, 2013 1 次提交
-
-
由 Namhyung Kim 提交于
Add --group option to enable event grouping. When enabled, all the group members information will be shown together with the leader. $ perf report --group ... # group: {ref-cycles,cycles} # ======== # # Samples: 7K of event 'anon group { ref-cycles, cycles }' # Event count (approx.): 6876107743 # # Overhead Command Shared Object Symbol # ................ ....... ................. .......................... # 99.84% 99.76% noploop noploop [.] main 0.07% 0.00% noploop ld-2.15.so [.] strcmp 0.03% 0.00% noploop [kernel.kallsyms] [k] timerqueue_del 0.03% 0.03% noploop [kernel.kallsyms] [k] sched_clock_cpu 0.02% 0.00% noploop [kernel.kallsyms] [k] account_user_time 0.01% 0.00% noploop [kernel.kallsyms] [k] __alloc_pages_nodemask 0.00% 0.00% noploop [kernel.kallsyms] [k] native_write_msr_safe 0.00% 0.11% noploop [kernel.kallsyms] [k] _raw_spin_lock 0.00% 0.06% noploop [kernel.kallsyms] [k] find_get_page 0.00% 0.02% noploop [kernel.kallsyms] [k] rcu_check_callbacks 0.00% 0.02% noploop [kernel.kallsyms] [k] __current_kernel_time Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Acked-by: NJiri Olsa <jolsa@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1358845787-1350-18-git-send-email-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 25 1月, 2013 1 次提交
-
-
由 Namhyung Kim 提交于
Add description of sort keys to the perf-report document and also add missing cpu and srcline keys to the command line help string. Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Acked-by: NJiri Olsa <jolsa@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1356599507-14226-11-git-send-email-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 06 9月, 2012 1 次提交
-
-
由 Maciek Borzecki 提交于
When analyzing perf data from hosts of other architecture than one of the local host it's useful to call objdump that is part of a toolchain for that architecture. Instead of calling regular objdump, call one that user specified in command line. Signed-off-by: NMaciek Borzecki <maciek.borzecki@gmail.com> Acked-by: NDavid Ahern <dsahern@gmail.com> Link: http://lkml.kernel.org/r/1346754750.16299.3.camel@localhost.localdomainSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 20 6月, 2012 1 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
Using addr2line for now, requires debuginfo, needs more work to support detached debuginfo, aka foo-debuginfo packages. Example: [root@sandy ~]# perf record -a sleep 3 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.555 MB perf.data (~24236 samples) ] [root@sandy ~]# perf report -s dso,srcline 2>&1 | grep -v ^# | head -5 22.41% [kernel.kallsyms] /home/git/linux/drivers/idle/intel_idle.c:280 4.79% [kernel.kallsyms] /home/git/linux/drivers/cpuidle/cpuidle.c:148 4.78% [kernel.kallsyms] /home/git/linux/arch/x86/include/asm/atomic64_64.h:121 4.49% [kernel.kallsyms] /home/git/linux/kernel/sched/core.c:1690 4.30% [kernel.kallsyms] /home/git/linux/include/linux/seqlock.h:90 [root@sandy ~]# [root@sandy ~]# perf top -U -s dso,symbol,srcline Samples: 1K of event 'cycles', Event count (approx.): 589617389 18.66% [kernel] [k] copy_user_generic_unrolled /home/git/linux/arch/x86/lib/copy_user_64.S:143 7.83% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:39 6.59% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:38 3.66% [kernel] [k] page_fault /home/git/linux/arch/x86/kernel/entry_64.S:1379 3.25% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:40 3.12% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:37 2.74% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:36 2.39% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:43 2.12% [kernel] [k] ioread32 /home/git/linux/lib/iomap.c:90 1.51% [kernel] [k] copy_user_generic_unrolled /home/git/linux/arch/x86/lib/copy_user_64.S:144 1.19% [kernel] [k] copy_user_generic_unrolled /home/git/linux/arch/x86/lib/copy_user_64.S:154 Suggested-by: NAndi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-pdmqbng9twz06jzkbgtuwbp8@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 20 3月, 2012 1 次提交
-
-
由 Pekka Enberg 提交于
This patch adds a simple GTK2-based browser to 'perf report' that's based on the TTY-based browser in builtin-report.c. To launch "perf report" using the new GTK interface just type: $ perf report --gtk The interface is somewhat limited in features at the moment: - No callgraph support - No KVM guest profiling support - No color coding for percentages - No sorting from the UI - ..and many, many more! That said, I think this patch a reasonable start to build future features on. Signed-off-by: NPekka Enberg <penberg@kernel.org> Cc: Colin Walters <walters@verbum.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1202231952410.6689@tux.localdomain [ committer note: Added #pragma to make gtk no strict prototype problem go away as suggested by Colin Walters modulo avoiding push/pop ] Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 19 3月, 2012 1 次提交
-
-
由 Namhyung Kim 提交于
Add missing description of --symbol-filter in Documentation/perf-report.txt. Reported-by: NArnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: NNamhyung Kim <namhyung.kim@lge.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1332125628-23088-1-git-send-email-namhyung.kim@lge.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 09 3月, 2012 2 次提交
-
-
由 Stephane Eranian 提交于
This patch enhances perf report to auto-detect when the perf.data file contains samples with branch stacks. That way it is not necessary to use the -b option. To force branch view mode to off, simply use --no-branch-stack. Signed-off-by: NStephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: acme@redhat.com Cc: asharma@fb.com Cc: ravitillo@lbl.gov Cc: vweaver1@eecs.utk.edu Cc: khandual@linux.vnet.ibm.com Cc: dsahern@gmail.com Link: http://lkml.kernel.org/r/1331246868-19905-4-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Roberto Agostino Vitillo 提交于
This patch adds support for taken branch sampling, i.e, the PERF_SAMPLE_BRANCH_STACK feature to perf report. In other words, to display histograms based on taken branches rather than executed instructions addresses. The new option is called -b and it takes no argument. To generate meaningful output, the perf.data must have been obtained using perf record -b xxx ... where xxx is a branch filter option. The output shows symbols, modules, sorted by 'who branches where' the most often. The percentages reported in the first column refer to the total number of branches captured and not the usual number of samples. Here is a quick example. Here branchy is simple test program which looks as follows: void f2(void) {} void f3(void) {} void f1(unsigned long n) { if (n & 1UL) f2(); else f3(); } int main(void) { unsigned long i; for (i=0; i < N; i++) f1(i); return 0; } Here is the output captured on Nehalem, if we are only interested in user level function calls. $ perf record -b any_call,u -e cycles:u branchy $ perf report -b --sort=symbol 52.34% [.] main [.] f1 24.04% [.] f1 [.] f3 23.60% [.] f1 [.] f2 0.01% [k] _IO_new_file_xsputn [k] _IO_file_overflow 0.01% [k] _IO_vfprintf_internal [k] _IO_new_file_xsputn 0.01% [k] _IO_vfprintf_internal [k] strchrnul 0.01% [k] __printf [k] _IO_vfprintf_internal 0.01% [k] main [k] __printf About half (52%) of the call branches captured are from main() -> f1(). The second half (24%+23%) is split in two equal shares between f1() -> f2(), f1() ->f3(). The output is as expected given the code. It should be noted, that using -b in perf record does not eliminate information in the perf.data file. Consequently, a typical profile can also be obtained by perf report by simply not using its -b option. It is possible to sort on branch related columns: - dso_from, symbol_from - dso_to, symbol_to - mispredict Signed-off-by: NRoberto Agostino Vitillo <ravitillo@lbl.gov> Signed-off-by: NStephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: acme@redhat.com Cc: robert.richter@amd.com Cc: ming.m.lin@intel.com Cc: andi@firstfloor.org Cc: asharma@fb.com Cc: vweaver1@eecs.utk.edu Cc: khandual@linux.vnet.ibm.com Cc: dsahern@gmail.com Link: http://lkml.kernel.org/r/1328826068-11713-14-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 24 12月, 2011 1 次提交
-
-
由 Robert Richter 提交于
The default input file for perf report is not handled the same way as perf record does it for its output file. This leads to unexpected behavior of perf report, etc. E.g.: # perf record -a -e cpu-cycles sleep 2 | perf report | cat failed to open perf.data: No such file or directory (try 'perf record' first) While perf record writes to a fifo, perf report expects perf.data to be read. This patch changes this to accept fifos as input file. Applies to the following commands: perf annotate perf buildid-list perf evlist perf kmem perf lock perf report perf sched perf script perf timechart Also fixes char const* -> const char* type declaration for filename strings. v2: * Prevent potential null pointer access to input_name in builtin-report.c. Needed due to removal of patch "perf report: Setup browser if stdout is a pipe" Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1323248577-11268-5-git-send-email-robert.richter@amd.comSigned-off-by: NRobert Richter <robert.richter@amd.com> Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 20 12月, 2011 1 次提交
-
-
由 Namhyung Kim 提交于
The '--call-graph' command line option can receive undocumented optional print_limit argument. Besides, use strtoul() to parse the option since its type is u32. Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1323703017-6060-2-git-send-email-namhyung@gmail.comSigned-off-by: NNamhyung Kim <namhyung@gmail.com> Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 28 11月, 2011 1 次提交
-
-
由 David Ahern 提交于
Currently the meaning of -C varies by perf command: for perf-top, perf-stat, perf-record it means cpu list. For perf-report it means comm list. Then perf-annotate, perf-report and perf-script use -c for cpu list. Fix annotate, report and script to use -C for cpu list to be consistent with top, stat and record. This means report needs to use -c for comm list which does introduce a backward compatibility change. v1 -> v2 - update perf-script.txt too Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1321209008-7004-1-git-send-email-dsahern@gmail.comSigned-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 08 10月, 2011 1 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
And add the annotation output knobs to all the tools that have integrated annotation (top, report). Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-gnlob67mke6sji2kf4nstp7m@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-