- 21 6月, 2017 1 次提交
-
-
由 Kan Liang 提交于
Implementing a new --smi-cost mode in perf stat to measure SMI cost. During the measurement, the /sys/device/cpu/freeze_on_smi will be set. The measurement can be done with one counter (unhalted core cycles), and two free running MSR counters (IA32_APERF and SMI_COUNT). In practice, the percentages of SMI core cycles should be more useful than absolute value. So the output will be the percentage of SMI core cycles and SMI#. metric_only will be set by default. SMI cycles% = (aperf - unhalted core cycles) / aperf Here is an example output. Performance counter stats for 'sudo echo ': SMI cycles% SMI# 0.1% 1 0.010858678 seconds time elapsed Users who wants to get the actual value can apply additional --no-metric-only. Signed-off-by: NKan Liang <Kan.liang@intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Elliott <elliott@hpe.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1495825538-5230-3-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 02 6月, 2017 1 次提交
-
-
由 Andi Kleen 提交于
Only print the NMI watchdog hint when that watchdog it actually enabled. This avoids printing these unnecessarily. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/n/tip-lnw7edxnqsphkmeew857wz1i@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 25 4月, 2017 2 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
Not needed in this header, added to the places that need poll(), wait() and a few other prototypes. Link: http://lkml.kernel.org/n/tip-i39c7b6xmo1vwd9wxp6fmkl0@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
Not needed in this header, added to the places that need FILE, putchar(), access() and a few other prototypes. Link: http://lkml.kernel.org/n/tip-xxtdsl6nsna82j7puwbdjqhs@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 21 4月, 2017 1 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
And remove it from util.h, disentangling it a bit more. Link: http://lkml.kernel.org/n/tip-2zg9s5nx90yde64j3g4z2uhk@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 20 4月, 2017 4 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
Removing it from util.h, part of an effort to disentangle the includes hell, that makes changes to util.h or something included by it to cause a complete rebuild of the tools. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ztrjy52q1rqcchuy3rubfgt2@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
Moving them from util.h, where they don't belong. Since libc already have string.h, name it slightly differently, as string2.h. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-eh3vz5sqxsrdd8lodoro4jrw@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
More stuff that came from git, out of the hodge-podge that is util.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-e3lana4gctz3ub4hn4y29hkw@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
Needed to use the PRI[xu](32,64) formatting macros. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-wkbho8kaw24q67dd11q0j39f@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 13 4月, 2017 1 次提交
-
-
由 Stephane Eranian 提交于
(This is a patch has been sitting in the Intel CQM/CMT driver series for a while, despite not depend on it. Sending it now independently since the series is being discarded.) When an event is in error state, read() returns 0 instead of sizeof() buffer. In certain modes, such as interval printing, ignoring the 0 return value may cause bogus count deltas to be computed and thus invalid results printed. This patch fixes this problem by modifying read_counters() to mark the event as not scaled (scaled = -1) to force the printout routine to show <NOT COUNTED>. Signed-off-by: NStephane Eranian <eranian@google.com> Reviewed-by: NDavid Carrillo-Cisneros <davidcc@google.com> Acked-by: NJiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20170412182301.44406-1-davidcc@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 11 4月, 2017 1 次提交
-
-
由 Taeung Song 提交于
To strip csv output, use ltrim() instead of just while loop and isspace() at print_metric_{only}_csv(). Signed-off-by: NTaeung Song <treeze.taeung@gmail.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1491575061-704-3-git-send-email-treeze.taeung@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 27 3月, 2017 1 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
We got it from the git sources but never used it for anything, with the place where this would be somehow used remaining: static int run_builtin(struct cmd_struct *p, int argc, const char **argv) { prefix = NULL; if (p->option & RUN_SETUP) prefix = NULL; /* setup_perf_directory(); */ Ditch it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-uw5swz05vol0qpr32c5lpvus@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 23 3月, 2017 1 次提交
-
-
由 Andi Kleen 提交于
Add generic infrastructure to perf stat to output ratios for "MetricExpr" entries in the event lists. Many events are more useful as ratios than in raw form, typically some count in relation to total ticks. Transfer the MetricExpr information from the alias to the evsel. We mark the events that need to be collected for MetricExpr, and also link the events using them with a pointer. The code is careful to always prefer the right event in the same group to minimize multiplexing errors. At the moment only a single relation is supported. Then add a rblist to the stat shadow code that remembers stats based on the cpu and context. Then finally update and retrieve and print these values similarly to the existing hardcoded perf metrics. We use the simple expression parser added earlier to evaluate the expression. Normally we just output the result without further commentary, but for --metric-only this would lead to empty columns. So for this case use the original event as description. There is no attempt to automatically add the MetricExpr event, if it is missing, however we suggest it to the user, because the user tool doesn't have enough information to reliably construct a group that is guaranteed to schedule. So we leave that to the user. % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' 1.000147889 800,085,181 unc_p_clockticks 1.000147889 93,126,241 unc_p_freq_max_os_cycles # 11.6 2.000448381 800,218,217 unc_p_clockticks 2.000448381 142,516,095 unc_p_freq_max_os_cycles # 17.8 3.000639852 800,243,057 unc_p_clockticks 3.000639852 162,292,689 unc_p_freq_max_os_cycles # 20.3 % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only # time freq_max_os_cycles % 1.000127077 0.9 2.000301436 0.7 3.000456379 0.0 v2: Change from DivideBy to MetricExpr v3: Use expr__ prefix. Support more than one other event. v4: Update description v5: Only print warning message once for multiple PMUs. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170320201711.14142-11-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 22 3月, 2017 3 次提交
-
-
由 Andi Kleen 提交于
When any result that is being merged is bad, mark them all bad to give consistent output in interval mode. No before/after, because the issue was only found in theoretical review and it is hard to reproduce Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170320201711.14142-4-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Andi Kleen 提交于
The uncore PMU has a lot of duplicated PMUs for different subsystems. When expanding an uncore alias we usually end up with a large number of identically named aliases, which makes perf stat output difficult to read. Automatically sum them up in perf stat, unless --no-merge is specified. This can be default because only the uncores generally have duplicated aliases. Other PMUs have unique names. Before: % perf stat --no-merge -a -e unc_c_llc_lookup.any sleep 1 Performance counter stats for 'system wide': 694,976 Bytes unc_c_llc_lookup.any 706,304 Bytes unc_c_llc_lookup.any 956,608 Bytes unc_c_llc_lookup.any 782,720 Bytes unc_c_llc_lookup.any 605,696 Bytes unc_c_llc_lookup.any 442,816 Bytes unc_c_llc_lookup.any 659,328 Bytes unc_c_llc_lookup.any 509,312 Bytes unc_c_llc_lookup.any 263,936 Bytes unc_c_llc_lookup.any 592,448 Bytes unc_c_llc_lookup.any 672,448 Bytes unc_c_llc_lookup.any 608,640 Bytes unc_c_llc_lookup.any 641,024 Bytes unc_c_llc_lookup.any 856,896 Bytes unc_c_llc_lookup.any 808,832 Bytes unc_c_llc_lookup.any 684,864 Bytes unc_c_llc_lookup.any 710,464 Bytes unc_c_llc_lookup.any 538,304 Bytes unc_c_llc_lookup.any 1.002577660 seconds time elapsed After: % perf stat -a -e unc_c_llc_lookup.any sleep 1 Performance counter stats for 'system wide': 2,685,120 Bytes unc_c_llc_lookup.any 1.002648032 seconds time elapsed v2: Split collect_aliases. Rename alias flag. v3: Make sure unsupported/not counted is always printed. v4: Factor out callback change into separate patch. v5: Move check for bad results here Move merged check into collect_data Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170320201711.14142-3-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Andi Kleen 提交于
To be used in next patch to support automatic summing of alias events. v2: Move check for bad results to next patch v3: Remove trivial addition. v4: Use perf_evsel__cpus instead of evsel->cpus Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170320201711.14142-2-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 04 3月, 2017 2 次提交
-
-
由 Jiri Olsa 提交于
Make system wide (-a) the default option if no target was specified and one of following conditions is met: - there's no workload specified (current behaviour) - there is workload specified but all requested events are system wide ones Mixed events core/uncore with workload: $ perf stat -e 'uncore_cbox_0/clockticks/,cycles' sleep 1 Performance counter stats for 'sleep 1': <not supported> uncore_cbox_0/clockticks/ 980,489 cycles 1.000897406 seconds time elapsed Uncore event with workload: $ perf stat -e 'uncore_cbox_0/clockticks/' sleep 1 Performance counter stats for 'system wide': 281,473,897,192,670 uncore_cbox_0/clockticks/ 1.000833784 seconds time elapsed Committer note: When testing I realized the default case for !root, i.e. no events passed via -e, was broke by v2 of this patch, reported and after a patch provided by Jiri it is back working: [acme@jouet linux]$ perf stat usleep 1 Performance counter stats for 'usleep 1': 0.401335 task-clock:u (msec) # 0.297 CPUs utilized 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 48 page-faults:u # 0.120 M/sec 458,146 cycles:u # 1.142 GHz 245,113 instructions:u # 0.54 insn per cycle 47,991 branches:u # 119.578 M/sec 4,022 branch-misses:u # 8.38% of all branches 0.001350029 seconds time elapsed [acme@jouet linux]$ Suggested-and-Tested-by: NBorislav Petkov <bp@alien8.de> Signed-off-by: NJiri Olsa <jolsa@kernel.org> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170227094818.GA12764@kravaSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Borislav Petkov 提交于
When using perf stat on an AMD F15h system with the default hw events attributes, some of the events don't get counted: Performance counter stats for 'sleep 1': 0.749208 task-clock (msec) # 0.001 CPUs utilized 1 context-switches # 0.001 M/sec 0 cpu-migrations # 0.000 K/sec 54 page-faults # 0.072 M/sec 1,122,815 cycles # 1.499 GHz 286,740 stalled-cycles-frontend # 25.54% frontend cycles idle <not counted> stalled-cycles-backend (0.00%) ^^^^^^^^^^^^ <not counted> instructions (0.00%) ^^^^^^^^^^^^ <not counted> branches (0.00%) <not counted> branch-misses (0.00%) 1.001550070 seconds time elapsed The reason is that we have the HW watchdog consuming one PMU counter and when perf tries to schedule 6 events on 6 counters and some of those counters are constrained to only a specific subset of PMCs by the hardware, the event scheduling fails. So issue a hint to disable the HW watchdog around a perf stat session. Committer note: Testing it... # perf stat -d usleep 1 Performance counter stats for 'usleep 1': 1.180203 task-clock (msec) # 0.490 CPUs utilized 1 context-switches # 0.847 K/sec 0 cpu-migrations # 0.000 K/sec 54 page-faults # 0.046 M/sec 184,754 cycles # 0.157 GHz 714,553 instructions # 3.87 insn per cycle 154,661 branches # 131.046 M/sec 7,247 branch-misses # 4.69% of all branches 219,984 L1-dcache-loads # 186.395 M/sec 17,600 L1-dcache-load-misses # 8.00% of all L1-dcache hits (90.16%) <not counted> LLC-loads (0.00%) <not counted> LLC-load-misses (0.00%) 0.002406823 seconds time elapsed Some events weren't counted. Try disabling the NMI watchdog: echo 0 > /proc/sys/kernel/nmi_watchdog perf stat ... echo 1 > /proc/sys/kernel/nmi_watchdog # Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NIngo Molnar <mingo@kernel.org> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Vince Weaver <vince@deater.net> Link: http://lkml.kernel.org/r/20170211183218.ijnvb5f7ciyuunx4@pd.tnicSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 20 2月, 2017 1 次提交
-
-
由 Namhyung Kim 提交于
It now can have negative value to suppress the message entirely. So it needs to check it being positive. Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20170217081742.17417-3-namhyung@kernel.org [ Adjust fuzz on tools/perf/util/pmu.c, add > 0 checks in many other places ] Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 18 2月, 2017 1 次提交
-
-
由 Jiri Olsa 提交于
Boris asked for default -a option in case we monitor only uncore events. While implementing that I thought it might be actually useful to make it overall default. Running 'perf stat' will now collect system wide data. Committer note: Testing it: # perf stat ^C Performance counter stats for 'system wide': 3571.559178 cpu-clock (msec) # 4.000 CPUs utilized 3,346 context-switches # 0.937 K/sec 277 cpu-migrations # 0.078 K/sec 57,271 page-faults # 0.016 M/sec 4,535,633,835 cycles # 1.270 GHz 6,389,736,516 instructions # 1.41 insn per cycle 1,541,293,875 branches # 431.547 M/sec 14,526,396 branch-misses # 0.94% of all branches 0.892950118 seconds time elapsed # Requested-and-Acked-by: NBorislav Petkov <bp@alien8.de> Signed-off-by: NJiri Olsa <jolsa@kernel.org> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170217170034.GB15389@kravaSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 17 2月, 2017 1 次提交
-
-
由 Jan Stancek 提交于
There are 2 problems wrt. cpu_topology_map on systems with sparse CPUs: 1. offline/absent CPUs will have their socket_id and core_id set to -1 which triggers: "socket_id number is too big.You may need to upgrade the perf tool." 2. size of cpu_topology_map (perf_env.cpu[]) is allocated based on _SC_NPROCESSORS_CONF, but can be indexed with CPU ids going above. Users of perf_env.cpu[] are using CPU id as index. This can lead to read beyond what was allocated: ==19991== Invalid read of size 4 ==19991== at 0x490CEB: check_cpu_topology (topology.c:69) ==19991== by 0x490CEB: test_session_topology (topology.c:106) ... For example: _SC_NPROCESSORS_CONF == 16 available: 2 nodes (0-1) node 0 cpus: 0 6 8 10 16 22 24 26 node 0 size: 12004 MB node 0 free: 9470 MB node 1 cpus: 1 7 9 11 23 25 27 node 1 size: 12093 MB node 1 free: 9406 MB node distances: node 0 1 0: 10 20 1: 20 10 This patch changes HEADER_NRCPUS.nr_cpus_available from _SC_NPROCESSORS_CONF to max_present_cpu and updates any user of cpu_topology_map to iterate with nr_cpus_avail. As a consequence HEADER_CPU_TOPOLOGY core_id and socket_id lists get longer, but maintain compatibility with pre-patch state - index to cpu_topology_map is CPU id. perf test 36 -v 36: Session topology : --- start --- test child forked, pid 22211 templ file: /tmp/perf-test-gmdX5i CPU 0, core 0, socket 0 CPU 1, core 0, socket 1 CPU 6, core 10, socket 0 CPU 7, core 10, socket 1 CPU 8, core 1, socket 0 CPU 9, core 1, socket 1 CPU 10, core 9, socket 0 CPU 11, core 9, socket 1 CPU 16, core 0, socket 0 CPU 22, core 10, socket 0 CPU 23, core 10, socket 1 CPU 24, core 1, socket 0 CPU 25, core 1, socket 1 CPU 26, core 9, socket 0 CPU 27, core 9, socket 1 test child finished with 0 ---- end ---- Session topology: Ok Signed-off-by: NJan Stancek <jstancek@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/d7c05c6445fca74a8442c2c73cfffd349c52c44f.1487146877.git.jstancek@redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 14 2月, 2017 1 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
When a tool can't open counters due to the kernel.perf_event_paranoit sysctl setting, we inform how to tweak it to allow the operation to succeed, in addition to that, suggest setting /etc/sysctl.conf to make the setting permanent. Suggested-by: NIngo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-4gwe99k4a6p12d4u8bbyttj2@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 16 12月, 2016 1 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
I.e. those parameters/functions _are_ used, so ditch that misleading attribute. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-13cqtjh0yojg5gzvpq1zzpl0@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 23 9月, 2016 1 次提交
-
-
由 Mathieu Poirier 提交于
Now that the required mechanic is there to deal with PMU specific configuration, add the functionality to the tools where events can be selected. Signed-off-by: NMathieu Poirier <mathieu.poirier@linaro.org> Acked-by: NJiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1474041004-13956-7-git-send-email-mathieu.poirier@linaro.org [ Fix the build on XSI-compliant systems, using str_error_r() to make sure we return a string, not an integer ] Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 24 8月, 2016 2 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
To match how this is done in the kernel. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-gym6yshewpdegt153u8v2q5r@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
And remove it from tools/perf/{perf,util}.h, making code that needs these macros to include linux/time64.h instead, to match how this is used in the kernel sources. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-e69fc1pvkgt57yvxqt6eunyg@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 09 8月, 2016 1 次提交
-
-
由 Mark Rutland 提交于
When we don't have a tracee (i.e. we're attaching to a task or CPU), counters can still be running after our workload finishes, and can still be running as we read their values. As we read events one-by-one, there can be arbitrary skew between values of events, even within a group. This means that ratios within an event group are not reliable. This skew can be seen if measuring a group of identical events, e.g: # perf stat -a -C0 -e '{cycles,cycles}' sleep 1 To avoid this, we must stop groups from counting before we read the values of any constituent events. This patch adds and makes use of a new disable_counters() helper, which disables group leaders (and thus each group as a whole). This mirrors the use of enable_counters() for starting event groups in the absence of a tracee. Closing a group leader splits the group, and without a disabled group leader the newly split events will begin counting. Thus to ensure counts are reliable we must defer closing group leaders until all counts have been read. To do so this patch removes the event closing logic from the read_counters() helper, explicitly closes the events using perf_evlist__close(), which also aids legibility. Signed-off-by: NMark Rutland <mark.rutland@arm.com> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1470747869-3567-1-git-send-email-mark.rutland@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 19 7月, 2016 1 次提交
-
-
由 Mark Rutland 提交于
In create_perf_stat_counter, when a target CPU has not been provided, we call __perf_evsel__open with empty_cpu_map, and open a single FD per thread. However, in read_counter we assume that we opened events for the product of threads and CPUs described in the evsel's cpu_map. Thus, if an evsel has a cpu_map with more than one entry, we will attempt to access FDs that we didn't open. This could result in a number of problems (e.g. blocking while reading from STDIN if the fd memory happened to be initialised to zero). This is problematic for systems were a logical CPU PMU covers some arbitrary subset of CPUs. The cpu_map of any evsel for that PMU will be initialised based on the cpumask exposed through sysfs, even if the user requests per-thread events. Signed-off-by: NMark Rutland <mark.rutland@arm.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1468577293-19667-2-git-send-email-mark.rutland@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 13 7月, 2016 1 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
The tools so far have been using the strerror_r() GNU variant, that returns a string, be it the buffer passed or something else. But that, besides being tricky in cases where we expect that the function using strerror_r() returns the error formatted in a provided buffer (we have to check if it returned something else and copy that instead), breaks the build on systems not using glibc, like Alpine Linux, where musl libc is used. So, introduce yet another wrapper, str_error_r(), that has the GNU interface, but uses the portable XSI variant of strerror_r(), so that users rest asured that the provided buffer is used and it is what is returned. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-d4t42fnf48ytlk8rjxs822tf@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 23 6月, 2016 1 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
To match the semantics for list.h in the kernel, that are used to implement those macros. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-qbcjlgj0ffxquxscahbpddi3@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 07 6月, 2016 3 次提交
-
-
由 Andi Kleen 提交于
When in CSV mode --metric-only outputs an header, unlike the other modes. Previously it did not properly print headers for the aggregation columns, so the headers were actually shifted against the real values. Fix this here by outputting the correct headers for CSV. v2: Indent array. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1464119559-17203-4-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Andi Kleen 提交于
When --metric-only is enabled there were no headers for the topology in interval mode. Also when headers were printed they were on a separate line. Before: $ perf stat --metric-only -A -I 1000 -a 1.001038376 frontend cycles idle insn per cycle stalled cycles per insn branch-misses of all branches 1.001038376 CPU0 123.54% 0.23 5.29 7.61% 1.001038376 CPU1 137.78% 0.24 5.13 10.07% 1.001038376 CPU2 64.48% 0.22 5.50 6.84% After: $ perf stat --metric-only -A -I 1000 -a 1.001111114 CPU0 82.46% 0.32 2.60 7.64% 1.001111114 CPU1 126.63% 0.02 42.83 0.15% 1.001111114 CPU2 193.54% 0.32 2.59 6.92% v2: Move all headers on a single line Reported-by: NJiri Olsa <jolsa@kernel.org> Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1464119559-17203-3-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Andi Kleen 提交于
Add basic plumbing for TopDown in perf stat TopDown is intended to replace the frontend cycles idle/ backend cycles idle metrics in standard perf stat output. These metrics are not reliable in many workloads, due to out of order effects. This implements a new --topdown mode in perf stat (similar to --transaction) that measures the pipe line bottlenecks using standardized formulas. The measurement can be all done with 5 counters (one fixed counter) The result are four metrics: FrontendBound, BackendBound, BadSpeculation, Retiring that describe the CPU pipeline behavior on a high level. The full top down methology has many hierarchical metrics. This implementation only supports level 1 which can be collected without multiplexing. A full implementation of top down on top of perf is available in pmu-tools toplev. (http://github.com/andikleen/pmu-tools) The current version works on Intel Core CPUs starting with Sandy Bridge, and Atom CPUs starting with Silvermont. In principle the generic metrics should be also implementable on other out of order CPUs. TopDown level 1 uses a set of abstracted metrics which are generic to out of order CPU cores (although some CPUs may not implement all of them): topdown-total-slots Available slots in the pipeline topdown-slots-issued Slots issued into the pipeline topdown-slots-retired Slots successfully retired topdown-fetch-bubbles Pipeline gaps in the frontend topdown-recovery-bubbles Pipeline gaps during recovery from misspeculation These metrics then allow to compute four useful metrics: FrontendBound, BackendBound, Retiring, BadSpeculation. Add a new --topdown options to enable events. When --topdown is specified set up events for all topdown events supported by the kernel. Add topdown-* as a special case to the event parser, as is needed for all events containing -. The actual code to compute the metrics is in follow-on patches. v2: Use standard sysctl read function. v3: Move x86 specific code to arch/ v4: Enable --metric-only implicitly for topdown. v5: Add --single-thread option to not force per core mode v6: Fix output order of topdown metrics v7: Allow combining with -d v8: Remove --single-thread again v9: Rename functions, adding arch_ and topdown_. v10: Expand man page and describe TopDown better Paste intro into commit description. Print error when malloc fails. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1464119559-17203-1-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 17 5月, 2016 2 次提交
-
-
由 Namhyung Kim 提交于
Currently 'perf stat' always counts task-clock event by default. But it's somewhat confusing for system-wide targets (especially with 'sleep N' as the 'sleep' task just sleeps and doesn't use cputime). Changing to cpu-clock event instead for that case makes more sense IMHO. Before: # perf stat -a sleep 0.1 Performance counter stats for 'system wide': 403.038603 task-clock (msec) # 4.001 CPUs utilized 150 context-switches # 0.372 K/sec 7 cpu-migrations # 0.017 K/sec 71 page-faults # 0.176 K/sec 23,705,169 cycles # 0.059 GHz 15,888,166 instructions # 0.67 insn per cycle 3,326,078 branches # 8.253 M/sec 87,643 branch-misses # 2.64% of all branches 0.100737009 seconds time elapsed # After: # perf stat -a sleep 0.1 Performance counter stats for 'system wide': 404.271182 cpu-clock (msec) # 4.000 CPUs utilized 143 context-switches # 0.354 K/sec 13 cpu-migrations # 0.032 K/sec 73 page-faults # 0.181 K/sec 22,119,220 cycles # 0.055 GHz 13,622,065 instructions # 0.62 insn per cycle 2,918,769 branches # 7.220 M/sec 85,033 branch-misses # 2.91% of all branches 0.101073089 seconds time elapsed # Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1463119263-5569-3-git-send-email-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Andi Kleen 提交于
When the scaling factor is a full integer don't display fractional digits. This avoids unnecessary .00 output for topdown metrics with scale factors. v2: Remove redundant check. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1462489447-31832-7-git-send-email-andi@firstfloor.org [ Rename 'round' to 'stat_round' as 'round' is defined in math.h, included by this patch, and this breaks the build on ubuntu 12.04 ] Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 13 5月, 2016 1 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
After 0161028b ("perf/core: Change the default paranoia level to 2") 'perf stat' fails for users without CAP_SYS_ADMIN, so just use 'perf_evsel__fallback()' to have the same behaviour as 'perf record', i.e. set perf_event_attr.exclude_kernel to 1. Now: [acme@jouet linux]$ perf stat usleep 1 Performance counter stats for 'usleep 1': 0.352536 task-clock:u (msec) # 0.423 CPUs utilized 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 49 page-faults:u # 0.139 M/sec 309,407 cycles:u # 0.878 GHz 243,791 instructions:u # 0.79 insn per cycle 49,622 branches:u # 140.757 M/sec 3,884 branch-misses:u # 7.83% of all branches 0.000834174 seconds time elapsed [acme@jouet linux]$ Reported-by: NIngo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-b20jmx4dxt5hpaa9t2rroi0o@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 07 5月, 2016 1 次提交
-
-
由 Andi Kleen 提交于
Add debug output of raw counter values per CPU when perf stat -v is specified, together with their cpu numbers. This is very useful to debug problems with per core counters, where we can normally only see aggregated values. v2: Make it depend on -vv, not -v Signed-off-by: NAndi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1461787251-6702-12-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 11 3月, 2016 2 次提交
-
-
由 Andi Kleen 提交于
Add metric only support for -A too. This requires a new print function that prints the metrics in the right order. v2: Fix manpage v3: Simplify nrcpus computation Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1457049458-28956-7-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Andi Kleen 提交于
Add a new mode to only print metrics. Sometimes we don't care about the raw values, just want the computed metrics. This allows more compact printing, so with -I each sample is only a single line. This also allows easier plotting and processing with other tools. The main target is with using --topdown, but it also works with -T and standard perf stat. A few metrics are not supported. To avoiding having to hardcode all the metrics in the code it uses a two pass approach: first compute dummy metrics and only print the headers in the print_metric callback. Then use the callback to print the actual values. There are some additional changes in the stat printout code to handle all metrics being on a single line. One issue is that the column code doesn't know in advance what events are not supported by the CPU, and it would be hard to find out as this could change based on dynamic conditions. That causes empty columns in some cases. The output can be fairly wide, often you may need more than 80 columns. Example: % perf stat -a -I 1000 --metric-only 1.001452803 frontend cycles idle insn per cycle stalled cycles per insn branch-misses of all branches 1.001452803 158.91% 0.66 2.39 2.92% 2.002192321 180.63% 0.76 2.08 2.96% 3.003088282 150.59% 0.62 2.57 2.84% 4.004369835 196.20% 0.98 1.62 3.79% 5.005227314 231.98% 0.84 1.90 4.71% v2: Lots of updates. v3: Use slightly narrower columns v4: Add comment Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1457049458-28956-6-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 03 3月, 2016 1 次提交
-
-
由 Andi Kleen 提交于
Add an extra check for frontend stalled in the metrics. This avoids an extra column for the --metric-only case when the CPU does not support frontend stalled. v2: Add separate init function Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1456858672-21594-8-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-