提交 · 3315d14f8eea27a845bd2e3a88341a35f4025866 · openeuler / Kernel

27 12月, 2017 7 次提交

perf perf: Remove duplicate includes · 3315d14f

由 Pravin Shedge 提交于 12月 06, 2017

These duplicate includes have been found with scripts/checkincludes.pl
but they have been removed manually to avoid removing false positives.
Signed-off-by: NPravin Shedge <pravin.shedge4linux@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1512582204-6493-1-git-send-email-pravin.shedge4linux@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

3315d14f

perf stat: Resort '--per-thread' result · 29734550

由 Jin Yao 提交于 12月 05, 2017

There are many threads reported if we enable '--per-thread'
globally.

1. Most of the threads are not counted or counting value 0.
This patch removes these threads.

2. We also resort the threads in display according to the
counting value. It's useful for user to see the hottest
threads easily.

For example, the new results would be:

root@skl:/tmp# perf stat --per-thread
^C
Performance counter stats for 'system wide':

perf-24165 4.302433 cpu-clock (msec) # 0.001 CPUs utilized
vmstat-23127 1.562215 cpu-clock (msec) # 0.000 CPUs utilized
irqbalance-2780 0.827851 cpu-clock (msec) # 0.000 CPUs utilized
sshd-23111 0.278308 cpu-clock (msec) # 0.000 CPUs utilized
thermald-2841 0.230880 cpu-clock (msec) # 0.000 CPUs utilized
sshd-23058 0.207306 cpu-clock (msec) # 0.000 CPUs utilized
kworker/0:2-19991 0.133983 cpu-clock (msec) # 0.000 CPUs utilized
kworker/u16:1-18249 0.125636 cpu-clock (msec) # 0.000 CPUs utilized
rcu_sched-8 0.085533 cpu-clock (msec) # 0.000 CPUs utilized
kworker/u16:2-23146 0.077139 cpu-clock (msec) # 0.000 CPUs utilized
gmain-2700 0.041789 cpu-clock (msec) # 0.000 CPUs utilized
kworker/4:1-15354 0.028370 cpu-clock (msec) # 0.000 CPUs utilized
kworker/6:0-17528 0.023895 cpu-clock (msec) # 0.000 CPUs utilized
kworker/4:1H-1887 0.013209 cpu-clock (msec) # 0.000 CPUs utilized
kworker/5:2-31362 0.011627 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/0-11 0.010892 cpu-clock (msec) # 0.000 CPUs utilized
kworker/3:2-12870 0.010220 cpu-clock (msec) # 0.000 CPUs utilized
ksoftirqd/0-7 0.008869 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/1-14 0.008476 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/7-50 0.002944 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/3-26 0.002893 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/4-32 0.002759 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/2-20 0.002429 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/6-44 0.001491 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/5-38 0.001477 cpu-clock (msec) # 0.000 CPUs utilized
rcu_sched-8 10 context-switches # 0.117 M/sec
kworker/u16:1-18249 7 context-switches # 0.056 M/sec
sshd-23111 4 context-switches # 0.014 M/sec
vmstat-23127 4 context-switches # 0.003 M/sec
perf-24165 4 context-switches # 0.930 K/sec
kworker/0:2-19991 3 context-switches # 0.022 M/sec
kworker/u16:2-23146 3 context-switches # 0.039 M/sec
kworker/4:1-15354 2 context-switches # 0.070 M/sec
kworker/6:0-17528 2 context-switches # 0.084 M/sec
sshd-23058 2 context-switches # 0.010 M/sec
ksoftirqd/0-7 1 context-switches # 0.113 M/sec
watchdog/0-11 1 context-switches # 0.092 M/sec
watchdog/1-14 1 context-switches # 0.118 M/sec
watchdog/2-20 1 context-switches # 0.412 M/sec
watchdog/3-26 1 context-switches # 0.346 M/sec
watchdog/4-32 1 context-switches # 0.362 M/sec
watchdog/5-38 1 context-switches # 0.677 M/sec
watchdog/6-44 1 context-switches # 0.671 M/sec
watchdog/7-50 1 context-switches # 0.340 M/sec
kworker/4:1H-1887 1 context-switches # 0.076 M/sec
thermald-2841 1 context-switches # 0.004 M/sec
gmain-2700 1 context-switches # 0.024 M/sec
irqbalance-2780 1 context-switches # 0.001 M/sec
kworker/3:2-12870 1 context-switches # 0.098 M/sec
kworker/5:2-31362 1 context-switches # 0.086 M/sec
kworker/u16:1-18249 2 cpu-migrations # 0.016 M/sec
kworker/u16:2-23146 2 cpu-migrations # 0.026 M/sec
rcu_sched-8 1 cpu-migrations # 0.012 M/sec
sshd-23058 1 cpu-migrations # 0.005 M/sec
perf-24165 8,833,385 cycles # 2.053 GHz
vmstat-23127 1,702,699 cycles # 1.090 GHz
irqbalance-2780 739,847 cycles # 0.894 GHz
sshd-23111 269,506 cycles # 0.968 GHz
thermald-2841 204,556 cycles # 0.886 GHz
sshd-23058 158,780 cycles # 0.766 GHz
kworker/0:2-19991 112,981 cycles # 0.843 GHz
kworker/u16:1-18249 100,926 cycles # 0.803 GHz
rcu_sched-8 74,024 cycles # 0.865 GHz
kworker/u16:2-23146 55,984 cycles # 0.726 GHz
gmain-2700 34,278 cycles # 0.820 GHz
kworker/4:1-15354 20,665 cycles # 0.728 GHz
kworker/6:0-17528 16,445 cycles # 0.688 GHz
kworker/5:2-31362 9,492 cycles # 0.816 GHz
watchdog/3-26 8,695 cycles # 3.006 GHz
kworker/4:1H-1887 8,238 cycles # 0.624 GHz
watchdog/4-32 7,580 cycles # 2.747 GHz
kworker/3:2-12870 7,306 cycles # 0.715 GHz
watchdog/2-20 7,274 cycles # 2.995 GHz
watchdog/0-11 6,988 cycles # 0.642 GHz
ksoftirqd/0-7 6,376 cycles # 0.719 GHz
watchdog/1-14 5,340 cycles # 0.630 GHz
watchdog/5-38 4,061 cycles # 2.749 GHz
watchdog/6-44 3,976 cycles # 2.667 GHz
watchdog/7-50 3,418 cycles # 1.161 GHz
vmstat-23127 2,511,699 instructions # 1.48 insn per cycle
perf-24165 1,829,908 instructions # 0.21 insn per cycle
irqbalance-2780 1,190,204 instructions # 1.61 insn per cycle
thermald-2841 143,544 instructions # 0.70 insn per cycle
sshd-23111 128,138 instructions # 0.48 insn per cycle
sshd-23058 57,654 instructions # 0.36 insn per cycle
rcu_sched-8 44,063 instructions # 0.60 insn per cycle
kworker/u16:1-18249 42,551 instructions # 0.42 insn per cycle
kworker/0:2-19991 25,873 instructions # 0.23 insn per cycle
kworker/u16:2-23146 21,407 instructions # 0.38 insn per cycle
gmain-2700 13,691 instructions # 0.40 insn per cycle
kworker/4:1-15354 12,964 instructions # 0.63 insn per cycle
kworker/6:0-17528 10,034 instructions # 0.61 insn per cycle
kworker/5:2-31362 5,203 instructions # 0.55 insn per cycle
kworker/3:2-12870 4,866 instructions # 0.67 insn per cycle
kworker/4:1H-1887 3,586 instructions # 0.44 insn per cycle
ksoftirqd/0-7 3,463 instructions # 0.54 insn per cycle
watchdog/0-11 3,135 instructions # 0.45 insn per cycle
watchdog/1-14 3,135 instructions # 0.59 insn per cycle
watchdog/2-20 3,135 instructions # 0.43 insn per cycle
watchdog/3-26 3,135 instructions # 0.36 insn per cycle
watchdog/4-32 3,135 instructions # 0.41 insn per cycle
watchdog/5-38 3,135 instructions # 0.77 insn per cycle
watchdog/6-44 3,135 instructions # 0.79 insn per cycle
watchdog/7-50 3,135 instructions # 0.92 insn per cycle
vmstat-23127 539,181 branches # 345.139 M/sec
perf-24165 375,364 branches # 87.245 M/sec
irqbalance-2780 262,092 branches # 316.593 M/sec
thermald-2841 31,611 branches # 136.915 M/sec
sshd-23111 21,874 branches # 78.596 M/sec
sshd-23058 10,682 branches # 51.528 M/sec
rcu_sched-8 8,693 branches # 101.633 M/sec
kworker/u16:1-18249 7,891 branches # 62.808 M/sec
kworker/0:2-19991 5,761 branches # 42.998 M/sec
kworker/u16:2-23146 4,099 branches # 53.138 M/sec
kworker/4:1-15354 2,755 branches # 97.110 M/sec
gmain-2700 2,638 branches # 63.127 M/sec
kworker/6:0-17528 2,216 branches # 92.739 M/sec
kworker/5:2-31362 1,132 branches # 97.360 M/sec
kworker/3:2-12870 1,081 branches # 105.773 M/sec
kworker/4:1H-1887 725 branches # 54.887 M/sec
ksoftirqd/0-7 707 branches # 79.716 M/sec
watchdog/0-11 652 branches # 59.860 M/sec
watchdog/1-14 652 branches # 76.923 M/sec
watchdog/2-20 652 branches # 268.423 M/sec
watchdog/3-26 652 branches # 225.372 M/sec
watchdog/4-32 652 branches # 236.318 M/sec
watchdog/5-38 652 branches # 441.435 M/sec
watchdog/6-44 652 branches # 437.290 M/sec
watchdog/7-50 652 branches # 221.467 M/sec
vmstat-23127 8,960 branch-misses # 1.66% of all branches
irqbalance-2780 3,047 branch-misses # 1.16% of all branches
perf-24165 2,876 branch-misses # 0.77% of all branches
sshd-23111 1,843 branch-misses # 8.43% of all branches
thermald-2841 1,444 branch-misses # 4.57% of all branches
sshd-23058 1,379 branch-misses # 12.91% of all branches
kworker/u16:1-18249 982 branch-misses # 12.44% of all branches
rcu_sched-8 893 branch-misses # 10.27% of all branches
kworker/u16:2-23146 578 branch-misses # 14.10% of all branches
kworker/0:2-19991 376 branch-misses # 6.53% of all branches
gmain-2700 280 branch-misses # 10.61% of all branches
kworker/6:0-17528 196 branch-misses # 8.84% of all branches
kworker/4:1-15354 187 branch-misses # 6.79% of all branches
kworker/5:2-31362 123 branch-misses # 10.87% of all branches
watchdog/0-11 95 branch-misses # 14.57% of all branches
watchdog/4-32 89 branch-misses # 13.65% of all branches
kworker/3:2-12870 80 branch-misses # 7.40% of all branches
watchdog/3-26 61 branch-misses # 9.36% of all branches
kworker/4:1H-1887 60 branch-misses # 8.28% of all branches
watchdog/2-20 52 branch-misses # 7.98% of all branches
ksoftirqd/0-7 47 branch-misses # 6.65% of all branches
watchdog/1-14 46 branch-misses # 7.06% of all branches
watchdog/7-50 13 branch-misses # 1.99% of all branches
watchdog/5-38 8 branch-misses # 1.23% of all branches
watchdog/6-44 7 branch-misses # 1.07% of all branches

3.695150786 seconds time elapsed

root@skl:/tmp# perf stat --per-thread -M IPC,CPI
^C

Performance counter stats for 'system wide':

vmstat-23127 2,000,783 inst_retired.any # 1.5 IPC
thermald-2841 1,472,670 inst_retired.any # 1.3 IPC
sshd-23111 977,374 inst_retired.any # 1.2 IPC
perf-24163 483,779 inst_retired.any # 0.2 IPC
gmain-2700 341,213 inst_retired.any # 0.9 IPC
sshd-23058 148,891 inst_retired.any # 0.8 IPC
rtkit-daemon-3288 71,210 inst_retired.any # 0.7 IPC
kworker/u16:1-18249 39,562 inst_retired.any # 0.3 IPC
rcu_sched-8 14,474 inst_retired.any # 0.8 IPC
kworker/0:2-19991 7,659 inst_retired.any # 0.2 IPC
kworker/4:1-15354 6,714 inst_retired.any # 0.8 IPC
rtkit-daemon-3289 4,839 inst_retired.any # 0.3 IPC
kworker/6:0-17528 3,321 inst_retired.any # 0.6 IPC
kworker/5:2-31362 3,215 inst_retired.any # 0.5 IPC
kworker/7:2-23145 3,173 inst_retired.any # 0.7 IPC
kworker/4:1H-1887 1,719 inst_retired.any # 0.3 IPC
watchdog/0-11 1,479 inst_retired.any # 0.3 IPC
watchdog/1-14 1,479 inst_retired.any # 0.3 IPC
watchdog/2-20 1,479 inst_retired.any # 0.4 IPC
watchdog/3-26 1,479 inst_retired.any # 0.4 IPC
watchdog/4-32 1,479 inst_retired.any # 0.3 IPC
watchdog/5-38 1,479 inst_retired.any # 0.3 IPC
watchdog/6-44 1,479 inst_retired.any # 0.7 IPC
watchdog/7-50 1,479 inst_retired.any # 0.7 IPC
kworker/u16:2-23146 1,408 inst_retired.any # 0.5 IPC
perf-24163 2,249,872 cpu_clk_unhalted.thread
vmstat-23127 1,352,455 cpu_clk_unhalted.thread
thermald-2841 1,161,140 cpu_clk_unhalted.thread
sshd-23111 807,827 cpu_clk_unhalted.thread
gmain-2700 375,535 cpu_clk_unhalted.thread
sshd-23058 194,071 cpu_clk_unhalted.thread
kworker/u16:1-18249 114,306 cpu_clk_unhalted.thread
rtkit-daemon-3288 103,547 cpu_clk_unhalted.thread
kworker/0:2-19991 46,550 cpu_clk_unhalted.thread
rcu_sched-8 18,855 cpu_clk_unhalted.thread
rtkit-daemon-3289 17,549 cpu_clk_unhalted.thread
kworker/4:1-15354 8,812 cpu_clk_unhalted.thread
kworker/5:2-31362 6,812 cpu_clk_unhalted.thread
kworker/4:1H-1887 5,270 cpu_clk_unhalted.thread
kworker/6:0-17528 5,111 cpu_clk_unhalted.thread
kworker/7:2-23145 4,667 cpu_clk_unhalted.thread
watchdog/0-11 4,663 cpu_clk_unhalted.thread
watchdog/1-14 4,663 cpu_clk_unhalted.thread
watchdog/4-32 4,626 cpu_clk_unhalted.thread
watchdog/5-38 4,403 cpu_clk_unhalted.thread
watchdog/3-26 3,936 cpu_clk_unhalted.thread
watchdog/2-20 3,850 cpu_clk_unhalted.thread
kworker/u16:2-23146 2,654 cpu_clk_unhalted.thread
watchdog/6-44 2,017 cpu_clk_unhalted.thread
watchdog/7-50 2,017 cpu_clk_unhalted.thread
vmstat-23127 2,000,783 inst_retired.any # 0.7 CPI
thermald-2841 1,472,670 inst_retired.any # 0.8 CPI
sshd-23111 977,374 inst_retired.any # 0.8 CPI
perf-24163 495,037 inst_retired.any # 4.7 CPI
gmain-2700 341,213 inst_retired.any # 1.1 CPI
sshd-23058 148,891 inst_retired.any # 1.3 CPI
rtkit-daemon-3288 71,210 inst_retired.any # 1.5 CPI
kworker/u16:1-18249 39,562 inst_retired.any # 2.9 CPI
rcu_sched-8 14,474 inst_retired.any # 1.3 CPI
kworker/0:2-19991 7,659 inst_retired.any # 6.1 CPI
kworker/4:1-15354 6,714 inst_retired.any # 1.3 CPI
rtkit-daemon-3289 4,839 inst_retired.any # 3.6 CPI
kworker/6:0-17528 3,321 inst_retired.any # 1.5 CPI
kworker/5:2-31362 3,215 inst_retired.any # 2.1 CPI
kworker/7:2-23145 3,173 inst_retired.any # 1.5 CPI
kworker/4:1H-1887 1,719 inst_retired.any # 3.1 CPI
watchdog/0-11 1,479 inst_retired.any # 3.2 CPI
watchdog/1-14 1,479 inst_retired.any # 3.2 CPI
watchdog/2-20 1,479 inst_retired.any # 2.6 CPI
watchdog/3-26 1,479 inst_retired.any # 2.7 CPI
watchdog/4-32 1,479 inst_retired.any # 3.1 CPI
watchdog/5-38 1,479 inst_retired.any # 3.0 CPI
watchdog/6-44 1,479 inst_retired.any # 1.4 CPI
watchdog/7-50 1,479 inst_retired.any # 1.4 CPI
kworker/u16:2-23146 1,408 inst_retired.any # 1.9 CPI
perf-24163 2,302,323 cycles
vmstat-23127 1,352,455 cycles
thermald-2841 1,161,140 cycles
sshd-23111 807,827 cycles
gmain-2700 375,535 cycles
sshd-23058 194,071 cycles
kworker/u16:1-18249 114,306 cycles
rtkit-daemon-3288 103,547 cycles
kworker/0:2-19991 46,550 cycles
rcu_sched-8 18,855 cycles
rtkit-daemon-3289 17,549 cycles
kworker/4:1-15354 8,812 cycles
kworker/5:2-31362 6,812 cycles
kworker/4:1H-1887 5,270 cycles
kworker/6:0-17528 5,111 cycles
kworker/7:2-23145 4,667 cycles
watchdog/0-11 4,663 cycles
watchdog/1-14 4,663 cycles
watchdog/4-32 4,626 cycles
watchdog/5-38 4,403 cycles
watchdog/3-26 3,936 cycles
watchdog/2-20 3,850 cycles
kworker/u16:2-23146 2,654 cycles
watchdog/6-44 2,017 cycles
watchdog/7-50 2,017 cycles

2.175726600 seconds time elapsed
Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512482591-4646-12-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

29734550

perf stat: Remove --per-thread pid/tid limitation · 1d9f8d1b

由 Jin Yao 提交于 12月 05, 2017

Currently, if we execute 'perf stat --per-thread' without specifying
pid/tid, perf will return error.

root@skl:/tmp# perf stat --per-thread
The --per-thread option is only available when monitoring via -p -t options.
    -p, --pid <pid>       stat events on existing process id
    -t, --tid <tid>       stat events on existing thread id

This patch removes this limitation. If no pid/tid specified, it returns
all threads (get threads from /proc).

Note that it doesn't support cpu_list yet so if it's a cpu_list case,
then skip.
Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512482591-4646-11-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

1d9f8d1b

perf stat: Update or print per-thread stats · 14e72a21

由 Jin Yao 提交于 12月 05, 2017

If the stats pointer in stat_config structure is not null, it will
update the per-thread stats or print the per-thread stats on this
buffer.
Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512482591-4646-9-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

14e72a21

perf stat: Allocate shadow stats buffer for threads · 56739444

由 Jin Yao 提交于 12月 05, 2017

After perf_evlist__create_maps() being executed, we can get all threads
from /proc. And via thread_map__nr(), we can also get the number of
threads.

With the number of threads, the patch allocates a buffer which will
record the shadow stats for these threads.

The buffer pointer is saved in stat_config.
Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512482591-4646-8-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

56739444

perf stat: Print per-thread shadow stats · e0128b30

由 Jin Yao 提交于 12月 05, 2017

The function perf_stat__print_shadow_stats() is called to print the
shadow stats on a set of static variables.

But the static variables are the limitations to support
per-thread shadow stats.

This patch lets the perf_stat__print_shadow_stats() support
to print the shadow stats from a input parameter 'st'.

It will not directly get value from static variable. Instead,
it now uses runtime_stat_avg() and runtime_stat_n() to get and
compute the values.
Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512482591-4646-6-git-send-email-yao.jin@linux.intel.com
[ Rename 'stat' variables to 'st' to build on centos:{5,6} and others where it shadows a global declaration ]
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

e0128b30

perf stat: Update per-thread shadow stats · 1fcd0394

由 Jin Yao 提交于 12月 05, 2017

The functions perf_stat__update_shadow_stats() is called to update the
shadow stats on a set of static variables.

But the static variables are the limitations to be extended to support
per-thread shadow stats.

This patch lets the perf_stat__update_shadow_stats() support to update
the shadow stats on a input parameter 'st' and uses
update_runtime_stat() to update the stats. It will not directly update
the static variables as before.
Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512482591-4646-5-git-send-email-yao.jin@linux.intel.com
[ Rename 'stat' variables to 'st' to build on centos:{5,6} and others where it shadows a global declaration ]
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

1fcd0394

30 11月, 2017 1 次提交

perf record: Synthesize unit/scale/... in event update · bfd8f72c

由 Andi Kleen 提交于 11月 17, 2017

Move the code to synthesize event updates for scale/unit/cpus to a
common utility file, and use it both from stat and record.

This allows to access scale and other extra qualifiers from perf script.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20171117214300.32746-2-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

bfd8f72c

31 10月, 2017 3 次提交

perf stat: Move the shadow stats scale computation in perf_stat__update_shadow_stats · 54830dd0

由 Jiri Olsa 提交于 1月 23, 2017

Move the shadow stats scale computation to the
perf_stat__update_shadow_stats() function, so it's centralized and we
don't forget to do it. It also saves few lines of code.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-htg7mmyxv6pcrf57qyo6msid@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

54830dd0

perf tools: Add struct perf_data_file · eae8ad80

由 Jiri Olsa 提交于 1月 23, 2017

Add struct perf_data_file to represent a single file within a perf_data
struct.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-c3f9p4xzykr845ktqcek6p4t@git.kernel.org
[ Fixup recent changes in 'perf script --per-event-dump' ]
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

eae8ad80

perf tools: Rename struct perf_data_file to perf_data · 8ceb41d7

由 Jiri Olsa 提交于 1月 23, 2017

Rename struct perf_data_file to perf_data, because we will add the
possibility to have multiple files under perf.data, so the 'perf_data'
name fits better.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-39wn4d77phel3dgkzo3lyan0@git.kernel.org
[ Fixup recent changes in 'perf script --per-event-dump' ]
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

8ceb41d7

27 10月, 2017 1 次提交

perf evsel: Restore evsel->priv as a tool private area · e669e833

由 Arnaldo Carvalho de Melo 提交于 10月 26, 2017

When we started using it for stats and did it not just in
builtin-stat.c, but also for builtin-script.c, then it stopped being a
tool private area, so introduce a new pointer for these stats and leave
->priv to its original purpose.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Fixes: cfc8874a ("perf script: Process cpu/threads maps")
Link: http://lkml.kernel.org/n/tip-jtpzx3rjqo78snmmsdzwb2eb@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

e669e833

13 9月, 2017 5 次提交

perf stat: Fall weak group back even for EBADF · 35c1980e

由 Andi Kleen 提交于 9月 05, 2017

It's not possible to run a package event and a per cpu event in the same
group. This is used by some of the power metrics.  They work correctly
when not using a group.

Normally weak groups should handle that, but in this case EBADF is
returned instead of the normal EINVAL.

  $ strace -e perf_event_open ./perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1
  Using CPUID GenuineIntel-6-3E
  perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, PERF_FLAG_FD_CLOEXEC) = -1 EINVAL (Invalid argument)
  perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
  perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
  perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
  perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = 3
  perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, 3, 0) = 4
  perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 1, 0, 0) = -1 EBADF (Bad file descriptor)

and perf errors out.

Make weak groups trigger a fall back for EBADF too. Then this case works correctly:

  $ perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1
  Using CPUID GenuineIntel-6-3E
  Weak group for cstate_pkg/c2-residency//2 failed
  cstate_pkg/c2-residency/: 476709882 1000598460 1000598460
  msr/tsc/: 39625837911 12007369110 12007369110

   Performance counter stats for 'system wide':

         476,709,882      cstate_pkg/c2-residency/
      39,625,837,911      msr/tsc/

         1.000697588 seconds time elapsed

  This fixes perf stat -M Power ...

  $ perf stat -M Power --metric-only -a sleep 1

   Performance counter stats for 'system wide':

  Turbo_Utilization  C3_Core_Residency  C6_Core_Residency C7_Core_Residency  C2_Pkg_Residency   C3_Pkg_Residency  C6_Pkg_Residency  C7_Pkg_Residency
       1.0                 0.7                30.0               0.0               0.9                 0.1               0.4                 0.0

         1.001240740 seconds time elapsed
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170905211324.32427-1-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

35c1980e

perf stat: Update walltime_nsecs_stats in interval mode · b90f1333

由 Andi Kleen 提交于 8月 31, 2017

Some metrics (like GFLOPs) need walltime_nsecs_stats for each interval.
Compute it for each interval instead of only at the end.

Pointed out by Jiri.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-12-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

b90f1333

perf stat: Hide internal duration_time counter · e864c5ca

由 Andi Kleen 提交于 8月 31, 2017

Some perf stat metrics use an internal "duration_time" metric. It is not
correctly printed however. So hide it during output to avoid confusing
users with 0 counts.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-11-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

e864c5ca

perf stat: Support JSON metrics in perf stat · b18f3e36

由 Andi Kleen 提交于 8月 31, 2017

Add generic support for standalone metrics specified in JSON files to
perf stat. A metric is a formula that uses multiple events to compute a
higher level result (e.g. IPC).

Previously metrics were always tied to an event and automatically
enabled with that event. But now change it that we can have standalone
metrics. They are in the same JSON data structure as events, but don't
have an event name.

We also allow to organize the metrics in metric groups, which allows a
short cut to select several related metrics at once.

Add a new -M / --metrics option to perf stat that adds the metrics or
metric groups specified.

Add the core code to manage and parse the metric groups. They are
collected from the JSON data structures into a separate rblist. When
computing shadow values look for metrics in that list. Then they are
computed using the existing saved values infrastructure in stat-shadow.c

The actual JSON metrics are in a separate pull request.

% perf stat -M Summary --metric-only -a sleep 1

Performance counter stats for 'system wide':

Instructions CLKS CPU_Utilization GFLOPs SMT_2T_Utilization Kernel_Utilization
317614222.0 1392930775.0 0.0 0.0 0.2 0.1

1.001497549 seconds time elapsed

% perf stat -M GFLOPs flops

Performance counter stats for 'flops':

3,999,541,471 fp_comp_ops_exe.sse_scalar_single # 1.2 GFLOPs (66.65%)
14 fp_comp_ops_exe.sse_scalar_double (66.65%)
0 fp_comp_ops_exe.sse_packed_double (66.67%)
0 fp_comp_ops_exe.sse_packed_single (66.70%)
0 simd_fp_256.packed_double (66.70%)
0 simd_fp_256.packed_single (66.67%)
0 duration_time

3.238372845 seconds time elapsed

v2: Add missing header file
v3: Move find_map to pmu.c
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-7-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

b18f3e36

perf tools: Support weak groups in 'perf stat' · 5a5dfe4b

由 Andi Kleen 提交于 8月 31, 2017

Setting up groups can be complicated due to the complicated scheduling
restrictions of different PMUs.

User tools usually don't understand all these restrictions.

Still in many cases it is useful to set up groups and they work most of
the time. However if the group is set up wrong some members will not
report any value because they never get scheduled.

Add a concept of a 'weak group': try to set up a group, but if it's not
schedulable fallback to not using a group. That gives us the best of
both worlds: groups if they work, but still a usable fallback if they
don't.

In theory it would be possible to have more complex fallback strategies
(e.g. try to split the group in half), but the simple fallback of not
using a group seems to work for now.

So far the weak group is only implemented for perf stat, not for record.

Here's an unschedulable group (on IvyBridge with SMT on)

  % perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1

        73,806,067      branches
         4,848,144      branch-misses             #    6.57% of all branches
        14,754,458      l1d.replacement
        24,905,558      l2_lines_in.all
   <not supported>      l2_rqsts.all_code_rd         <------- will never report anything

With the weak group:

  % perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}:W' -a sleep 1

       125,366,055      branches                                                      (80.02%)
         9,208,402      branch-misses             #    7.35% of all branches          (80.01%)
        24,560,249      l1d.replacement                                               (80.00%)
        43,174,971      l2_lines_in.all                                               (80.05%)
        31,891,457      l2_rqsts.all_code_rd                                          (79.92%)

The extra event scheduled with some extra multiplexing

v2: Move fallback code to separate function.
Add comment on for_each_group_member
Adjust to new perf_evsel__close interface
v3: Fix debug print out.

Committer testing:

Before:

  # perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1

   Performance counter stats for 'system wide':

     <not counted>      branches
     <not counted>      branch-misses
     <not counted>      l1d.replacement
     <not counted>      l2_lines_in.all
   <not supported>      l2_rqsts.all_code_rd

       1.002147212 seconds time elapsed

  # perf stat -e '{branches,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1

   Performance counter stats for 'system wide':

        83,207,892      branches
        11,065,444      l1d.replacement
        28,484,024      l2_lines_in.all
        12,186,179      l2_rqsts.all_code_rd

       1.001739493 seconds time elapsed

After:

  # perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}':W -a sleep 1

   Performance counter stats for 'system wide':

       543,323,909      branches                                                      (80.01%)
        27,100,512      branch-misses             #    4.99% of all branches          (80.02%)
        50,402,905      l1d.replacement                                               (80.03%)
        67,385,892      l2_lines_in.all                                               (80.01%)
        21,352,885      l2_rqsts.all_code_rd                                          (79.94%)

       1.001086658 seconds time elapsed

  #
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/20170831194036.30146-2-andi@firstfloor.org
[ Add a "'perf stat' only, for now" comment in the man page, suggested by Jiri ]
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

5a5dfe4b

12 9月, 2017 1 次提交

perf stat: Wait for the correct child · dfc9eec7

由 Milian Wolff 提交于 9月 12, 2017

When packaging the perf userland application into an AppImage, the
wait() call in perf stat returned too early. It turned out that some
other child process exited, but not the one perf stat launched:

  $ sudo strace -e fork,execve,clone,wait4 -f ./perf-x86_64.AppImage stat sleep 1
  execve("./perf-git.3a73b7f9-x86_64.AppImage", ["./perf-git.3a73b7f9-x86_64.AppIm"..., "stat", "sleep", "1"], 0x7ffec1bbf050 /* 18 vars */) = 0
  clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6a6e7efe50) = 3912
  strace: Process 3912 attached
  [pid  3912] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6a6e7efe50) = 3914
  strace: Process 3914 attached
  [pid  3912] +++ exited with 0 +++
  [pid  3911] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3912, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
  [pid  3914] clone(strace: Process 3915 attached
  child_stack=0x7f6a6d9fefb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f6a6d9ff9d0, tls=0x7f6a6d9ff700, child_tidptr=0x7f6a6d9ff9d0) = 3915
  [pid  3911] execve("/tmp/.mount_perf-g6VYMpl/AppRun", ["./perf-git.3a73b7f9-x86_64.AppIm"..., "stat", "sleep", "1"], 0x14aab70 /* 21 vars */) = 0
  [pid  3911] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f4ae113c4d0) = 3916
  strace: Process 3916 attached
  [pid  3911] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3912
  [pid  3916] execve("/usr/libexec/perf-core/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/tmp/./sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/home/milian/.bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/usr/lib/icecream/libexec/icecc/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/ssd2/milian/projects/compiled/other/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/home/milian/.bin/kf5/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/ssd2/milian/projects/compiled/kf5/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/home/milian/projects/compiled/other/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/home/milian/projects/compiled/kf5/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/usr/local/sbin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/usr/local/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
  [pid  3916] execve("/usr/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */
   Performance counter stats for 'sleep 1':

       <not counted>	task-clock
       <not counted>	context-switches
       <not counted>	cpu-migrations
       <not counted>	page-faults
       <not counted>	cycles
       <not counted>	instructions
       <not counted>      branches
       <not counted>      branch-misses

         0.000047194 seconds time elapsed

  [pid  3916] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=3911, si_uid=0} ---
  [pid  3916] +++ killed by SIGTERM +++
  [pid  3911] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=3916, si_uid=0, si_status=SIGTERM, si_utime=0, si_stime=0} ---
  [pid  3915] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3914, si_uid=0} ---
  [pid  3911] +++ exited with 0 +++
  [pid  3915] --- SIGHUP {si_signo=SIGHUP, si_code=SI_USER, si_pid=3914, si_uid=0} ---
  [pid  3915] +++ exited with 0 +++
  +++ exited with 0 +++

This patch uses waitpid instead to ensure the call waits for the
debuggee application launched by 'perf stat'. This fixes 'perf stat'
when launched from an AppImage:

  $ ./perf-x86_64.AppImage stat sleep 1

   Performance counter stats for 'sleep 1':

          0.357235      task-clock (msec)         #    0.000 CPUs utilized
                 1      context-switches          #    0.003 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                50      page-faults               #    0.140 M/sec
           1269602      cycles                    #    3.554 GHz
            654278      instructions              #    0.52  insn per cycle
            129963      branches                  #  363.803 M/sec
              7082      branch-misses             #    5.45% of all branches

       1.000633420 seconds time elapsed
Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170912152523.4497-1-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

dfc9eec7

02 9月, 2017 1 次提交

perf stat: Only auto-merge events that are PMU aliases · 63ce8449

由 Arnaldo Carvalho de Melo 提交于 8月 31, 2017

Peter reported that when he explicitely asked for multiple events with
the same name on the command line it got coalesced into just one line,
i.e.:

   # perf stat -e cycles -e cycles -e cycles usleep 1

   Performance counter stats for 'usleep 1':

         3,269,652      cycles

       0.000884123 seconds time elapsed

  #

And while there is the --no-merges option to disable that auto-merging,
this is a blunt change in behaviour for such explicit request, so change
the code so that this auto merging is done only when handling the multi
PMU aliases with the same name that introduced this coalescing,
restoring the previous behaviour for the explicit case:

  # perf stat -e cycles -e cycles -e cycles usleep 1

   Performance counter stats for 'usleep 1':

         1,472,837      cycles
         1,472,837      cycles
         1,472,837      cycles

       0.001764870 seconds time elapsed

  #
Reported-by: NPeter Zijlstra <peterz@infradead.org>
Acked-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 430daf2d ("perf stat: Collapse identically named events")
Link: http://lkml.kernel.org/r/20170831184122.GK4831@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

63ce8449

27 7月, 2017 1 次提交

perf stat: Use group read for event groups · 82bf311e

由 Jiri Olsa 提交于 7月 26, 2017

Make perf stat use  group read if there  are groups defined. The group
read will get the values for all member of groups within a single
syscall instead of calling read syscall for every event.

We can see considerable less amount of kernel cycles spent on single
group read, than reading each event separately, like for following perf
stat command:

  # perf stat -e {cycles,instructions} -I 10 -a sleep 1

Monitored with "perf stat -r 5 -e '{cycles:u,cycles:k}'"

Before:

        24,325,676      cycles:u
       297,040,775      cycles:k

       1.038554134 seconds time elapsed

After:
        25,034,418      cycles:u
       158,256,395      cycles:k

       1.036864497 seconds time elapsed

The perf_evsel__open fallback changes contributed by Andi Kleen.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170726120206.9099-4-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

82bf311e

27 6月, 2017 1 次提交

perf tools: Replace error() with pr_err() · 62d94b00

由 Arnaldo Carvalho de Melo 提交于 6月 27, 2017

To consolidate the error reporting facility.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-b41iot1094katoffdf19w9zk@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

62d94b00

21 6月, 2017 1 次提交

perf stat: Add support to measure SMI cost · daefd0bc

由 Kan Liang 提交于 5月 26, 2017

Implementing a new --smi-cost mode in perf stat to measure SMI cost.

During the measurement, the /sys/device/cpu/freeze_on_smi will be set.

The measurement can be done with one counter (unhalted core cycles), and
two free running MSR counters (IA32_APERF and SMI_COUNT).

In practice, the percentages of SMI core cycles should be more useful
than absolute value. So the output will be the percentage of SMI core
cycles and SMI#. metric_only will be set by default.

SMI cycles% = (aperf - unhalted core cycles) / aperf

Here is an example output.

 Performance counter stats for 'sudo echo ':

SMI cycles%          SMI#
    0.1%              1

       0.010858678 seconds time elapsed

Users who wants to get the actual value can apply additional
--no-metric-only.
Signed-off-by: NKan Liang <Kan.liang@intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Elliott <elliott@hpe.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1495825538-5230-3-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

daefd0bc

02 6月, 2017 1 次提交

perf stat: Only print NMI watchdog hint when enabled · 918c7b06

由 Andi Kleen 提交于 5月 22, 2017

Only print the NMI watchdog hint when that watchdog it actually enabled.

This avoids printing these unnecessarily.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/n/tip-lnw7edxnqsphkmeew857wz1i@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

918c7b06

25 4月, 2017 2 次提交

perf tools: Remove poll.h and wait.h from util.h · 4208735d

由 Arnaldo Carvalho de Melo 提交于 4月 19, 2017

Not needed in this header, added to the places that need poll(), wait()
and a few other prototypes.

Link: http://lkml.kernel.org/n/tip-i39c7b6xmo1vwd9wxp6fmkl0@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

4208735d

perf tools: Remove string.h, unistd.h and sys/stat.h from util.h · 7a8ef4c4

由 Arnaldo Carvalho de Melo 提交于 4月 19, 2017

Not needed in this header, added to the places that need FILE,
putchar(), access() and a few other prototypes.

Link: http://lkml.kernel.org/n/tip-xxtdsl6nsna82j7puwbdjqhs@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

7a8ef4c4

21 4月, 2017 1 次提交

perf tools: Add signal.h to places using its definitions · 9607ad3a

由 Arnaldo Carvalho de Melo 提交于 4月 19, 2017

And remove it from util.h, disentangling it a bit more.

Link: http://lkml.kernel.org/n/tip-2zg9s5nx90yde64j3g4z2uhk@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

9607ad3a

20 4月, 2017 4 次提交

perf tools: Include errno.h where needed · a43783ae

由 Arnaldo Carvalho de Melo 提交于 4月 18, 2017

Removing it from util.h, part of an effort to disentangle the includes
hell, that makes changes to util.h or something included by it to cause
a complete rebuild of the tools.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ztrjy52q1rqcchuy3rubfgt2@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

a43783ae

perf tools: Move extra string util functions to util/string2.h · a067558e

由 Arnaldo Carvalho de Melo 提交于 4月 17, 2017

Moving them from util.h, where they don't belong. Since libc already
have string.h, name it slightly differently, as string2.h.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-eh3vz5sqxsrdd8lodoro4jrw@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

a067558e

perf tools: Move sane ctype stuff from util.h to sane_ctype.h · 3d689ed6

由 Arnaldo Carvalho de Melo 提交于 4月 17, 2017

More stuff that came from git, out of the hodge-podge that is util.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-e3lana4gctz3ub4hn4y29hkw@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

3d689ed6

perf tools: Including missing inttypes.h header · fd20e811

由 Arnaldo Carvalho de Melo 提交于 4月 17, 2017

Needed to use the PRI[xu](32,64) formatting macros.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-wkbho8kaw24q67dd11q0j39f@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

fd20e811

13 4月, 2017 1 次提交

perf stat: Fix bug in handling events in error state · db49a717

由 Stephane Eranian 提交于 4月 12, 2017

(This is a patch has been sitting in the Intel CQM/CMT driver series for
 a while, despite not depend on it. Sending it now independently since
 the series is being discarded.)

When an event is in error state, read() returns 0 instead of sizeof()
buffer. In certain modes, such as interval printing, ignoring the 0
return value may cause bogus count deltas to be computed and thus
invalid results printed.

This patch fixes this problem by modifying read_counters() to mark the
event as not scaled (scaled = -1) to force the printout routine to show
<NOT COUNTED>.
Signed-off-by: NStephane Eranian <eranian@google.com>
Reviewed-by: NDavid Carrillo-Cisneros <davidcc@google.com>
Acked-by: NJiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170412182301.44406-1-davidcc@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

db49a717

11 4月, 2017 1 次提交

perf stat: Refactor the code to strip csv output with ltrim() · b07c40df

由 Taeung Song 提交于 4月 07, 2017

To strip csv output, use ltrim() instead of just while loop and
isspace() at print_metric_{only}_csv().
Signed-off-by: NTaeung Song <treeze.taeung@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1491575061-704-3-git-send-email-treeze.taeung@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

b07c40df

27 3月, 2017 1 次提交

perf tools: Remove unused 'prefix' from builtin functions · b0ad8ea6

由 Arnaldo Carvalho de Melo 提交于 3月 27, 2017

We got it from the git sources but never used it for anything, with the
place where this would be somehow used remaining:

  static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
  {
	prefix = NULL;
	if (p->option & RUN_SETUP)
		prefix = NULL; /* setup_perf_directory(); */

Ditch it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-uw5swz05vol0qpr32c5lpvus@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

b0ad8ea6

23 3月, 2017 1 次提交

perf stat: Output JSON MetricExpr metric · 37932c18

由 Andi Kleen 提交于 3月 20, 2017

Add generic infrastructure to perf stat to output ratios for
"MetricExpr" entries in the event lists. Many events are more useful as
ratios than in raw form, typically some count in relation to total
ticks.

Transfer the MetricExpr information from the alias to the evsel.

We mark the events that need to be collected for MetricExpr, and also
link the events using them with a pointer. The code is careful to always
prefer the right event in the same group to minimize multiplexing
errors. At the moment only a single relation is supported.

Then add a rblist to the stat shadow code that remembers stats based on
the cpu and context.

Then finally update and retrieve and print these values similarly to the
existing hardcoded perf metrics. We use the simple expression parser
added earlier to evaluate the expression.

Normally we just output the result without further commentary, but for
--metric-only this would lead to empty columns. So for this case use the
original event as description.

There is no attempt to automatically add the MetricExpr event, if it is
missing, however we suggest it to the user, because the user tool
doesn't have enough information to reliably construct a group that is
guaranteed to schedule. So we leave that to the user.

% perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}'
1.000147889 800,085,181 unc_p_clockticks
1.000147889 93,126,241 unc_p_freq_max_os_cycles # 11.6
2.000448381 800,218,217 unc_p_clockticks
2.000448381 142,516,095 unc_p_freq_max_os_cycles # 17.8
3.000639852 800,243,057 unc_p_clockticks
3.000639852 162,292,689 unc_p_freq_max_os_cycles # 20.3

% perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
# time freq_max_os_cycles %
1.000127077 0.9
2.000301436 0.7
3.000456379 0.0

v2: Change from DivideBy to MetricExpr
v3: Use expr__ prefix. Support more than one other event.
v4: Update description
v5: Only print warning message once for multiple PMUs.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170320201711.14142-11-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

37932c18

22 3月, 2017 3 次提交

perf stat: Handle partially bad results with merging · b4229e9d

由 Andi Kleen 提交于 3月 20, 2017

When any result that is being merged is bad, mark them all bad to give
consistent output in interval mode.

No before/after, because the issue was only found in theoretical review
and it is hard to reproduce
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170320201711.14142-4-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

b4229e9d

perf stat: Collapse identically named events · 430daf2d

由 Andi Kleen 提交于 3月 20, 2017

The uncore PMU has a lot of duplicated PMUs for different subsystems.
When expanding an uncore alias we usually end up with a large
number of identically named aliases, which makes perf stat
output difficult to read.

Automatically sum them up in perf stat, unless --no-merge is specified.

This can be default because only the uncores generally have duplicated
aliases. Other PMUs have unique names.

Before:

  % perf stat --no-merge -a -e unc_c_llc_lookup.any sleep 1

  Performance counter stats for 'system wide':

           694,976 Bytes unc_c_llc_lookup.any
           706,304 Bytes unc_c_llc_lookup.any
           956,608 Bytes unc_c_llc_lookup.any
           782,720 Bytes unc_c_llc_lookup.any
           605,696 Bytes unc_c_llc_lookup.any
           442,816 Bytes unc_c_llc_lookup.any
           659,328 Bytes unc_c_llc_lookup.any
           509,312 Bytes unc_c_llc_lookup.any
           263,936 Bytes unc_c_llc_lookup.any
           592,448 Bytes unc_c_llc_lookup.any
           672,448 Bytes unc_c_llc_lookup.any
           608,640 Bytes unc_c_llc_lookup.any
           641,024 Bytes unc_c_llc_lookup.any
           856,896 Bytes unc_c_llc_lookup.any
           808,832 Bytes unc_c_llc_lookup.any
           684,864 Bytes unc_c_llc_lookup.any
           710,464 Bytes unc_c_llc_lookup.any
           538,304 Bytes unc_c_llc_lookup.any

       1.002577660 seconds time elapsed

After:

  % perf stat -a -e unc_c_llc_lookup.any sleep 1

  Performance counter stats for 'system wide':

         2,685,120 Bytes unc_c_llc_lookup.any

       1.002648032 seconds time elapsed

v2: Split collect_aliases. Rename alias flag.
v3: Make sure unsupported/not counted is always printed.
v4: Factor out callback change into separate patch.
v5: Move check for bad results here
    Move merged check into collect_data
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170320201711.14142-3-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

430daf2d

perf stat: Factor out callback for collecting event values · fbe51fba

由 Andi Kleen 提交于 3月 20, 2017

To be used in next patch to support automatic summing of alias events.

v2: Move check for bad results to next patch
v3: Remove trivial addition.
v4: Use perf_evsel__cpus instead of evsel->cpus
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170320201711.14142-2-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

fbe51fba

04 3月, 2017 2 次提交

perf tools: Force uncore events to system wide monitoring · e3ba76de

由 Jiri Olsa 提交于 2月 27, 2017

Make system wide (-a) the default option if no target was specified and
one of following conditions is met:

  - there's no workload specified (current behaviour)
  - there is workload specified but all requested
    events are system wide ones

Mixed events core/uncore with workload:

  $ perf stat -e 'uncore_cbox_0/clockticks/,cycles' sleep 1

   Performance counter stats for 'sleep 1':

     <not supported>      uncore_cbox_0/clockticks/
             980,489      cycles

         1.000897406 seconds time elapsed

Uncore event with workload:

  $ perf stat -e 'uncore_cbox_0/clockticks/' sleep 1

   Performance counter stats for 'system wide':

  281,473,897,192,670      uncore_cbox_0/clockticks/

         1.000833784 seconds time elapsed

Committer note:

When testing I realized the default case for !root, i.e. no events
passed via -e, was broke by v2 of this patch, reported and after a
patch provided by Jiri it is back working:

  [acme@jouet linux]$ perf stat usleep 1

   Performance counter stats for 'usleep 1':

         0.401335      task-clock:u (msec)     #   0.297 CPUs utilized
                0      context-switches:u      #   0.000 K/sec
                0      cpu-migrations:u        #   0.000 K/sec
               48      page-faults:u           #   0.120 M/sec
          458,146      cycles:u                #   1.142 GHz
          245,113      instructions:u          #   0.54  insn per cycle
           47,991      branches:u              # 119.578 M/sec
            4,022      branch-misses:u         #   8.38% of all branches

      0.001350029 seconds time elapsed

  [acme@jouet linux]$
Suggested-and-Tested-by: NBorislav Petkov <bp@alien8.de>
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170227094818.GA12764@kravaSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

e3ba76de

perf stat: Issue a HW watchdog disable hint · 02d492e5

由 Borislav Petkov 提交于 2月 07, 2017

When using perf stat on an AMD F15h system with the default hw events
attributes, some of the events don't get counted:

 Performance counter stats for 'sleep 1':

          0.749208      task-clock (msec)         #    0.001 CPUs utilized
                 1      context-switches          #    0.001 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                54      page-faults               #    0.072 M/sec
         1,122,815      cycles                    #    1.499 GHz
           286,740      stalled-cycles-frontend   #   25.54% frontend cycles idle
     <not counted>      stalled-cycles-backend                                        (0.00%)
     ^^^^^^^^^^^^
     <not counted>      instructions                                                  (0.00%)
     ^^^^^^^^^^^^
     <not counted>      branches                                                      (0.00%)
     <not counted>      branch-misses                                                 (0.00%)

       1.001550070 seconds time elapsed

The reason is that we have the HW watchdog consuming one PMU counter and
when perf tries to schedule 6 events on 6 counters and some of those
counters are constrained to only a specific subset of PMCs by the
hardware, the event scheduling fails.

So issue a hint to disable the HW watchdog around a perf stat session.

Committer note:

Testing it...

  # perf stat -d usleep 1

   Performance counter stats for 'usleep 1':

          1.180203      task-clock (msec)         #    0.490 CPUs utilized
                 1      context-switches          #    0.847 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                54      page-faults               #    0.046 M/sec
           184,754      cycles                    #    0.157 GHz
           714,553      instructions              #    3.87  insn per cycle
           154,661      branches                  #  131.046 M/sec
             7,247      branch-misses             #    4.69% of all branches
           219,984      L1-dcache-loads           #  186.395 M/sec
            17,600      L1-dcache-load-misses     #    8.00% of all L1-dcache hits    (90.16%)
     <not counted>      LLC-loads                                                     (0.00%)
     <not counted>      LLC-load-misses                                               (0.00%)

       0.002406823 seconds time elapsed

  Some events weren't counted. Try disabling the NMI watchdog:
	echo 0 > /proc/sys/kernel/nmi_watchdog
	perf stat ...
	echo 1 > /proc/sys/kernel/nmi_watchdog
  #
Signed-off-by: NBorislav Petkov <bp@suse.de>
Acked-by: NIngo Molnar <mingo@kernel.org>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Vince Weaver <vince@deater.net>
Link: http://lkml.kernel.org/r/20170211183218.ijnvb5f7ciyuunx4@pd.tnicSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

02d492e5

20 2月, 2017 1 次提交

perf utils: Check verbose flag properly · bb963e16

由 Namhyung Kim 提交于 2月 17, 2017

It now can have negative value to suppress the message entirely.  So it
needs to check it being positive.
Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: kernel-team@lge.com
Link: http://lkml.kernel.org/r/20170217081742.17417-3-namhyung@kernel.org
[ Adjust fuzz on tools/perf/util/pmu.c, add > 0 checks in many other places ]
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

bb963e16

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功