- 06 12月, 2017 10 次提交
-
-
由 Wang Nan 提交于
'perf record' can switch its output data file. The new output should only store the data after switching. However, in overwrite backward mode, the new output still can have data from before switching. That also brings extra overhead. At the end of mmap_read(), the position of the processed ring buffer is saved in md->prev. Next mmap_read should be end in md->prev if it is not overwriten. That avoids processing duplicate data. However, md->prev is discarded. So next the mmap_read() has to process whole valid ring buffer, which probably includes old processed data. Avoid calling backward_rb_find_range() when md->prev is still available. Signed-off-by: NWang Nan <wangnan0@huawei.com> Tested-by: NKan Liang <kan.liang@intel.com> Acked-by: NNamhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mengting Zhang <zhangmengting@huawei.com> Link: http://lkml.kernel.org/r/20171204165107.95327-3-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Wang Nan 提交于
'perf record' backward recording doesn't work as we expected: it never overwrites when ring buffer gets full. Test: Run a busy python printing task background like this: while True: print 123 send SIGUSR2 to perf to capture snapshot, then: # ./perf record --overwrite -e raw_syscalls:sys_enter -e raw_syscalls:sys_exit --exclude-perf -a --switch-output [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2017110101520743 ] [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2017110101521251 ] [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2017110101521692 ] ^C[ perf record: Woken up 1 times to write data ] [ perf record: Dump perf.data.2017110101521936 ] [ perf record: Captured and wrote 0.826 MB perf.data.<timestamp> ] # ./perf script -i ./perf.data.2017110101520743 | head -n3 perf 2717 [000] 12449.310785: raw_syscalls:sys_enter: NR 16 (5, 2400, 0, 59, 100, 0) perf 2717 [000] 12449.310790: raw_syscalls:sys_enter: NR 7 (4112340, 2, ffffffff, 3df, 100, 0) python 2545 [000] 12449.310800: raw_syscalls:sys_exit: NR 1 = 4 # ./perf script -i ./perf.data.2017110101521251 | head -n3 perf 2717 [000] 12449.310785: raw_syscalls:sys_enter: NR 16 (5, 2400, 0, 59, 100, 0) perf 2717 [000] 12449.310790: raw_syscalls:sys_enter: NR 7 (4112340, 2, ffffffff, 3df, 100, 0) python 2545 [000] 12449.310800: raw_syscalls:sys_exit: NR 1 = 4 # ./perf script -i ./perf.data.2017110101521692 | head -n3 perf 2717 [000] 12449.310785: raw_syscalls:sys_enter: NR 16 (5, 2400, 0, 59, 100, 0) perf 2717 [000] 12449.310790: raw_syscalls:sys_enter: NR 7 (4112340, 2, ffffffff, 3df, 100, 0) python 2545 [000] 12449.310800: raw_syscalls:sys_exit: NR 1 = 4 Timestamps never change, but my background task is a dead loop, can easily overwhelm the ring buffer. This patch fixes it by forcing unsetting PROT_WRITE for a backward ring buffer, so all backward ring buffers become overwrite ring buffers. Test result: # ./perf record --overwrite -e raw_syscalls:sys_enter -e raw_syscalls:sys_exit --exclude-perf -a --switch-output [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2017110101285323 ] [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2017110101290053 ] [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2017110101290446 ] ^C[ perf record: Woken up 1 times to write data ] [ perf record: Dump perf.data.2017110101290837 ] [ perf record: Captured and wrote 0.826 MB perf.data.<timestamp> ] # ./perf script -i ./perf.data.2017110101285323 | head -n3 python 2545 [000] 11064.268083: raw_syscalls:sys_exit: NR 1 = 4 python 2545 [000] 11064.268084: raw_syscalls:sys_enter: NR 1 (1, 12cc330, 4, 7fc237280370, 7fc2373d0700, 2c7b0) python 2545 [000] 11064.268086: raw_syscalls:sys_exit: NR 1 = 4 # ./perf script -i ./perf.data.2017110101290 | head -n3 failed to open ./perf.data.2017110101290: No such file or directory # ./perf script -i ./perf.data.2017110101290053 | head -n3 python 2545 [000] 11071.564062: raw_syscalls:sys_enter: NR 1 (1, 12cc330, 4, 7fc237280370, 7fc2373d0700, 2c7b0) python 2545 [000] 11071.564064: raw_syscalls:sys_exit: NR 1 = 4 python 2545 [000] 11071.564066: raw_syscalls:sys_enter: NR 1 (1, 12cc330, 4, 7fc237280370, 7fc2373d0700, 2c7b0) # ./perf script -i ./perf.data.2017110101290 | head -n3 perf.data.2017110101290053 perf.data.2017110101290446 perf.data.2017110101290837 # ./perf script -i ./perf.data.2017110101290446 | head -n3 sshd 1321 [000] 11075.499473: raw_syscalls:sys_exit: NR 14 = 0 sshd 1321 [000] 11075.499474: raw_syscalls:sys_enter: NR 14 (2, 7ffe98899490, 0, 8, 0, 3000) sshd 1321 [000] 11075.499474: raw_syscalls:sys_exit: NR 14 = 0 # ./perf script -i ./perf.data.2017110101290837 | head -n3 python 2545 [000] 11079.280844: raw_syscalls:sys_exit: NR 1 = 4 python 2545 [000] 11079.280847: raw_syscalls:sys_enter: NR 1 (1, 12cc330, 4, 7fc237280370, 7fc2373d0700, 2c7b0) python 2545 [000] 11079.280850: raw_syscalls:sys_exit: NR 1 = 4 Signed-off-by: NWang Nan <wangnan0@huawei.com> Acked-by: NNamhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Mengting Zhang <zhangmengting@huawei.com> Link: http://lkml.kernel.org/r/20171204165107.95327-2-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 William Cohen 提交于
The powerpc cpuid information includes chip revision information. Changes between chip revisions are usually minor bug fixes and usually do not affect the operation of the performance monitoring hardware. The original mapfile.csv matching requires enumerating every possible cpuid string. When a new minor chip revision is produced a new entry has to be added to the mapfile.csv and the code recompiled to allow perf to have the implementation specific perf events for this new minor revision. For users of various distibutions of Linux having to wait for a new release of the kernel's perf tool to be built with these trivial patches is inconvenient. Using regular expressions rather than exactly string matching of the entire cpuid string allows developers to write mapfile.csv files that do not require patches and recompiles for each of these minor version changes. If special cases need to be made for some particular versions, they can be placed earlier in the mapfile.csv file before the more general matches. Signed-off-by: NWilliam Cohen <wcohen@redhat.com> Tested-by: NRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shriya <shriyak@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20171204145728.16792-1-wcohen@redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Wang Nan 提交于
All perf_mmap__read_forward() read from read-write ring buffer, so no need check_messup. Reading from backward ring buffer doesn't require check_messup because it never mess up. Cleanup arguments lists. Signed-off-by: NWang Nan <wangnan0@huawei.com> Acked-by: NNamhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Link: http://lkml.kernel.org/r/20171203020044.81680-6-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Wang Nan 提交于
'overwrite' argument is always 'false'. Remove it from arguments list of perf_mmap__push(). Signed-off-by: NWang Nan <wangnan0@huawei.com> Acked-by: NNamhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Link: http://lkml.kernel.org/r/20171203020044.81680-5-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Wang Nan 提交于
evlist->overwrite is set to false in all users. It can be removed. Signed-off-by: NWang Nan <wangnan0@huawei.com> Acked-by: NNamhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Link: http://lkml.kernel.org/r/20171203020044.81680-4-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Wang Nan 提交于
All users of perf_evlist__mmap_ex set !overwrite. Remove it from its arguments list. Signed-off-by: NWang Nan <wangnan0@huawei.com> Acked-by: NNamhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Link: http://lkml.kernel.org/r/20171203020044.81680-3-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Wang Nan 提交于
Now all perf_evlist__mmap's users doesn't set 'overwrite'. Remove it from arguments list. Signed-off-by: NWang Nan <wangnan0@huawei.com> Acked-by: NNamhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Link: http://lkml.kernel.org/r/20171203020044.81680-2-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Ganapatrao Kulkarni 提交于
On some platforms(arm/arm64) which uses cpus map to get corresponding cpuid string, cpuid can be NULL for PMUs other than CORE PMUs. Adding check for NULL cpuid in function perf_pmu__find_map to avoid segmentation fault. Signed-off-by: NGanapatrao Kulkarni <ganapatrao.kulkarni@cavium.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ganapatrao Kulkarni <gklkml16@gmail.com> Cc: Jayachandran C <jnair@caviumnetworks.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@cavium.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20171016183222.25750-6-ganapatrao.kulkarni@cavium.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Ganapatrao Kulkarni 提交于
On some platforms, PMU core devices sysfs name is not cpu. Adding function is_pmu_core to detect PMU core devices using core device specific hints in sysfs. For arm64 platforms, all core devices have file "cpus" in sysfs. Signed-off-by: NGanapatrao Kulkarni <ganapatrao.kulkarni@cavium.com> Tested-by: NShaokun Zhang <zhangshaokun@hisilicon.com> Tested-by: NJin Yao <yao.jin@linux.intel.com> Acked-by: NWill Deacon <will.deacon@arm.com> Link: https://lkml.kernel.org/n/tip-y1woxt1k2pqqwpprhonnft2s@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 05 12月, 2017 5 次提交
-
-
由 Ganapatrao Kulkarni 提交于
The cpuid string will not be same on all CPUs on heterogeneous platforms like ARM's big.LITTLE, adding provision(using pmu->cpus) to find cpuid string from associated CPUs of PMU CORE device. Also optimise arguments to function pmu_add_cpu_aliases. Signed-off-by: NGanapatrao Kulkarni <ganapatrao.kulkarni@cavium.com> Acked-by: NWill Deacon <will.deacon@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jayachandran C <jnair@caviumnetworks.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@cavium.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Link: http://lkml.kernel.org/r/20171016183222.25750-2-ganapatrao.kulkarni@cavium.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
Reusing the thread_map__new_by_uid() proc scanning already in place to return a map with all threads in the system. Based-on-a-patch-by: NJin Yao <yao.jin@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/n/tip-khh28q0wwqbqtrk32bfe07hd@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Jin Yao 提交于
In current stat-shadow.c, the rbtree deleting is ignored. The patch adds the implementation to node_delete method of rblist. Signed-off-by: NJin Yao <yao.jin@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1512125856-22056-5-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Jin Yao 提交于
Currently we have a rblist__delete() which is used to delete a rblist. While rblist__delete() will free the pointer of rblist at the end. It's an inconvenience for the user to delete a rblist which is not allocated by something like malloc(). For example, the rblist is embedded in a larger data structure. This patch creates a new function rblist__exit() which is similar to rblist__delete() but it will not free the pointer of rblist. Signed-off-by: NJin Yao <yao.jin@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1512125856-22056-2-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Thomas Richter 提交于
The command 'perf annotate' parses the output of objdump and also investigates the comments produced by objdump. For example the output of objdump produces (on x86): 23eee: 4c 8b 3d 13 01 21 00 mov 0x210113(%rip),%r15 # 234008 <stderr@@GLIBC_2.2.5+0x9a8> and the function mov__parse() is called to investigate the complete line. Mov__parse() breaks this line into several parts and finally calls function comment__symbol() to parse the data after the comment character '#'. Comment__symbol() expects a hexadecimal address followed by a symbol in '<' and '>' brackets. However the 2nd parameter given to function comment__symbol() always points to the comment character '#'. The address parsing always returns 0 because the character '#' is not a digit and strtoull() fails without being noticed. Fix this by advancing the second parameter to function comment__symbol() by one byte before invocation and add an error check after strtoull() has been called. Signed-off-by: NThomas Richter <tmricht@linux.vnet.ibm.com> Reviewed-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com> Acked-by: NRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Fixes: 6de783b6 ("perf annotate: Resolve symbols using objdump comment") Link: http://lkml.kernel.org/r/20171128075632.72182-1-tmricht@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 30 11月, 2017 4 次提交
-
-
由 Adrian Hunter 提交于
Print file names of files that differ. For example, instead of: Warning: Intel PT: x86 instruction decoder differs from kernel print: Warning: Intel PT: x86 instruction decoder header at 'tools/perf/util/intel-pt-decoder/inat.h' differs from latest version at 'arch/x86/include/asm/inat.h' Reported-by: NIngo Molnar <mingo@kernel.org> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com> Link: http://lkml.kernel.org/r/1511253326-22308-2-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
The PERF_RECORD_USER_ events are synthesized by the tool to assist in processing the PERF_RECORD_ ones generated by the kernel, the printing of that information doesn't come with a perf_sample structure, so, when dumping the event fields using 'perf report -D' there were columns that end up not being printed. To tidy up a bit this, fake a perf_sample structure with zeroes to have the missing columns printed and avoid the occasional surprise with that. Before: 0 0x45b8 [0x68]: PERF_RECORD_MMAP -1/0: [0xffffffffc12ec000(0x4000) @ 0]: x /lib/modules/4.14.0+/kernel/fs/nls/nls_utf8.ko 0x4620 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 27820 0x4648 [0x18]: PERF_RECORD_CPU_MAP: 0-3 0 0x4660 [0x28]: PERF_RECORD_COMM: perf:27820/27820 0x4a58 [0x8]: PERF_RECORD_FINISHED_ROUND 447723433020976 0x4688 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4001): 27820/27820: 0xffffffff8f1b6d7a period: 1 addr: 0 After: $ perf report -D | grep PERF_RECORD_ | head 0 0xe8 [0x20]: PERF_RECORD_TIME_CONV: unhandled! 0 0x108 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 32555 0 0x130 [0x18]: PERF_RECORD_CPU_MAP: 0-3 0 0x148 [0x28]: PERF_RECORD_COMM: perf:32555/32555 0 0x4e8 [0x8]: PERF_RECORD_FINISHED_ROUND 448743409421205 0x170 [0x28]: PERF_RECORD_COMM exec: sleep:32555/32555 448743409431883 0x198 [0x68]: PERF_RECORD_MMAP2 32555/32555: [0x55e11d75a000(0x208000) @ 0 fd:00 3147174 2566255743]: r-xp /usr/bin/sleep 448743409443873 0x200 [0x70]: PERF_RECORD_MMAP2 32555/32555: [0x7f0ced316000(0x229000) @ 0 fd:00 3151761 2566238119]: r-xp /usr/lib64/ld-2.25.so 448743409454790 0x270 [0x60]: PERF_RECORD_MMAP2 32555/32555: [0x7ffe84f6d000(0x2000) @ 0 00:00 0 0]: r-xp [vdso] 448743409479500 0x2d0 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4002): 32555/32555: 0xffffffff8f84c7e7 period: 1 addr: 0 $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 9aefcab0 ("perf session: Consolidate the dump code") Link: https://lkml.kernel.org/n/tip-todcu15x0cwgppkh1gi6uhru@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Andi Kleen 提交于
Add support for computing 'perf stat' style metrics in 'perf script'. When using leader sampling we can get metrics for each sampling period by computing formulas over the values of the different group members. This allows things like fine grained IPC tracking through sampling, much more fine grained than with 'perf stat'. The metric is still averaged over the sampling period, it is not just for the sampling point. This patch adds a new metric output field for 'perf script' that uses the existing 'perf stat' metrics infrastructure to compute any metrics supported by 'perf stat'. For example to sample IPC: $ perf record -e '{ref-cycles,cycles,instructions}:S' -a sleep 1 $ perf script -F metric,ip,sym,time,cpu,comm ... alsa-sink-ALC32 [000] 42815.856074: 7fd65937d6cc [unknown] alsa-sink-ALC32 [000] 42815.856074: 7fd65937d6cc [unknown] alsa-sink-ALC32 [000] 42815.856074: 7fd65937d6cc [unknown] alsa-sink-ALC32 [000] 42815.856074: metric: 0.13 insn per cycle swapper [000] 42815.857961: ffffffff81655df0 __schedule swapper [000] 42815.857961: ffffffff81655df0 __schedule swapper [000] 42815.857961: ffffffff81655df0 __schedule swapper [000] 42815.857961: metric: 0.23 insn per cycle qemu-system-x86 [000] 42815.858130: ffffffff8165ad0e _raw_spin_unlock_irqrestore qemu-system-x86 [000] 42815.858130: ffffffff8165ad0e _raw_spin_unlock_irqrestore qemu-system-x86 [000] 42815.858130: ffffffff8165ad0e _raw_spin_unlock_irqrestore qemu-system-x86 [000] 42815.858130: metric: 0.46 insn per cycle :4972 [000] 42815.858312: ffffffffa080e5f2 vmx_vcpu_run :4972 [000] 42815.858312: ffffffffa080e5f2 vmx_vcpu_run :4972 [000] 42815.858312: ffffffffa080e5f2 vmx_vcpu_run :4972 [000] 42815.858312: metric: 0.45 insn per cycle TopDown: This requires disabling SMT if you have it enabled, because SMT would require sampling per core, which is not supported. $ perf record -e '{ref-cycles,topdown-fetch-bubbles,\ topdown-recovery-bubbles,\ topdown-slots-retired,topdown-total-slots,\ topdown-slots-issued}:S' -a sleep 1 $ perf script --header -I -F cpu,ip,sym,event,metric,period ... [000] 121108 ref-cycles: ffffffff8165222e copy_user_enhanced_fast_string [000] 190350 topdown-fetch-bubbles: ffffffff8165222e copy_user_enhanced_fast_string [000] 2055 topdown-recovery-bubbles: ffffffff8165222e copy_user_enhanced_fast_string [000] 148729 topdown-slots-retired: ffffffff8165222e copy_user_enhanced_fast_string [000] 144324 topdown-total-slots: ffffffff8165222e copy_user_enhanced_fast_string [000] 160852 topdown-slots-issued: ffffffff8165222e copy_user_enhanced_fast_string [000] metric: 33.0% frontend bound [000] metric: 3.5% bad speculation [000] metric: 25.8% retiring [000] metric: 37.7% backend bound [000] 112112 ref-cycles: ffffffff8165aec8 _raw_spin_lock_irqsave [000] 357222 topdown-fetch-bubbles: ffffffff8165aec8 _raw_spin_lock_irqsave [000] 3325 topdown-recovery-bubbles: ffffffff8165aec8 _raw_spin_lock_irqsave [000] 323553 topdown-slots-retired: ffffffff8165aec8 _raw_spin_lock_irqsave [000] 270507 topdown-total-slots: ffffffff8165aec8 _raw_spin_lock_irqsave [000] 341226 topdown-slots-issued: ffffffff8165aec8 _raw_spin_lock_irqsave [000] metric: 33.0% frontend bound [000] metric: 2.9% bad speculation [000] metric: 29.9% retiring [000] metric: 34.2% backend bound ... v2: Use evsel->priv for new fields Port to new base line, support fp output. Handle stats in ->stats, not ->priv Minor cleanups Extra explanation about the use of the term 'averaging', from Andi in the thread in the Link: tag below: <quote Andi> The current samples contains the sum of event counts for a sampling period. EventA-1 EventA-2 EventA-3 EventA-4 EventB-1 EventB-2 EventC-3 gap with no events overflow |-----------------------------------------------------------------| period-start period-end ^ ^ | | previous sample current sample So EventA = 4 and EventB = 3 at the sample point I generate a metric, let's say EventA / EventB. It applies to the whole period. But the metric is over a longer time which does not have the same behavior. For example the gap above doesn't have any events, while they are clustered at the beginning and end of the sample period. But we're summing everything together. The metric doesn't know that the gap is different than the busy period. That's what I'm trying to express with averaging. </quote> Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20171117214300.32746-4-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Andi Kleen 提交于
Move the code to synthesize event updates for scale/unit/cpus to a common utility file, and use it both from stat and record. This allows to access scale and other extra qualifiers from perf script. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20171117214300.32746-2-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 29 11月, 2017 7 次提交
-
-
由 Adrian Hunter 提交于
There are just a few new defines which do not affect perf tools. Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com> Link: http://lkml.kernel.org/r/1511253326-22308-3-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
The warning about kptr_restrict needs to be emitted only when it is set and we ask for kernel space samples, so add a helper to help with that. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-fh7drty6yljei9gxxzer6eup@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Ravi Bangoria 提交于
There are many instructions, esp on PowerPC, whose mnemonics are longer than 6 characters. Using precision limit causes truncation of such mnemonics. Fix this by removing precision limit. Note that, 'width' is still 6, so alignment won't get affected for length <= 6. Before: li r11,-1 xscvdp vs1,vs1 add. r10,r10,r11 After: li r11,-1 xscvdpsxds vs1,vs1 add. r10,r10,r11 Reported-by: NDonald Stence <dstence@us.ibm.com> Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Taeung Song <treeze.taeung@gmail.com> Link: http://lkml.kernel.org/r/20171114032540.4564-1-ravi.bangoria@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
A recent fix for 'perf trace' introduced a bug where machine__exit(trace->host) could be called while trace->host was still NULL, so make this more robust by guarding against NULL, just like free() does. The problem happens, for instance, when !root users try to run 'perf trace': [acme@jouet linux]$ trace Error: No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit) Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing' perf: Segmentation fault Obtained 7 stack frames. [0x4f1b2e] /lib64/libc.so.6(+0x3671f) [0x7f43a1dd971f] [0x4f3fec] [0x47468b] [0x42a2db] /lib64/libc.so.6(__libc_start_main+0xe9) [0x7f43a1dc3509] [0x42a6c9] Segmentation fault (core dumped) [acme@jouet linux]$ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrei Vagin <avagin@openvz.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vasily Averin <vvs@virtuozzo.com> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 33974a41 ("perf trace: Call machine__exit() at exit") Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
I forgot one conversion, which got noticed by Thomas when running: $ perf stat -e '{cpu-clock,instructions}' kill kill: not enough arguments Segmentation fault (core dumped) $ Fix it, those stats are in evsel->stats, not anymore in evsel->priv. Reported-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com> Tested-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: e669e833 ("perf evsel: Restore evsel->priv as a tool private area") Link: http://lkml.kernel.org/r/20171109150046.GN4333@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Andi Kleen 提交于
The Intel PMU event aliases have a implicit period= specifier to set the default period. Unfortunately this breaks overriding these periods with -c or -F, because the alias terms look like they are user specified to the internal parser, and user specified event qualifiers override the command line options. Track that they are coming from aliases by adding a "weak" state to the term. Any weak terms don't override command line options. I only did it for -c/-F for now, I think that's the only case that's broken currently. Before: $ perf record -c 1000 -vv -e uops_issued.any ... { sample_period, sample_freq } 2000003 After: $ perf record -c 1000 -vv -e uops_issued.any ... { sample_period, sample_freq } 1000 Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20171020202755.21410-2-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
The evsel->idx field is used mainly to access the right bucket in per-event arrays such as the annotation ones, but also to set evsel->tracking, that in turn will decide what of the events will ask for PERF_RECORD_{MMAP,COMM,EXEC} to be generated, i.e. which perf_event_attr will have its mmap, etc fields set. When we were adding the "dummy" event using perf_evlist__add_dummy() we were not setting it correctly, which could result in multiple tracking events. Now that I'll try using a dummy event to be the tracking one when using 'perf record --delay', i.e. when we process the --delay setting we may already have the evlist set up, like with: perf record -e cycles,instructions --delay 1000 ./workload We will need to add a "dummy" event, then reset evsel->tracking for the first event, "cycles", and set it instead to the dummy one, and also setting its attr.enable_on_exec, so that we get the PERF_RECORD_MMAP, etc metadata events while waiting to enable the explicitely requested events, so lets get this straight and set the right evsel->idx. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Bram Stolk <b.stolk@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-nrdfchshqxf7diszhxcecqb9@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 17 11月, 2017 14 次提交
-
-
由 Jiri Olsa 提交于
We need to call symbol__calc_percent() periodicaly for top, so it's no longer convenient to keep it in symbol__disassemble(). Let's separate the symbol__disassemble() to allocate and init the symbol annotation structs and symbol__calc_percent() to compute the lines percentages based on symbol hists data. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-gtnp8t4tb00q6lag07psn5nq@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Jiri Olsa 提交于
There's no need for symbol__calc_percent and annotation__calc_percent functions to return any value, since it's always zero. Changing both function to return void. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-z0gs28hh24m4gia1t1ctraye@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Jiri Olsa 提交于
Currently when using ordered events we parse the sample twice (the perf_evlist__parse_sample function). Once before we queue the sample for sorting: perf_session__process_event perf_evlist__parse_sample(sample) perf_session__queue_event(sample.time) And then when we deliver the sorted sample: ordered_events__deliver_event perf_evlist__parse_sample perf_session__deliver_event We can skip the initial full sample parsing by using perf_evlist__parse_sample_timestamp function, which got introduced earlier. The new path looks like: perf_session__process_event perf_evlist__parse_sample_timestamp perf_session__queue_event ordered_events__deliver_event perf_session__deliver_event perf_evlist__parse_sample It saves some instructions and is slightly faster: Before: Performance counter stats for './perf.old report --stdio' (5 runs): 64,396,007,225 cycles:u ( +- 0.97% ) 105,882,112,735 instructions:u # 1.64 insn per cycle ( +- 0.00% ) 21.618103465 seconds time elapsed ( +- 1.12% ) After: Performance counter stats for './perf report --stdio' (5 runs): 60,567,807,182 cycles:u ( +- 0.40% ) 104,853,333,514 instructions:u # 1.73 insn per cycle ( +- 0.00% ) 20.168895243 seconds time elapsed ( +- 0.32% ) Signed-off-by: NJiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-cjp2tuk0qkjs9dxzlpmm34ua@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Jiri Olsa 提交于
There's no need to pass whole sample data, because it's only timestamp that is used. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-xd1hpoze3kgb1rb639o3vehb@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Jiri Olsa 提交于
Add perf_evlist__parse_sample_timestamp to retrieve the timestamp of the sample. The idea is to use this function instead of the full sample parsing before we queue the sample. At that time only the timestamp is needed and we parse the sample once again later on delivery. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-o7syqo8lipj4or7renpu8e8y@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Jiri Olsa 提交于
Move the initialization bits into common place at the beginning of the function. Also removing some superfluous zero initialization for addr and transaction, because we zero all the data at the top. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-1gv5t6fvv735t1rt3mxpy1h9@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Jiri Olsa 提交于
We already pass cursor into thread__resolve_callchain function, so there's no point in resetting the global instance. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-puk015qvuppao9m1xtdy9v7j@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Kim Phillips 提交于
Help identify to the user the event with the unsupported sampling error. Also suggest a corrective action. BEFORE: $ sudo ./oldperf record -e armv8_pmuv3/mem_access/,ccn/cycles/,armv8_pmuv3/l2d_cache/ true Error: PMU Hardware doesn't support sampling/overflow-interrupts. AFTER: $ sudo ./newperf record -e armv8_pmuv3/mem_access/,ccn/cycles/,armv8_pmuv3/l2d_cache/ true Error: ccn/cycles/: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' Signed-off-by: NKim Phillips <kim.phillips@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171114150452.e846f2e23684c7d7d8ee706f@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
The warning about kptr_restrict needs to be emitted only when it is set and we ask for kernel space samples, so add a helper to help with that. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-fh7drty6yljei9gxxzer6eup@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Ravi Bangoria 提交于
There are many instructions, esp on PowerPC, whose mnemonics are longer than 6 characters. Using precision limit causes truncation of such mnemonics. Fix this by removing precision limit. Note that, 'width' is still 6, so alignment won't get affected for length <= 6. Before: li r11,-1 xscvdp vs1,vs1 add. r10,r10,r11 After: li r11,-1 xscvdpsxds vs1,vs1 add. r10,r10,r11 Reported-by: NDonald Stence <dstence@us.ibm.com> Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Taeung Song <treeze.taeung@gmail.com> Link: http://lkml.kernel.org/r/20171114032540.4564-1-ravi.bangoria@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
A recent fix for 'perf trace' introduced a bug where machine__exit(trace->host) could be called while trace->host was still NULL, so make this more robust by guarding against NULL, just like free() does. The problem happens, for instance, when !root users try to run 'perf trace': [acme@jouet linux]$ trace Error: No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit) Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing' perf: Segmentation fault Obtained 7 stack frames. [0x4f1b2e] /lib64/libc.so.6(+0x3671f) [0x7f43a1dd971f] [0x4f3fec] [0x47468b] [0x42a2db] /lib64/libc.so.6(__libc_start_main+0xe9) [0x7f43a1dc3509] [0x42a6c9] Segmentation fault (core dumped) [acme@jouet linux]$ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrei Vagin <avagin@openvz.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vasily Averin <vvs@virtuozzo.com> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 33974a41 ("perf trace: Call machine__exit() at exit") Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Arnaldo Carvalho de Melo 提交于
I forgot one conversion, which got noticed by Thomas when running: $ perf stat -e '{cpu-clock,instructions}' kill kill: not enough arguments Segmentation fault (core dumped) $ Fix it, those stats are in evsel->stats, not anymore in evsel->priv. Reported-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com> Tested-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: e669e833 ("perf evsel: Restore evsel->priv as a tool private area") Link: http://lkml.kernel.org/r/20171109150046.GN4333@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Andi Kleen 提交于
Use a typed enum for the perf_evsel_config_term type enum. This allows gcc to do much stronger type checks, and also check for missing case statements. I removed the unused _MAX member from the number. It found one missing case. I'm not sure it's a real problem, so I just turned it into a BUG_ON for now. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20171020202755.21410-1-andi@firstfloor.org [ Renamed the enum name to term_type as per jolsa's request ] Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Andi Kleen 提交于
The Intel PMU event aliases have a implicit period= specifier to set the default period. Unfortunately this breaks overriding these periods with -c or -F, because the alias terms look like they are user specified to the internal parser, and user specified event qualifiers override the command line options. Track that they are coming from aliases by adding a "weak" state to the term. Any weak terms don't override command line options. I only did it for -c/-F for now, I think that's the only case that's broken currently. Before: $ perf record -c 1000 -vv -e uops_issued.any ... { sample_period, sample_freq } 2000003 After: $ perf record -c 1000 -vv -e uops_issued.any ... { sample_period, sample_freq } 1000 Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20171020202755.21410-2-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-