- 18 October 2017, 1 commit
-
-
Committed by Jin Yao
In the current xyarray code, xyarray__max_x() returns max_y and xyarray__max_y() returns max_x. It is confusing and the logic looks incorrect. The error shows up when closing the evsel fds. Consider this scenario:

1. Allocate an fd (pseudo-code)

    perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads)
    {
        evsel->fd = xyarray__new(ncpus, nthreads, sizeof(int));
    }

    xyarray__new(int xlen, int ylen, size_t entry_size)
    {
        size_t row_size = ylen * entry_size;
        struct xyarray *xy = zalloc(sizeof(*xy) + xlen * row_size);

        xy->entry_size = entry_size;
        xy->row_size   = row_size;
        xy->entries    = xlen * ylen;
        xy->max_x      = xlen;
        xy->max_y      = ylen;
        ......
    }

   So max_x is ncpus, max_y is nthreads and row_size = nthreads * 4.

2. Use the perf syscall and store the fd

    int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
                         struct thread_map *threads)
    {
        for (cpu = 0; cpu < cpus->nr; cpu++) {
            for (thread = 0; thread < nthreads; thread++) {
                int fd, group_fd;

                fd = sys_perf_event_open(&evsel->attr, pid, cpus->map[cpu],
                                         group_fd, flags);
                FD(evsel, cpu, thread) = fd;
            }
        }

    static inline void *xyarray__entry(struct xyarray *xy, int x, int y)
    {
        return &xy->contents[x * xy->row_size + y * xy->entry_size];
    }

   This code has no issue. The issue happens when closing the fds.

3. Close the fds

    void perf_evsel__close_fd(struct perf_evsel *evsel)
    {
        int cpu, thread;

        for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++)
            for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) {
                close(FD(evsel, cpu, thread));
                FD(evsel, cpu, thread) = -1;
            }
    }

Since xyarray__max_x() returns max_y (nthreads) and xyarray__max_y() returns max_x (ncpus), the code above is actually:

    for (cpu = 0; cpu < nthreads; cpu++)
        for (thread = 0; thread < ncpus; ++thread) {
            close(FD(evsel, cpu, thread));
            FD(evsel, cpu, thread) = -1;
        }

That is not correct! The confusion was introduced by commit 475fb533 ("perf evsel: Fix buffer overflow while freeing events").

The fix is to let xyarray__max_x() return max_x (ncpus) and xyarray__max_y() return max_y (nthreads).

Committer note:

This was also fixed by Ravi Bangoria, who provided the same patch, noticing the problem with 'perf record':

<quote Ravi>
I see 'perf record -p <pid>' crashes with the following log:

    *** Error in `./perf': free(): invalid next size (normal): 0x000000000298b340 ***
    ======= Backtrace: =========
    /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f7fd85c87e5]
    /lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f7fd85d137a]
    /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f7fd85d553c]
    ./perf(perf_evsel__close+0xb4)[0x4b7614]
    ./perf(perf_evlist__delete+0x100)[0x4ab180]
    ./perf(cmd_record+0x1d9)[0x43a5a9]
    ./perf[0x49aa2f]
    ./perf(main+0x631)[0x427841]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f7fd8571830]
    ./perf(_start+0x29)[0x427a59]
</>

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Fixes: d74be476 ("perf xyarray: Save max_x, max_y")
Link: http://lkml.kernel.org/r/1508339478-26674-1-git-send-email-yao.jin@linux.intel.com
Link: http://lkml.kernel.org/r/1508327446-15302-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
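For reference, a minimal sketch of what the corrected accessors look like, using the struct layout from the pseudo-code above rather than the literal upstream diff:

```c
#include <stddef.h>

/* Sketch only: layout follows the pseudo-code in the commit message. */
struct xyarray {
	size_t row_size;
	size_t entry_size;
	size_t entries;
	size_t max_x;
	size_t max_y;
	char contents[];
};

static inline int xyarray__max_x(struct xyarray *xy)
{
	return xy->max_x;	/* previously returned xy->max_y */
}

static inline int xyarray__max_y(struct xyarray *xy)
{
	return xy->max_y;	/* previously returned xy->max_x */
}
```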
-
- 17 October 2017, 1 commit
-
-
Committed by Namhyung Kim
Thomas reported that 'perf buildid-list' gets a SEGFAULT due to a NULL pointer deref when he ran it on data with namespace events. It was because build_id__mark_dso_hit_ops lacks the namespace event handler and perf_tool__fill_defaults() didn't set it.

    Program received signal SIGSEGV, Segmentation fault.
    0x0000000000000000 in ?? ()
    Missing separate debuginfos, use: dnf debuginfo-install audit-libs-2.7.7-1.fc25.s390x bzip2-libs-1.0.6-21.fc25.s390x elfutils-libelf-0.169-1.fc25.s390x
    +elfutils-libs-0.169-1.fc25.s390x libcap-ng-0.7.8-1.fc25.s390x numactl-libs-2.0.11-2.ibm.fc25.s390x openssl-libs-1.1.0e-1.1.ibm.fc25.s390x perl-libs-5.24.1-386.fc25.s390x
    +python-libs-2.7.13-2.fc25.s390x slang-2.3.0-7.fc25.s390x xz-libs-5.2.3-2.fc25.s390x zlib-1.2.8-10.fc25.s390x
    (gdb) where
    #0  0x0000000000000000 in ?? ()
    #1  0x00000000010fad6a in machines__deliver_event (machines=<optimized out>, machines@entry=0x2c6fd18, evlist=<optimized out>, event=event@entry=0x3fffdf00470, sample=0x3ffffffe880, sample@entry=0x3ffffffe888, tool=tool@entry=0x1312968 <build_id.mark_dso_hit_ops>, file_offset=1136) at util/session.c:1287
    #2  0x00000000010fbf4e in perf_session__deliver_event (file_offset=1136, tool=0x1312968 <build_id.mark_dso_hit_ops>, sample=0x3ffffffe888, event=0x3fffdf00470, session=0x2c6fc30) at util/session.c:1340
    #3  perf_session__process_event (session=0x2c6fc30, session@entry=0x0, event=event@entry=0x3fffdf00470, file_offset=file_offset@entry=1136) at util/session.c:1522
    #4  0x00000000010fddde in __perf_session__process_events (file_size=11880, data_size=<optimized out>, data_offset=<optimized out>, session=0x0) at util/session.c:1899
    #5  perf_session__process_events (session=0x0, session@entry=0x2c6fc30) at util/session.c:1953
    #6  0x000000000103b2ac in perf_session__list_build_ids (with_hits=<optimized out>, force=<optimized out>) at builtin-buildid-list.c:83
    #7  cmd_buildid_list (argc=<optimized out>, argv=<optimized out>) at builtin-buildid-list.c:115
    #8  0x00000000010a026c in run_builtin (p=0x1311f78 <commands+24>, argc=argc@entry=2, argv=argv@entry=0x3fffffff3c0) at perf.c:296
    #9  0x000000000102bc00 in handle_internal_command (argv=<optimized out>, argc=2) at perf.c:348
    #10 run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:392
    #11 main (argc=<optimized out>, argv=0x3fffffff3c0) at perf.c:536
    (gdb)

Fix it by adding a stub event handler for the namespace event.

Committer testing:

Further clarifying: plain 'perf buildid-list' will not end up in a SEGFAULT when processing a perf.data file with namespace info:

    # perf record -a --namespaces sleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 2.024 MB perf.data (1058 samples) ]
    # perf buildid-list | wc -l
    38
    # perf buildid-list | head -5
    e2a171c7b905826fc8494f0711ba76ab6abbd604 /lib/modules/4.14.0-rc3+/build/vmlinux
    874840a02d8f8a31cedd605d0b8653145472ced3 /lib/modules/4.14.0-rc3+/kernel/arch/x86/kvm/kvm-intel.ko
    ea7223776730cd8a22f320040aae4d54312984bc /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/i915/i915.ko
    5961535e6732a8edb7f22b3f148bb2fa2e0be4b9 /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/drm.ko
    f045f54aa78cf1931cc893f78b6cbc52c72a8cb1 /usr/lib64/libc-2.25.so
    #

It is only when one asks for checking which of those entries actually had samples, i.e. when we use either -H or --with-hits, that all the PERF_RECORD_ events are processed; and since tools/perf/builtin-buildid-list.c neither explicitly sets a perf_tool.namespaces() callback nor had the default stub set, we end up causing a SEGFAULT when processing a PERF_RECORD_NAMESPACE record:

    # perf buildid-list -H
    Segmentation fault (core dumped)
    ^C
    #

Reported-and-Tested-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Fixes: f3b3614a ("perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info")
Link: http://lkml.kernel.org/r/20171017132900.11043-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
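A rough sketch of the shape of the fix, with simplified stand-in types; the real callback signatures live in perf's tool/session headers, so treat the names below as illustrative only:

```c
#include <stddef.h>

/* Simplified stand-ins for the perf types involved. */
union perf_event;
struct perf_sample;
struct machine;

typedef int (*event_cb)(void *tool, union perf_event *event,
			struct perf_sample *sample, struct machine *machine);

struct tool_callbacks {
	event_cb namespaces;	/* handler for PERF_RECORD_NAMESPACES */
	/* ... the other per-record-type callbacks ... */
};

/* Harmless default: silently ignore the record. */
static int process_event_stub(void *tool, union perf_event *event,
			      struct perf_sample *sample, struct machine *machine)
{
	(void)tool; (void)event; (void)sample; (void)machine;
	return 0;
}

/* The shape of the fix: tools that never set .namespaces get the stub
 * instead of calling through a NULL pointer. */
static void fill_namespaces_default(struct tool_callbacks *tool)
{
	if (tool->namespaces == NULL)
		tool->namespaces = process_event_stub;
}
```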
-
- 14 October 2017, 1 commit
-
-
Committed by Jiri Olsa
Add a check for whether the eBPF file exists before considering it an eBPF input file. This way we can differentiate eBPF events from events that happen to end with the same suffix as an eBPF file.

Before:

    $ perf stat -e 'cpu/uops_executed.core/' true
    bpf: builtin compilation failed: -95, try external compiler
    WARNING: unable to get correct kernel building directory.
    Hint:   Set correct kbuild directory using 'kbuild-dir' option in [llvm]
            section of ~/.perfconfig or set it to "" to suppress kbuild detection.
    event syntax error: 'cpu/uops_executed.core/'
                         \___ Failed to load cpu/uops_executed.c from source: 'version' section incorrect or lost

After:

    $ perf stat -e 'cpu/uops_executed.core/' true

     Performance counter stats for 'true':

               181,533      cpu/uops_executed.core/:u

           0.002795447 seconds time elapsed

If the user makes a typo in the eBPF file name, we prioritize the event syntax and show the following warning:

    $ perf stat -e 'krava.c//' true
    event syntax error: 'krava.c//'
                         \___ Cannot find PMU `krava.c'. Missing kernel support?

Reported-and-Tested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Changbin Du <changbin.du@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20171013083736.15037-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
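The idea can be illustrated with a small, hypothetical helper (not the upstream code): treat a name as eBPF input only if it both has an eBPF-looking suffix and actually exists on disk:

```c
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: suffix check alone is no longer enough. */
static bool looks_like_bpf_input(const char *name)
{
	struct stat st;
	size_t len = strlen(name);
	bool has_suffix = (len > 2 && strcmp(name + len - 2, ".c") == 0) ||
			  (len > 2 && strcmp(name + len - 2, ".o") == 0);

	return has_suffix && stat(name, &st) == 0;
}
```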
-
- 10 October 2017, 1 commit
-
-
Committed by Mark Rutland
Currently, perf record is broken on arm/arm64 systems when the PMU is specified explicitly as part of the event, e.g. $ ./perf record -e armv8_cortex_a53/cpu_cycles/u true In such cases, perf record fails to open events unless perf_event_paranoid is set to -1, even if the PMU in question supports mode exclusion. Further, even when perf_event_paranoid is toggled, no samples are recorded. This is an unintended side effect of commit: e3ba76de ("perf tools: Force uncore events to system wide monitoring) ... which assumes that if a PMU has an associated cpu_map, it is an uncore PMU, and forces events for such PMUs to be system-wide. This is not true for arm/arm64 systems, which can have heterogeneous CPUs. To account for this, multiple CPU PMUs are exposed, each with a "cpus" field under sysfs, which the perf tool parses into a cpu_map. ARM PMUs do not have a "cpumask" file, and only have a "cpus" file. For the gory details as to why, see commit: 7e3fcffe ("perf pmu: Support alternative sysfs cpumask") Given all of this, we can instead identify uncore PMUs by explicitly checking for a "cpumask" file, and restore arm/arm64 PMU support back to a working state. This patch does so, adding a new perf_pmu::is_uncore field, and splitting the existing cpumask parsing so that it can be reused. Signed-off-by: NMark Rutland <mark.rutland@arm.com> Tested-by Will Deacon <will.deacon@arm.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: 4.12+ <stable@vger.kernel.org> Fixes: e3ba76de ("perf tools: Force uncore events to system wide monitoring) Link: http://lkml.kernel.org/r/1507315102-5942-1-git-send-email-mark.rutland@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
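A hedged sketch of the classification rule described above, with a made-up helper name; the sysfs layout is the standard event_source one, everything else is illustrative:

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* A PMU counts as uncore only if sysfs exposes a "cpumask" file for it.
 * ARM CPU PMUs only provide "cpus", so they are no longer forced to
 * system-wide monitoring. */
static bool pmu_is_uncore(const char *sysfs_mnt, const char *pmu_name)
{
	char path[PATH_MAX];

	snprintf(path, sizeof(path),
		 "%s/bus/event_source/devices/%s/cpumask", sysfs_mnt, pmu_name);
	return access(path, F_OK) == 0;
}
```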
-
- 05 October 2017, 1 commit
-
-
Committed by Ravi Bangoria
Two functions from different binaries can have the same start address. Thus, comparing only the start address in match_chain() leads to inconsistent callchains. Fix this by adding a check for the dsos as well. Example: https://www.spinics.net/lists/linux-perf-users/msg04067.html

Reported-by: Alexander Pozdneev <pozdneyev@gmail.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Cc: zhangmengting@huawei.com
Link: http://lkml.kernel.org/r/20171005091234.5874-1-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
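A minimal sketch of the comparison rule, using simplified stand-in types instead of perf's callchain/symbol structs; the point is only the extra dso comparison:

```c
#include <stdint.h>

struct chain_key {
	uint64_t    start;	/* symbol start address */
	const void *dso;	/* binary the symbol belongs to */
};

static int64_t match_chain_key(const struct chain_key *a, const struct chain_key *b)
{
	if (a->start != b->start)
		return (int64_t)(a->start - b->start);
	/* same start address: only a real match if it is the same dso */
	if (a->dso != b->dso)
		return (int64_t)((uintptr_t)a->dso - (uintptr_t)b->dso);
	return 0;
}
```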
-
- 03 October 2017, 4 commits
-
-
Committed by Kan Liang
Using UINT_MAX to indicate the default thread#, which is the max number of online CPU. Committer testing: # perf trace --no-inherit -e clone -o /tmp/output perf top --num-thread-synthesize 9 # cat /tmp/output ? ( ? ): ... [continued]: clone()) = 26651 (perf) 0.059 ( 0.010 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bfac44f30, parent_tidptr: 0x7f5bfac459d0, child_tidptr: 0x7f5bfac459d0, tls: 0x7f5bfac45700) = 26652 (perf) 0.116 ( 0.014 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bfa443f30, parent_tidptr: 0x7f5bfa4449d0, child_tidptr: 0x7f5bfa4449d0, tls: 0x7f5bfa444700) = 26653 (perf) 0.141 ( 0.009 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf9c42f30, parent_tidptr: 0x7f5bf9c439d0, child_tidptr: 0x7f5bf9c439d0, tls: 0x7f5bf9c43700) = 26654 (perf) 0.160 ( 0.012 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf9441f30, parent_tidptr: 0x7f5bf94429d0, child_tidptr: 0x7f5bf94429d0, tls: 0x7f5bf9442700) = 26655 (perf) 0.232 ( 0.013 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf8c40f30, parent_tidptr: 0x7f5bf8c419d0, child_tidptr: 0x7f5bf8c419d0, tls: 0x7f5bf8c41700) = 26656 (perf) 0.393 ( 0.011 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be3ffef30, parent_tidptr: 0x7f5be3fff9d0, child_tidptr: 0x7f5be3fff9d0, tls: 0x7f5be3fff700) = 26657 (perf) 0.802 ( 0.012 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be37fdf30, parent_tidptr: 0x7f5be37fe9d0, child_tidptr: 0x7f5be37fe9d0, tls: 0x7f5be37fe700) = 26658 (perf) 1.411 ( 0.022 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be2ffcf30, parent_tidptr: 0x7f5be2ffd9d0, child_tidptr: 0x7f5be2ffd9d0, tls: 0x7f5be2ffd700) = 26659 (perf) 246.422 ( 0.042 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be2ffcf30, parent_tidptr: 0x7f5be2ffd9d0, child_tidptr: 0x7f5be2ffd9d0, tls: 0x7f5be2ffd700) = 26660 (perf) # Signed-off-by: NKan Liang <kan.liang@intel.com> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1506696477-146932-5-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Kan Liang
The proc files which is sorted with alphabetical order are evenly assigned to several synthesize threads to be processed in parallel. For 'perf top', the threads number hard code to online CPU number. The following patch will introduce an option to set it. For other perf tools, the thread number is 1. Because the process function is not ready for multithreading, e.g. process_synthesized_event. This patch series only support event synthesize multithreading for 'perf top'. For other tools, it can be done separately later. With multithread applied, the total processing time can get up to 1.56x speedup on Knights Mill for 'perf top'. For specific single event processing, the processing time could increase because of the lock contention. So proc_map_timeout may need to be increased. Otherwise some proc maps will be truncated. Based on my test, increasing the proc_map_timeout has small impact on the total processing time. The total processing time still get 1.49x speedup on Knights Mill after increasing the proc_map_timeout. The patch itself doesn't increase the proc_map_timeout. Doesn't need to implement multithreading for per task monitoring, perf_event__synthesize_thread_map. It doesn't have performance issue. Committer testing: # getconf _NPROCESSORS_ONLN 4 # perf trace --no-inherit -e clone -o /tmp/output perf top # tail -4 /tmp/bla 0.124 ( 0.041 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eb3a8f30, parent_tidptr: 0x7fc3eb3a99d0, child_tidptr: 0x7fc3eb3a99d0, tls: 0x7fc3eb3a9700) = 9548 (perf) 0.246 ( 0.023 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eaba7f30, parent_tidptr: 0x7fc3eaba89d0, child_tidptr: 0x7fc3eaba89d0, tls: 0x7fc3eaba8700) = 9549 (perf) 0.286 ( 0.019 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9550 (perf) 246.540 ( 0.047 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9551 (perf) # Signed-off-by: NKan Liang <kan.liang@intel.com> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1506696477-146932-4-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
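The even assignment of proc entries to synthesize threads boils down to simple arithmetic; a sketch with hypothetical names, not the upstream code:

```c
/* nr_entries /proc entries split as evenly as possible across
 * nr_threads workers. */
struct synth_slice {
	int start;	/* index of the first /proc entry for this worker */
	int num;	/* how many entries this worker handles */
};

static void split_proc_entries(int nr_entries, int nr_threads,
			       struct synth_slice *slices)
{
	int base = nr_entries / nr_threads;
	int rem  = nr_entries % nr_threads;
	int i, next = 0;

	for (i = 0; i < nr_threads; i++) {
		slices[i].start = next;
		slices[i].num   = base + (i < rem ? 1 : 0);
		next += slices[i].num;
	}
}
```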
-
Committed by Kan Liang
Add comm_str_lock to protect the comm_str rb tree. The lock is only needed for multithreaded code, so use the mutex wrappers provided by the perf tool.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1506696477-146932-3-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
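A sketch of the locking shape, assuming a plain pthread mutex and a pre-existing unlocked helper named __comm_str__findnew(); both names are illustrative, not the upstream symbols:

```c
#include <pthread.h>

struct rb_root;		/* perf's red-black tree root */
struct comm_str;	/* interned comm string node */

/* Assumed pre-existing, unlocked lookup/insert helper. */
struct comm_str *__comm_str__findnew(const char *str, struct rb_root *root);

static pthread_mutex_t comm_str_lock = PTHREAD_MUTEX_INITIALIZER;

/* Locked wrapper: every lookup/insert into the shared rb tree goes
 * through here, so concurrent synthesizing threads cannot corrupt it. */
static struct comm_str *comm_str__findnew(const char *str, struct rb_root *root)
{
	struct comm_str *cs;

	pthread_mutex_lock(&comm_str_lock);
	cs = __comm_str__findnew(str, root);
	pthread_mutex_unlock(&comm_str_lock);

	return cs;
}
```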
-
Committed by Kan Liang
Add two locks to protect namespaces_list and comm_list. The locks are only needed for multithreaded code, so use the mutex wrappers provided by the perf tool. Not all comm_list/namespaces_list accesses are protected, e.g. thread__exec_comm, because the multithreaded code for perf top event synthesizing does not touch them, so they don't need a lock.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1506696477-146932-2-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 29 September 2017, 1 commit
-
-
Committed by Thomas Richter
On s390x perf test 1 failed. It turned out that commit 4a084ecf ("perf report: Fix module symbol adjustment for s390x") was incorrect. The previous implementation in dso__load_sym() is also suitable for s390x. Therefore this patch undoes commit 4a084ecf.

Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com>
Fixes: 4a084ecf ("perf report: Fix module symbol adjustment for s390x")
LPU-Reference: 20170915071404.58398-1-tmricht@linux.vnet.ibm.com
Link: http://lkml.kernel.org/n/tip-5ani7ly57zji7s0hmzkx416l@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 25 September 2017, 3 commits
-
-
Committed by Akemi Yagi
The build of kernel v4.14-rc1 for i686 fails on RHEL 6 with this error in tools/perf:

    util/syscalltbl.c:157: error: expected ';', ',' or ')' before '__maybe_unused'
    mv: cannot stat `util/.syscalltbl.o.tmp': No such file or directory

Fix it by moving:

    #include <linux/compiler.h>

outside of the #ifdef HAVE_SYSCALL_TABLE block.

Signed-off-by: Akemi Yagi <toracat@elrepo.org>
Cc: Alan Bartlett <ajb@elrepo.org>
Link: http://lkml.kernel.org/r/oq41r8$1v9$1@blaine.gmane.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Mengting Zhang
With --call-graph option, perf report can display call chains using type, min percent threshold, optional print limit and order. And the default call-graph parameter is 'graph,0.5,caller,function,percent'. Before this patch, 'perf report --call-graph' shows incorrect debug messages as below: # perf report --call-graph Invalid callchain mode: 0.5 Invalid callchain order: 0.5 Invalid callchain sort key: 0.5 Invalid callchain config key: 0.5 Invalid callchain mode: caller Invalid callchain mode: function Invalid callchain order: function Invalid callchain mode: percent Invalid callchain order: percent Invalid callchain sort key: percent That is because in function __parse_callchain_report_opt(),each field of the call-graph parameter is passed to parse_callchain_{mode,order, sort_key,value} in turn until it meets the matching value. For example, the order field "caller" is passed to parse_callchain_mode() firstly and obviously it doesn't match any mode field. Therefore parse_callchain_mode() will shows the debug message "Invalid callchain mode: caller", which could confuse users. The patch fixes this issue by moving the warning out of the function parse_callchain_{mode,order,sort_key,value}. Signed-off-by: NMengting Zhang <zhangmengting@huawei.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Krister Johansen <kjlx@templeofstupid.com> Cc: Li Bin <huawei.libin@huawei.com> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/1506154694-39691-1-git-send-email-zhangmengting@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Arnaldo Carvalho de Melo
Yet another fix for probing the max attr.precise_ip setting: it is not enough settting attr.exclude_kernel for !root users, as they _can_ profile the kernel if the kernel.perf_event_paranoid sysctl is set to -1, so check that as well. Testing it: As non root: $ sysctl kernel.perf_event_paranoid kernel.perf_event_paranoid = 2 $ perf record sleep 1 $ perf evlist -v cycles:uppp: ..., exclude_kernel: 1, ... precise_ip: 3, ... Now as non-root, but with kernel.perf_event_paranoid set set to the most permissive value, -1: $ sysctl kernel.perf_event_paranoid kernel.perf_event_paranoid = -1 $ perf record sleep 1 $ perf evlist -v cycles:ppp: ..., exclude_kernel: 0, ... precise_ip: 3, ... $ I.e. non-root, default kernel.perf_event_paranoid: :uppp modifier = not allowed to sample the kernel, non-root, most permissible kernel.perf_event_paranoid: :ppp = allowed to sample the kernel. In both cases, use the highest available precision: attr.precise_ip = 3. Reported-and-Tested-by: NIngo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: d37a3697 ("perf evsel: Fix attr.exclude_kernel setting for default cycles:p") Link: http://lkml.kernel.org/n/tip-nj2qkf75xsd6pw6hhjzfqqdx@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
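A hedged sketch of the extra check: the /proc path is the real sysctl interface, while the helper names are made up for illustration:

```c
#include <stdio.h>
#include <unistd.h>

/* On any error, assume the restrictive default (2). */
static int perf_event_paranoid_value(void)
{
	int val = 2;
	FILE *f = fopen("/proc/sys/kernel/perf_event_paranoid", "r");

	if (f) {
		if (fscanf(f, "%d", &val) != 1)
			val = 2;
		fclose(f);
	}
	return val;
}

/* Per the commit message: exclude_kernel = !root && paranoid > -1. */
static int should_exclude_kernel(void)
{
	return geteuid() != 0 && perf_event_paranoid_value() > -1;
}
```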
-
- 22 September 2017, 2 commits
-
-
Committed by Arnaldo Carvalho de Melo
Andi reported a performance drop in single threaded perf tools such as 'perf script' due to the growing number of locks being put in place to allow for multithreaded tools, so wrap the POSIX threads rwlock routines with the names used for such kinds of locks in the Linux kernel and then allow for tools to ask for those locks to be used or not. I.e. a tool may have a multithreaded phase and then switch to single threaded, like the upcoming patches for the synthesizing of PERF_RECORD_{FORK,MMAP,etc} for pre-existing processes to then switch to single threaded mode in 'perf top'. The init routines will not be conditional, this way starting as single threaded to then move to multi threaded mode should be possible. Reported-by: NAndi Kleen <ak@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20170404161739.GH12903@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Andi Kleen
The -M metric group parser threw away the events of earlier groups when multiple groups were specified. Fix this here by not overwriting the string incorrectly. Now this works correctly: % perf stat -M Summary,SMT --metric-only -a sleep 1 Performance counter stats for 'system wide': Instructions CPI CLKS CPU_Utilization GFLOPs SMT_2T_Utilization SMT_2T_Utilization Kernel_Utilization CoreIPC CORE_CLKS 900907376.0 2.7 2398954144.0 0.1 0.0 0.2 0.2 0.1 0.4 2080822855.5 while previously it would only show the SMT metrics. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170914205735.18431-1-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 18 September 2017, 3 commits
-
-
Committed by Andi Kleen
When a PMU is missing print a better error message mentioning the missing PMU. % mkdir empty % mount --bind empty /sys/devices/msr % perf stat -M Summary true event syntax error: '{inst_retired.any,cycles}:W,{cpu_clk_unhalted.thread}:W,{inst_retired.any}:W,{cpu_clk_unhalted.ref_tsc,msr/tsc/}:W,{fp_comp_ops_exe.sse_scalar..' \___ Cannot find PMU `msr'. Missing kernel support? It still cannot find the right column for aliases, but it's already a vast improvement. v2: Check asprintf Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170913215006.32222-1-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Arnaldo Carvalho de Melo
In some cases we already have calculated the hash bucket, so reuse it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-800zehjsyy03er4s4jf0e99v@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Kan Liang
To process any event, perf first needs to find the thread in the machine. The machine maintains an rb tree to store all threads, protected by a rw lock. That is not a problem for current perf, which processes events serially. However, it becomes a scalability issue when processing events in parallel, especially on a heavily loaded system with many threads.

Introduce a hashtable that divides the big rb tree into many small rb trees for threads. The index is thread id % hashtable size. This reduces the lock contention.

Committer notes:

Renamed some variables and function names to reduce semantic confusion:

    'struct threads' pointers: thread -> threads
    threads hashtable index: tid -> hash_bucket
    struct threads *machine__thread() -> machine__threads()

Cast tid to (unsigned int) to handle -1 in machine__threads() (Kan Liang)

Signed-off-by: Kan Liang <kan.liang@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1505096603-215017-2-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
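A sketch of the bucketing with simplified stand-in types; the table size and field names are assumptions, not the upstream definitions:

```c
#include <pthread.h>

#define THREADS__TABLE_BITS	8
#define THREADS__TABLE_SIZE	(1 << THREADS__TABLE_BITS)

struct rb_root { void *rb_node; };	/* stand-in for the kernel rb tree */

struct threads {
	struct rb_root	 entries;
	pthread_rwlock_t lock;
	unsigned int	 nr;
};

struct machine_threads {
	struct threads table[THREADS__TABLE_SIZE];
};

/* tid can be -1 ("unknown thread"), hence the cast before the modulo. */
static struct threads *machine__threads(struct machine_threads *m, int tid)
{
	return &m->table[(unsigned int)tid % THREADS__TABLE_SIZE];
}
```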
-
- 13 September 2017, 13 commits
-
-
Committed by Arnaldo Carvalho de Melo
There is no usage outside util.c and this is the only remaining reason for fcntl.h to be included in util.h, to get the loff_t definition in Alpine Linux, so make it static.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-2dzlsao7k6ihozs5karw6kpx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Taeung Song
When there isn't a config file (e.g. ~/.perfconfig), or it is empty, the config set wasn't created. If the config set does not exist, a config file can't be autogenerated. So allow creating an empty config set in the above case, so that we can support config file autogeneration.

Before:

    $ rm -f ~/.perfconfig
    $ perf config --user report.children=false
    $ cat ~/.perfconfig
    cat: /root/.perfconfig: No such file or directory

But it should work even if there isn't a config file.

After:

    $ rm -f ~/.perfconfig
    $ perf config --user report.children=false
    $ cat ~/.perfconfig
    # this file is auto-generated.
    [report]
        children = false

NOTE: As a result, if perf_config_set__init() fails, it looks as if the config set isn't freed. But that is not a problem, because the config set will be freed by perf_config_set__delete() at the end of cmd_config().

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1504754336-9824-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Kan Liang
In perf_event__synthesize_threads() perf goes through all /proc files serially using readdir(). scandir() instead takes a snapshot of /proc, which is multithreading friendly. It's possible that some threads added during event synthesis are missed, but the number of lost threads should be small and they should not impact the final analysis.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1504806954-150842-3-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
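A rough sketch of the readdir-to-scandir switch; helper names are illustrative and error handling is simplified:

```c
#include <ctype.h>
#include <dirent.h>

/* Keep only numeric (pid) directory entries. */
static int filter_pid_entries(const struct dirent *d)
{
	return isdigit((unsigned char)d->d_name[0]);
}

/* One snapshot of /proc; the caller walks (and frees) the returned
 * array, possibly handing slices of it to worker threads. */
static int snapshot_proc_pids(struct dirent ***entries)
{
	int n = scandir("/proc", entries, filter_pid_entries, alphasort);

	return n < 0 ? 0 : n;
}
```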
-
Committed by Jiri Olsa
Add the size values '[current/total]' to the progress bar, to show more detailed progress of data reading. Add a new ui_progress__init_size() function to specify that we want to display the size.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908120510.22515-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Andi Kleen
USER_REGS can currently only be collected implicitly with call graph recording. Sometimes it is useful to see them separately, and to filter them. Add a new --user-regs option to 'perf record' that is similar to --intr-regs, but acts on user regs.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170905170029.19722-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Andi Kleen
Some of the metrics formulas (like GFLOPs) need to know how long the measurement period is. Support an internal event called duration_time, which reports the time in seconds. It maps to the dummy event, but is special-cased in the statistics code to report the walltime duration. So far it is not printed, but only used internally for metrics.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-10-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Andi Kleen
We don't need to use ctx to look up events for saved values. The context is already part of the evsel pointer, which is the primary key.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-9-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Andi Kleen
Add code to perf list to print metric groups, and metrics that don't have an event name. The metricgroup code collects the eventgroups and events into a rblist, and then prints them according to the configured filters. The metricgroups are printed by default, but can be limited by perf list metric or perf list metricgroup % perf list metricgroup .. Metric Groups: DSB: DSB_Coverage [Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)] FLOPS: GFLOPs [Giga Floating Point Operations Per Second] Frontend: IFetch_Line_Utilization [Rough Estimation of fraction of fetched lines bytes that were likely consumed by program instructions] Frontend_Bandwidth: DSB_Coverage [Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)] Memory_BW: MLP [Memory-Level-Parallelism (average number of L1 miss demand load when there is at least 1 such miss)] v2: Check return value of asprintf to fix warning on FC26 Fix key in lookup/addition for the groups list Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-8-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Andi Kleen
Add generic support for standalone metrics specified in JSON files to perf stat. A metric is a formula that uses multiple events to compute a higher level result (e.g. IPC). Previously metrics were always tied to an event and automatically enabled with that event. But now change it that we can have standalone metrics. They are in the same JSON data structure as events, but don't have an event name. We also allow to organize the metrics in metric groups, which allows a short cut to select several related metrics at once. Add a new -M / --metrics option to perf stat that adds the metrics or metric groups specified. Add the core code to manage and parse the metric groups. They are collected from the JSON data structures into a separate rblist. When computing shadow values look for metrics in that list. Then they are computed using the existing saved values infrastructure in stat-shadow.c The actual JSON metrics are in a separate pull request. % perf stat -M Summary --metric-only -a sleep 1 Performance counter stats for 'system wide': Instructions CLKS CPU_Utilization GFLOPs SMT_2T_Utilization Kernel_Utilization 317614222.0 1392930775.0 0.0 0.0 0.2 0.1 1.001497549 seconds time elapsed % perf stat -M GFLOPs flops Performance counter stats for 'flops': 3,999,541,471 fp_comp_ops_exe.sse_scalar_single # 1.2 GFLOPs (66.65%) 14 fp_comp_ops_exe.sse_scalar_double (66.65%) 0 fp_comp_ops_exe.sse_packed_double (66.67%) 0 fp_comp_ops_exe.sse_packed_single (66.70%) 0 simd_fp_256.packed_double (66.70%) 0 simd_fp_256.packed_single (66.67%) 0 duration_time 3.238372845 seconds time elapsed v2: Add missing header file v3: Move find_map to pmu.c Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-7-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Andi Kleen
Extract the code to get the per cpu JSON alias into a separate function for reuse. No behavior changes.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-6-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Andi Kleen
Print the generic metric header even when the expression evaluation failed. Otherwise an expression that fails in the first collections due to division by zero may suddenly reappear later without a header.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-5-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Andi Kleen
The 'perf stat' shadow metric printing already supports generic metrics. Factor out the code doing that into a separate function that can be re-used in a later patch. No behavior changes.

v2: Fix indentation

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Andi Kleen
Setting up groups can be complicated due to the complicated scheduling restrictions of different PMUs. User tools usually don't understand all these restrictions. Still in many cases it is useful to set up groups and they work most of the time. However if the group is set up wrong some members will not report any value because they never get scheduled. Add a concept of a 'weak group': try to set up a group, but if it's not schedulable fallback to not using a group. That gives us the best of both worlds: groups if they work, but still a usable fallback if they don't. In theory it would be possible to have more complex fallback strategies (e.g. try to split the group in half), but the simple fallback of not using a group seems to work for now. So far the weak group is only implemented for perf stat, not for record. Here's an unschedulable group (on IvyBridge with SMT on) % perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1 73,806,067 branches 4,848,144 branch-misses # 6.57% of all branches 14,754,458 l1d.replacement 24,905,558 l2_lines_in.all <not supported> l2_rqsts.all_code_rd <------- will never report anything With the weak group: % perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}:W' -a sleep 1 125,366,055 branches (80.02%) 9,208,402 branch-misses # 7.35% of all branches (80.01%) 24,560,249 l1d.replacement (80.00%) 43,174,971 l2_lines_in.all (80.05%) 31,891,457 l2_rqsts.all_code_rd (79.92%) The extra event scheduled with some extra multiplexing v2: Move fallback code to separate function. Add comment on for_each_group_member Adjust to new perf_evsel__close interface v3: Fix debug print out. Committer testing: Before: # perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1 Performance counter stats for 'system wide': <not counted> branches <not counted> branch-misses <not counted> l1d.replacement <not counted> l2_lines_in.all <not supported> l2_rqsts.all_code_rd 1.002147212 seconds time elapsed # perf stat -e '{branches,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1 Performance counter stats for 'system wide': 83,207,892 branches 11,065,444 l1d.replacement 28,484,024 l2_lines_in.all 12,186,179 l2_rqsts.all_code_rd 1.001739493 seconds time elapsed After: # perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}':W -a sleep 1 Performance counter stats for 'system wide': 543,323,909 branches (80.01%) 27,100,512 branch-misses # 4.99% of all branches (80.02%) 50,402,905 l1d.replacement (80.03%) 67,385,892 l2_lines_in.all (80.01%) 21,352,885 l2_rqsts.all_code_rd (79.94%) 1.001086658 seconds time elapsed # Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/20170831194036.30146-2-andi@firstfloor.org [ Add a "'perf stat' only, for now" comment in the man page, suggested by Jiri ] Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
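A pseudo-C sketch of the weak-group fallback logic; the types, helpers and error classification below are illustrative, not the perf internals:

```c
#include <stdbool.h>

struct group;	/* opaque stand-in for an evsel group */

enum open_result { OPEN_OK, OPEN_UNSCHEDULABLE, OPEN_FATAL };

/* Assumed helpers: try to open the whole group, or dissolve it. */
enum open_result open_group(struct group *g);
void group__split_into_singles(struct group *g);

static int open_counters(struct group *g, bool weak)
{
	switch (open_group(g)) {
	case OPEN_OK:
		return 0;
	case OPEN_UNSCHEDULABLE:
		if (!weak)
			return -1;
		/* weak group: drop the grouping and retry ungrouped */
		group__split_into_singles(g);
		return open_group(g) == OPEN_OK ? 0 : -1;
	case OPEN_FATAL:
	default:
		return -1;
	}
}
```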
-
- 12 September 2017, 1 commit
-
-
Committed by Jiri Olsa
Do not carry the perf.data file descriptor into the workload process and close it when perf executes the workload.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908084621.31595-2-jolsa@kernel.org
[ Add definitions for O_CLOEXEC for older systems ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 02 September 2017, 4 commits
-
-
Committed by Arnaldo Carvalho de Melo
Peter reported that when he explicitely asked for multiple events with the same name on the command line it got coalesced into just one line, i.e.: # perf stat -e cycles -e cycles -e cycles usleep 1 Performance counter stats for 'usleep 1': 3,269,652 cycles 0.000884123 seconds time elapsed # And while there is the --no-merges option to disable that auto-merging, this is a blunt change in behaviour for such explicit request, so change the code so that this auto merging is done only when handling the multi PMU aliases with the same name that introduced this coalescing, restoring the previous behaviour for the explicit case: # perf stat -e cycles -e cycles -e cycles usleep 1 Performance counter stats for 'usleep 1': 1,472,837 cycles 1,472,837 cycles 1,472,837 cycles 0.001764870 seconds time elapsed # Reported-by: NPeter Zijlstra <peterz@infradead.org> Acked-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 430daf2d ("perf stat: Collapse identically named events") Link: http://lkml.kernel.org/r/20170831184122.GK4831@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Kan Liang
Add a new sort option "phys_daddr" for --mem-mode sort. With this option applied, perf can sort and report by sample's physical address. Signed-off-by: NKan Liang <kan.liang@intel.com> Tested-by: NJiri Olsa <jolsa@redhat.com> Acked-by: NStephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1504026672-7304-3-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Kan Liang
Support new sample type PERF_SAMPLE_PHYS_ADDR for physical address. Add new option --phys-data to record sample physical address. Signed-off-by: NKan Liang <kan.liang@intel.com> Tested-by: NJiri Olsa <jolsa@redhat.com> Acked-by: NStephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1504026672-7304-2-git-send-email-kan.liang@intel.com [ Added missing printing in evsel.c patch sent by Jiri Olsa ] Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Arnaldo Carvalho de Melo
With two new methods, one to find the first match, returning its syscall id and its index in whatever internal database it keeps the syscall into, then one to find the next match, if any. Implemented only on arches where we actually read the syscall table from the kernel sources, i.e. x86-64 for now, all the others use the libaudit method for which this returns -1, i.e. just stubs were added, with the actual implementation using whatever libaudit functions for matching that may be available. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-i0sj4rxk1a63pfe9gl8z8irs@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 30 August 2017, 1 commit
-
-
Committed by Jin Yao
The branch history code has a loop detection function. With this, we can get the number of iterations by calculating the removed loops. While it would be nice for knowing the average cycles of iterations. This patch adds up the cycles in branch entries of removed loops and save the result to the next branch entry (e.g. branch entry A). Finally it will display the iteration number and average cycles at the "from" of branch entry A. For example: perf record -g -j any,save_type ./div perf report --branch-history --no-children --stdio --22.63%--main div.c:42 (RET CROSS_2M) compute_flag div.c:28 (cycles:2 iter:173115 avg_cycles:2) | --10.73%--compute_flag div.c:27 (RET CROSS_2M) rand rand.c:28 (cycles:1) rand rand.c:28 (RET CROSS_2M) __random random.c:298 (cycles:1) __random random.c:297 (COND_BWD CROSS_2M) __random random.c:295 (cycles:1) __random random.c:295 (COND_BWD CROSS_2M) __random random.c:295 (cycles:1) __random random.c:295 (RET CROSS_2M) Signed-off-by: NYao Jin <yao.jin@linux.intel.com> Reviewed-by: NAndi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1502111115-18305-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 29 August 2017, 3 commits
-
-
Committed by Li Bin
On x86, the plt header size is as same as the plt entry size, and can be identified from shdr's sh_entsize of the plt. But we can't assume that the sh_entsize of the plt shdr is always the plt entry size in all architecture, and the plt header size may be not as same as the plt entry size in some architecure. On ARM, the plt header size is 20 bytes and the plt entry size is 12 bytes (don't consider the FOUR_WORD_PLT case) that refer to the binutils implementation. The plt section is as follows: Disassembly of section .plt: 000004a0 <__cxa_finalize@plt-0x14>: 4a0: e52de004 push {lr} ; (str lr, [sp, #-4]!) 4a4: e59fe004 ldr lr, [pc, #4] ; 4b0 <_init+0x1c> 4a8: e08fe00e add lr, pc, lr 4ac: e5bef008 ldr pc, [lr, #8]! 4b0: 00008424 .word 0x00008424 000004b4 <__cxa_finalize@plt>: 4b4: e28fc600 add ip, pc, #0, 12 4b8: e28cca08 add ip, ip, #8, 20 ; 0x8000 4bc: e5bcf424 ldr pc, [ip, #1060]! ; 0x424 000004c0 <printf@plt>: 4c0: e28fc600 add ip, pc, #0, 12 4c4: e28cca08 add ip, ip, #8, 20 ; 0x8000 4c8: e5bcf41c ldr pc, [ip, #1052]! ; 0x41c On AARCH64, the plt header size is 32 bytes and the plt entry size is 16 bytes. The plt section is as follows: Disassembly of section .plt: 0000000000000560 <__cxa_finalize@plt-0x20>: 560: a9bf7bf0 stp x16, x30, [sp,#-16]! 564: 90000090 adrp x16, 10000 <__FRAME_END__+0xf8a8> 568: f944be11 ldr x17, [x16,#2424] 56c: 9125e210 add x16, x16, #0x978 570: d61f0220 br x17 574: d503201f nop 578: d503201f nop 57c: d503201f nop 0000000000000580 <__cxa_finalize@plt>: 580: 90000090 adrp x16, 10000 <__FRAME_END__+0xf8a8> 584: f944c211 ldr x17, [x16,#2432] 588: 91260210 add x16, x16, #0x980 58c: d61f0220 br x17 0000000000000590 <__gmon_start__@plt>: 590: 90000090 adrp x16, 10000 <__FRAME_END__+0xf8a8> 594: f944c611 ldr x17, [x16,#2440] 598: 91262210 add x16, x16, #0x988 59c: d61f0220 br x17 NOTES: In addition to ARM and AARCH64, other architectures, such as s390/alpha/mips/parisc/poperpc/sh/sparc/xtensa also need to consider this issue. Signed-off-by: NLi Bin <huawei.libin@huawei.com> Acked-by: NNamhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexis Berlemont <alexis.berlemont@gmail.com> Cc: David Tolnay <dtolnay@gmail.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: zhangmengting@huawei.com Link: http://lkml.kernel.org/r/1496622849-21877-1-git-send-email-huawei.libin@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Li Bin
Since commit 9aaf5a5f ("perf probe: Check kprobes blacklist when adding new events"), 'perf probe' supports checking the blacklist of functions which can not be probed. But the checking condition is wrong: the end_addr of a symbol, which is the start_addr of the next symbol, must not be included.

Committer notes:

IOW, make it match its kernel counterpart in kernel/kprobes.c:

    bool within_kprobe_blacklist(unsigned long addr)

Each entry has as its end address not the symbol's last address, but the first address _outside_ that symbol, which for related functions is the first address of the next symbol, like these from kernel/trace/trace_probe.c:

    0xffffffffbd198df0-0xffffffffbd198e40 print_type_u8
    0xffffffffbd198e40-0xffffffffbd198e90 print_type_u16
    0xffffffffbd198e90-0xffffffffbd198ee0 print_type_u32
    0xffffffffbd198ee0-0xffffffffbd198f30 print_type_u64
    0xffffffffbd198f30-0xffffffffbd198f80 print_type_s8
    0xffffffffbd198f80-0xffffffffbd198fd0 print_type_s16
    0xffffffffbd198fd0-0xffffffffbd199020 print_type_s32
    0xffffffffbd199020-0xffffffffbd199070 print_type_s64
    0xffffffffbd199070-0xffffffffbd1990c0 print_type_x8
    0xffffffffbd1990c0-0xffffffffbd199110 print_type_x16
    0xffffffffbd199110-0xffffffffbd199160 print_type_x32
    0xffffffffbd199160-0xffffffffbd1991b0 print_type_x64

But not always:

    0xffffffffbd1997b0-0xffffffffbd1997c0 fetch_kernel_stack_address (kernel/trace/trace_probe.c)
    0xffffffffbd1c57f0-0xffffffffbd1c58b0 __context_tracking_enter (kernel/context_tracking.c)

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: zhangmengting@huawei.com
Fixes: 9aaf5a5f ("perf probe: Check kprobes blacklist when adding new events")
Link: http://lkml.kernel.org/r/1504011443-7269-1-git-send-email-huawei.libin@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
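The corrected containment test is simply a half-open interval check; a small sketch with illustrative types:

```c
/* Mirrors the in-kernel within_kprobe_blacklist(): "end" is the first
 * address outside the symbol, so it must be excluded. */
struct blacklist_range {
	unsigned long start;
	unsigned long end;	/* first address after the symbol */
};

static int within_blacklist(unsigned long addr, const struct blacklist_range *r)
{
	return addr >= r->start && addr < r->end;
}
```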
-
Committed by David Carrillo-Cisneros
Prior to this patch, the make scripts tested for CLANG with ifeq ($(CC), clang), failing to detect CLANG binaries with different names. Fix it by testing for the presence of the __clang__ macro in the list of compiler-defined macros.

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Paul Turner <pjt@google.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20170827075442.108534-5-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-