- 07 2月, 2012 1 次提交
-
-
由 Stephane Eranian 提交于
The following patch fixes a bug introduced by the following commit: e050e3f0 ("perf: Fix broken interrupt rate throttling") The patch caused the following warning to pop up depending on the sampling frequency adjustments: ------------[ cut here ]------------ WARNING: at arch/x86/kernel/cpu/perf_event.c:995 x86_pmu_start+0x79/0xd4() It was caused by the following call sequence: perf_adjust_freq_unthr_context.part() { stop() if (delta > 0) { perf_adjust_period() { if (period > 8*...) { stop() ... start() } } } start() } Which caused a double start and a double stop, thus triggering the assert in x86_pmu_start(). The patch fixes the problem by avoiding the double calls. We pass a new argument to perf_adjust_period() to indicate whether or not the event is already stopped. We can't just remove the start/stop from that function because it's called from __perf_event_overflow where the event needs to be reloaded via a stop/start back-toback call. The patch reintroduces the assertion in x86_pmu_start() which was removed by commit: 84f2b9b2 ("perf: Remove deprecated WARN_ON_ONCE()") In this second version, we've added calls to disable/enable PMU during unthrottling or frequency adjustment based on bug report of spurious NMI interrupts from Eric Dumazet. Reported-and-tested-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NStephane Eranian <eranian@google.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: markus@trippelsdorf.de Cc: paulus@samba.org Link: http://lkml.kernel.org/r/20120207133956.GA4932@quad [ Minor edits to the changelog and to the code ] Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 27 1月, 2012 1 次提交
-
-
由 Stephane Eranian 提交于
This patch fixes the sampling interrupt throttling mechanism. It was broken in v3.2. Events were not being unthrottled. The unthrottling mechanism required that events be checked at each timer tick. This patch solves this problem and also separates: - unthrottling - multiplexing - frequency-mode period adjustments Not all of them need to be executed at each timer tick. This third version of the patch is based on my original patch + PeterZ proposal (https://lkml.org/lkml/2012/1/7/87). At each timer tick, for each context: - if the current CPU has throttled events, we unthrottle events - if context has frequency-based events, we adjust sampling periods - if we have reached the jiffies interval, we multiplex (rotate) We decoupled rotation (multiplexing) from frequency-mode sampling period adjustments. They should not necessarily happen at the same rate. Multiplexing is subject to jiffies_interval (currently at 1 but could be higher once the tunable is exposed via sysfs). We have grouped frequency-mode adjustment and unthrottling into the same routine to minimize code duplication. When throttled while in frequency mode, we scan the events only once. We have fixed the threshold enforcement code in __perf_event_overflow(). There was a bug whereby it would allow more than the authorized rate because an increment of hwc->interrupts was not executed at the right place. The patch was tested with low sampling limit (2000) and fixed periods, frequency mode, overcommitted PMU. On a 2.1GHz AMD CPU: $ cat /proc/sys/kernel/perf_event_max_sample_rate 2000 We set a rate of 3000 samples/sec (2.1GHz/3000 = 700000): $ perf record -e cycles,cycles -c 700000 noploop 10 $ perf report -D | tail -21 Aggregated stats: TOTAL events: 80086 MMAP events: 88 COMM events: 2 EXIT events: 4 THROTTLE events: 19996 UNTHROTTLE events: 19996 SAMPLE events: 40000 cycles stats: TOTAL events: 40006 MMAP events: 5 COMM events: 1 EXIT events: 4 THROTTLE events: 9998 UNTHROTTLE events: 9998 SAMPLE events: 20000 cycles stats: TOTAL events: 39996 THROTTLE events: 9998 UNTHROTTLE events: 9998 SAMPLE events: 20000 For 10s, the cap is 2x2000x10 = 40000 samples. We get exactly that: 20000 samples/event. Signed-off-by: NStephane Eranian <eranian@google.com> Cc: <stable@kernel.org> # v3.2+ Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120126160319.GA5655@quadSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 21 1月, 2012 1 次提交
-
-
由 Namhyung Kim 提交于
The perf_event_time() will call perf_cgroup_event_time() if @event is a cgroup event. Just do it directly and avoid the extra check.. Signed-off-by: NNamhyung Kim <namhyung.kim@lge.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Link: http://lkml.kernel.org/r/1327021966-27688-2-git-send-email-namhyung.kim@lge.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 02 1月, 2012 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 14 12月, 2011 1 次提交
-
-
由 Will Deacon 提交于
Commit 10c6db11 ("perf: Fix loss of notification with multi-event") seems to unconditionally dereference event->rb in the wakeup handler, this is wrong, there might not be a buffer attached. Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111213152651.GP20297@mudshark.cambridge.arm.com [ minor edits ] Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 13 12月, 2011 1 次提交
-
-
由 Tejun Heo 提交于
Now that subsys->can_attach() and attach() take @tset instead of @task, they can handle per-task operations. Convert ->can_attach_task() and ->attach_task() users to use ->can_attach() and attach() instead. Most converions are straight-forward. Noteworthy changes are, * In cgroup_freezer, remove unnecessary NULL assignments to unused methods. It's useless and very prone to get out of sync, which already happened. * In cpuset, PF_THREAD_BOUND test is checked for each task. This doesn't make any practical difference but is conceptually cleaner. Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com> Acked-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <paul@paulmenage.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: James Morris <jmorris@namei.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org>
-
- 12 12月, 2011 1 次提交
-
-
由 Paul E. McKenney 提交于
Change from direct comparison of ->pid with zero to is_idle_task(). Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
-
- 07 12月, 2011 1 次提交
-
-
由 Gleb Natapov 提交于
perf_event_sched_in() shouldn't try to schedule task events if there are none otherwise task's ctx->is_active will be set and will not be cleared during sched_out. This will prevent newly added events from being scheduled into the task context. Fixes a boo-boo in commit 1d5f003f ("perf: Do not set task_ctx pointer in cpuctx if there are no events in the context"). Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111122140821.GF2557@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 06 12月, 2011 4 次提交
-
-
由 Gleb Natapov 提交于
jump_lable patching is very expensive operation that involves pausing all cpus. The patching of perf_sched_events jump_label is easily controllable from userspace by unprivileged user. When te user runs a loop like this: "while true; do perf stat -e cycles true; done" ... the performance of my test application that just increments a counter for one second drops by 4%. This is on a 16 cpu box with my test application using only one of them. An impact on a real server doing real work will be worse. Performance of KVM PMU drops nearly 50% due to jump_lable for "perf record" since KVM PMU implementation creates and destroys perf event frequently. This patch introduces a way to rate limit jump_label patching and uses it to fix the above problem. I believe that as jump_label use will spread the problem will become more common and thus solving it in a generic code is appropriate. Also fixing it in the perf code would result in moving jump_label accounting logic to perf code with all the ifdefs in case of JUMP_LABEL=n kernel. With this patch all details are nicely hidden inside jump_label code. Signed-off-by: NGleb Natapov <gleb@redhat.com> Acked-by: NJason Baron <jbaron@redhat.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111127155909.GO2557@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Deng-Cheng Zhu reported that sibling events that were created disabled with enable_on_exec would never get enabled. Iterate all events instead of the group lists. Reported-by: NDeng-Cheng Zhu <dczhu@mips.com> Tested-by: NDeng-Cheng Zhu <dczhu@mips.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1322048382.14799.41.camel@twinsSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-yv4o74vh90suyghccgykbnry@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Gleb writes: > Currently pmu is disabled and re-enabled on each timer interrupt even > when no rotation or frequency adjustment is needed. On Intel CPU this > results in two writes into PERF_GLOBAL_CTRL MSR per tick. On bare metal > it does not cause significant slowdown, but when running perf in a virtual > machine it leads to 20% slowdown on my machine. Cure this by keeping a perf_event_context::nr_freq counter that counts the number of active events that require frequency adjustments and use this in a similar fashion to the already existing nr_events != nr_active test in perf_rotate_context(). By being able to exclude both rotation and frequency adjustments a-priory for the common case we can avoid the otherwise superfluous PMU disable. Suggested-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-515yhoatehd3gza7we9fapaa@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 05 12月, 2011 1 次提交
-
-
由 Peter Zijlstra 提交于
When you do: $ perf record -e cycles,cycles,cycles noploop 10 You expect about 10,000 samples for each event, i.e., 10s at 1000samples/sec. However, this is not what's happening. You get much fewer samples, maybe 3700 samples/event: $ perf report -D | tail -15 Aggregated stats: TOTAL events: 10998 MMAP events: 66 COMM events: 2 SAMPLE events: 10930 cycles stats: TOTAL events: 3644 SAMPLE events: 3644 cycles stats: TOTAL events: 3642 SAMPLE events: 3642 cycles stats: TOTAL events: 3644 SAMPLE events: 3644 On a Intel Nehalem or even AMD64, there are 4 counters capable of measuring cycles, so there is plenty of space to measure those events without multiplexing (even with the NMI watchdog active). And even with multiplexing, we'd expect roughly the same number of samples per event. The root of the problem was that when the event that caused the buffer to become full was not the first event passed on the cmdline, the user notification would get lost. The notification was sent to the file descriptor of the overflowed event but the perf tool was not polling on it. The perf tool aggregates all samples into a single buffer, i.e., the buffer of the first event. Consequently, it assumes notifications for any event will come via that descriptor. The seemingly straight forward solution of moving the waitq into the ringbuffer object doesn't work because of life-time issues. One could perf_event_set_output() on a fd that you're also blocking on and cause the old rb object to be freed while its waitq would still be referenced by the blocked thread -> FAIL. Therefore link all events to the ringbuffer and broadcast the wakeup from the ringbuffer object to all possible events that could be waited upon. This is rather ugly, and we're open to better solutions but it works for now. Reported-by: NStephane Eranian <eranian@google.com> Finished-by: NStephane Eranian <eranian@google.com> Reviewed-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111126014731.GA7030@quadSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 14 11月, 2011 3 次提交
-
-
由 Andrew Vagin 提交于
This patch solves the following problem: Now some samples may be lost due to throttling. The number of samples is restricted by sysctl_perf_event_sample_rate/HZ. A trace event is divided on some samples according to event's period. I don't sure, that we should generate more than one sample on each trace event. I think the better way to use SAMPLE_PERIOD. E.g.: I want to trace when a process sleeps. I created a process, which sleeps for 1ms and for 4ms. perf got 100 events in both cases. swapper 0 [000] 1141.371830: sched_stat_sleep: comm=foo pid=1801 delay=1386750 [ns] swapper 0 [000] 1141.369444: sched_stat_sleep: comm=foo pid=1801 delay=4499585 [ns] In the first case a kernel want to send 4499585 events and in the second case it wants to send 1386750 events. perf-reports shows that process sleeps in both places equal time. It's bug. With this patch kernel generates one event on each "sleep" and the time slice is saved in the field "period". Perf knows how handle it. Signed-off-by: NAndrew Vagin <avagin@openvz.org> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1320670457-2633428-3-git-send-email-avagin@openvz.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Borislav Petkov 提交于
Split the callchain code from the perf events core into a new kernel/events/callchain.c file. This simplifies a bit the big core.c Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> [keep ctx recursion handling inline and use internal headers] Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1318778104-17152-1-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Gleb Natapov 提交于
Do not set task_ctx pointer during sched_in if there are no events associated with the context. Otherwise if during task execution total number of events in the system will become zero perf_event_context_sched_out() will not be called and cpuctx->task_ctx will be left with a stale value. Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111023171033.GI17571@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 04 11月, 2011 1 次提交
-
-
由 Robert Richter 提交于
The legacy x86 nmi watchdog code was removed with the implementation of the perf based nmi watchdog. This broke Oprofile's nmi timer mode. To run nmi timer mode we relied on a continuous ticking nmi source which the nmi watchdog provided. The nmi tick was no longer available and current watchdog can not be used anymore since it runs with very long periods in the range of seconds. This patch reimplements the nmi timer mode using a perf counter nmi source. V2: * removing pr_info() * fix undefined reference to `__udivdi3' for 32 bit build * fix section mismatch of .cpuinit.data:nmi_timer_cpu_nb * removed nmi timer setup in arch/x86 * implemented function stubs for op_nmi_init/exit() * made code more readable in oprofile_init() V3: * fix architectural initialization in oprofile_init() * fix CONFIG_OPROFILE_NMI_TIMER dependencies Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
- 03 11月, 2011 1 次提交
-
-
由 Linus Torvalds 提交于
This reverts commit 144060fe. It causes a resume regression for Andi on his Acer Aspire 1830T post 3.1. The screen just stays black after wakeup. Also, it really looks like the wrong way to suspend and resume perf events: I think they should be done as part of the CPU suspend and resume, rather than as a notifier that does smp_call_function(). Reported-by: NAndi Kleen <andi@firstfloor.org> Acked-by: NIngo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 11月, 2011 2 次提交
-
-
由 Christoph Lameter 提交于
Some kernel components pin user space memory (infiniband and perf) (by increasing the page count) and account that memory as "mlocked". The difference between mlocking and pinning is: A. mlocked pages are marked with PG_mlocked and are exempt from swapping. Page migration may move them around though. They are kept on a special LRU list. B. Pinned pages cannot be moved because something needs to directly access physical memory. They may not be on any LRU list. I recently saw an mlockalled process where mm->locked_vm became bigger than the virtual size of the process (!) because some memory was accounted for twice: Once when the page was mlocked and once when the Infiniband layer increased the refcount because it needt to pin the RDMA memory. This patch introduces a separate counter for pinned pages and accounts them seperately. Signed-off-by: NChristoph Lameter <cl@linux.com> Cc: Mike Marciniszyn <infinipath@qlogic.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Gortmaker 提交于
These files were getting <linux/module.h> via an implicit non-obvious path, but we want to crush those out of existence since they cost time during compiles of processing thousands of lines of headers for no reason. Give them the lightweight header that just contains the EXPORT_SYMBOL infrastructure. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
- 31 8月, 2011 2 次提交
-
-
由 Eric B Munson 提交于
We detected a serious issue with PERF_SAMPLE_READ and timing information when events were being multiplexing. Samples would have time_running > time_enabled. That was easy to reproduce with a libpfm4 example (ran 3 times to cause multiplexing on Core 2): $ syst_smpl -e uops_retired:freq=1 & $ syst_smpl -e uops_retired:freq=1 & $ syst_smpl -e uops_retired:freq=1 & IIP:0x0000000040062d ... PERIOD:2355332948 ENA=40144625315 RUN=60014875184 syst_smpl: WARNING: time_running > time_enabled 63277537998 uops_retired:freq=1 , scaled The bug was not present in kernel up to (and including) 3.0. It turns out the bug was introduced by the following commit: commit c4794295 events: Move lockless timer calculation into helper function The parameters of the function got reversed yet the call sites were not updated to reflect the change. That lead to time_running and time_enabled being swapped. That had no effect when there was no multiplexing because in that case time_running = time_enabled but it would show up in any other scenario. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110829124112.GA4828@quadSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mark Rutland 提交于
Currently, an event's 'pmu' field is set after pmu::event_init() is called. This means that pmu::event_init() must figure out which struct pmu the event was initialised from. This makes it difficult to consolidate common event initialisation code for similar PMUs, and very difficult to implement drivers for PMUs which can have multiple instances (e.g. a USB controller PMU, a GPU PMU, etc). This patch sets the 'pmu' field before initialising the event, allowing event init code to identify the struct pmu instance easily. In the event of failure to initialise an event, the event is destroyed via kfree() without calling perf_event::destroy(), so this shouldn't result in bad behaviour even if the destroy field was set before failure to initialise was noted. Signed-off-by: NMark Rutland <mark.rutland@arm.com> Reviewed-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1313062280-19123-1-git-send-email-mark.rutland@arm.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 29 8月, 2011 1 次提交
-
-
由 Stephane Eranian 提交于
The current cgroup context switch code was incorrect leading to bogus counts. Furthermore, as soon as there was an active cgroup event on a CPU, the context switch cost on that CPU would increase by a significant amount as demonstrated by a simple ping/pong example: $ ./pong Both processes pinned to CPU1, running for 10s 10684.51 ctxsw/s Now start a cgroup perf stat: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100 $ ./pong Both processes pinned to CPU1, running for 10s 6674.61 ctxsw/s That's a 37% penalty. Note that pong is not even in the monitored cgroup. The results shown by perf stat are bogus: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100 Performance counter stats for 'sleep 100': CPU1 <not counted> cycles test CPU1 16,984,189,138 cycles # 0.000 GHz The second 'cycles' event should report a count @ CPU clock (here 2.4GHz) as it is counting across all cgroups. The patch below fixes the bogus accounting and bypasses any cgroup switches in case the outgoing and incoming tasks are in the same cgroup. With this patch the same test now yields: $ ./pong Both processes pinned to CPU1, running for 10s 10775.30 ctxsw/s Start perf stat with cgroup: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10 Run pong outside the cgroup: $ /pong Both processes pinned to CPU1, running for 10s 10687.80 ctxsw/s The penalty is now less than 2%. And the results for perf stat are correct: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10 Performance counter stats for 'sleep 10': CPU1 <not counted> cycles test # 0.000 GHz CPU1 23,933,981,448 cycles # 0.000 GHz Now perf stat reports the correct counts for for the non cgroup event. If we run pong inside the cgroup, then we also get the correct counts: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10 Performance counter stats for 'sleep 10': CPU1 22,297,726,205 cycles test # 0.000 GHz CPU1 23,933,981,448 cycles # 0.000 GHz 10.001457237 seconds time elapsed Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110825135803.GA4697@quadSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 14 8月, 2011 2 次提交
-
-
由 Mark Rutland 提交于
Currently, an event's 'pmu' field is set after pmu::event_init() is called. This means that pmu::event_init() must figure out which struct pmu the event was initialised from. This makes it difficult to consolidate common event initialisation code for similar PMUs, and very difficult to implement drivers for PMUs which can have multiple instances (e.g. a USB controller PMU, a GPU PMU, etc). This patch sets the 'pmu' field before initialising the event, allowing event init code to identify the struct pmu instance easily. In the event of failure to initialise an event, the event is destroyed via kfree() without calling perf_event::destroy(), so this shouldn't result in bad behaviour even if the destroy field was set before failure to initialise was noted. Signed-off-by: NMark Rutland <mark.rutland@arm.com> Reviewed-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1313062280-19123-1-git-send-email-mark.rutland@arm.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Francis reports that s2r gets him spurious NMIs, this is because the suspend code leaves the boot cpu up and running. Cure this by adding a suspend notifier. The problem is that hotplug and suspend are completely un-serialized and the PM notifiers run before the suspend cpu unplug of all but the boot cpu. This leaves a window where the user can initialize another hotplug operation (either remove or add a cpu) resulting in either one too many or one too few hotplug ops. Thus we cannot use the hotplug code for the suspend case. There's another reason to not use the hotplug code, which is that the hotplug code totally destroys the perf state, we can do better for suspend and simply remove all counters from the PMU so that we can re-instate them on resume. Reported-by: NFrancis Moreau <francis.moro@gmail.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-1cvevybkgmv4s6v5y37t4847@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 22 7月, 2011 1 次提交
-
-
由 Lin Ming 提交于
PMU type id can be allocated dynamically, so perf_event_attr::type check when copying attribute from userspace to kernel is not valid. Signed-off-by: NLin Ming <ming.m.lin@intel.com> Cc: Robert Richter <robert.richter@amd.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1309421396-17438-4-git-send-email-ming.m.lin@intel.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 01 7月, 2011 8 次提交
-
-
由 Avi Kivity 提交于
KVM needs one-shot samples, since a PMC programmed to -X will fire after X events and then again after 2^40 events (i.e. variable period). Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1309362157-6596-4-git-send-email-avi@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Avi Kivity 提交于
The perf_event overflow handler does not receive any caller-derived argument, so many callers need to resort to looking up the perf_event in their local data structure. This is ugly and doesn't scale if a single callback services many perf_events. Fix by adding a context parameter to perf_event_create_kernel_counter() (and derived hardware breakpoints APIs) and storing it in the perf_event. The field can be accessed from the callback as event->overflow_handler_context. All callers are updated. Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Since only samples call perf_output_sample() its much saner (and more correct) to put the sample logic in there than in the perf_output_begin()/perf_output_end() pair. Saves a useless argument, reduces conditionals and shrinks struct perf_output_handle, win! Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-2crpvsx3cqu67q3zqjbnlpsc@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
The nmi parameter indicated if we could do wakeups from the current context, if not, we would set some state and self-IPI and let the resulting interrupt do the wakeup. For the various event classes: - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from the PMI-tail (ARM etc.) - tracepoint: nmi=0; since tracepoint could be from NMI context. - software: nmi=[0,1]; some, like the schedule thing cannot perform wakeups, and hence need 0. As one can see, there is very little nmi=1 usage, and the down-side of not using it is that on some platforms some software events can have a jiffy delay in wakeup (when arch_irq_work_raise isn't implemented). The up-side however is that we can remove the nmi parameter and save a bunch of conditionals in fast paths. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michael Cree <mcree@orcon.net.nz> Cc: Will Deacon <will.deacon@arm.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: Eric B Munson <emunson@mgebm.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Eric B Munson 提交于
The event tracing infrastructure exposes two timers which should be updated each time the value of the counter is updated. Currently, these counters are only updated when userspace calls read() on the fd associated with an event. This means that counters which are read via the mmap'd page exclusively never have their timers updated. This patch adds ensures that the timers are updated each time the values in the mmap'd page are updated. Signed-off-by: NEric B Munson <emunson@mgebm.net> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1308932786-5111-1-git-send-email-emunson@mgebm.netSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Eric B Munson 提交于
Take the timer calculation from perf_output_read and move it to a helper function for any place that needs timer values but cannot take the ctx->lock. Signed-off-by: NEric B Munson <emunson@mgebm.net> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1308861279-15216-2-git-send-email-emunson@mgebm.netSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Eric B Munson 提交于
Signed-off-by: NEric B Munson <emunson@mgebm.net> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1308861279-15216-1-git-send-email-emunson@mgebm.netSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Vince Weaver 提交于
Since 2.6.36 (specifically commit d57e34fd ("perf: Simplify the ring-buffer logic: make perf_buffer_alloc() do everything needed"), the perf_buffer_init_code() has been mis-setting the buffer watermark if perf_event_attr.wakeup_events has a non-zero value. This is because perf_event_attr.wakeup_events is a union with perf_event_attr.wakeup_watermark. This commit re-enables the check for perf_event_attr.watermark being set before continuing with setting a non-default watermark. This bug is most noticable when you are trying to use PERF_IOC_REFRESH with a value larger than one and perf_event_attr.wakeup_events is set to one. In this case the buffer watermark will be set to 1 and you will get extraneous POLL_IN overflows rather than POLL_HUP as expected. [ avoid using attr.wakeup_events when attr.watermark is set ] Signed-off-by: NVince Weaver <vweaver1@eecs.utk.edu> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1106011506390.5384@cl320.eecs.utk.eduSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 09 6月, 2011 1 次提交
-
-
由 Frederic Weisbecker 提交于
And create the internal perf events header. v2: Keep an internal inlined perf_output_copy() Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Borislav Petkov <bp@alien8.de> Cc: Stephane Eranian <eranian@google.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1305827704-5607-1-git-send-email-fweisbec@gmail.com [ v3: use clearer 'ring_buffer' and 'rb' naming ] Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 07 6月, 2011 1 次提交
-
-
由 Peter Zijlstra 提交于
A lost Quilt refresh of 2c29ef0f (perf: Simplify and fix __perf_install_in_context()) is causing grief and lockups, reported by Jiri Olsa. When installing an event in a task context, there's a number of issues: - there might not be an existing task context, in which case we should install the now current context; - there might already be a context, not the current one, in which case we should de-schedule the old and install the new; these cases were dealt with in the lost refresh, however there is one further case that was found in testing: - there might already be a context, the current one, in which case we should still de-schedule, and should take care to re-install it (note that task_ctx_sched_out() clears cpuctx->task_ctx). Reported-by: NJiri Olsa <jolsa@redhat.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1307399008.2497.971.camel@laptopSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 31 5月, 2011 1 次提交
-
-
由 Peter Zijlstra 提交于
Ben changed the cgroup API in commit f780bdb7 (cgroups: add per-thread subsystem callbacks) in an incompatible way, but forgot to convert the perf cgroup bits. Avoid compile warnings and runtime splats and convert perf too ;-) Acked-by: NBen Blum <bblum@andrew.cmu.edu> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1306767651.1200.2990.camel@twinsSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 29 5月, 2011 3 次提交
-
-
由 Peter Zijlstra 提交于
Since perf_install_in_context() will now install a context when we add the first event, we can de-schedule the context when the last event is removed. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110409192142.090431763@chello.nlSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
In order to always call list_del_event() on the correct cpu if the event is part of an active context and avoid having to do two IPIs, change the close() semantics slightly. The current perf_event_disable() call would disable a whole group if the event that's being closed is the group leader, whereas the new code keeps the group siblings enabled. People should not rely on this behaviour and I don't think they do, but in case we find they do, the fix is easy and we have to take the double IPI cost. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Vince Weaver <vweaver1@eecs.utk.edu> Link: http://lkml.kernel.org/r/20110409192142.038377551@chello.nlSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
This was scattered out - refactor it into a single function. No change in functionality. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110409192141.979862055@chello.nlSigned-off-by: NIngo Molnar <mingo@elte.hu>
-