- 10 9月, 2010 19 次提交
-
-
由 Peter Zijlstra 提交于
Since software events are always schedulable, mixing them up with hardware events (who are not) can lead to funny scheduling oddities. Giving them their own context solves this. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Provide the infrastructure for multiple task contexts. A more flexible approach would have resulted in more pointer chases in the scheduling hot-paths. This approach has the limitation of a static number of task contexts. Since I expect most external PMUs to be system wide, or at least node wide (as per the intel uncore unit) they won't actually need a task context. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Unify the two perf_event_context allocation sites. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Move all inherit code near each other. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Allocate per-cpu contexts per pmu. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Give each cpu-context its own timer so that it is a self contained entity, this eases the way for per-pmu-per-cpu contexts as well as provides the basic infrastructure to allow different rotation times per pmu. Things to look at: - folding the tick and these TICK_NSEC timers - separate task context rotation Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Separate the swevent hash-table from the cpu_context bits in preparation for per pmu cpu contexts. This keeps the swevent hash a global entity. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Separate find_get_context() from the event allocation and initialization so that we may make find_get_context() depend on the event pmu in a later patch. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Neither the overcommit nor the reservation sysfs parameter were actually working, remove them as they'll only get in the way. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Replace pmu::{enable,disable,start,stop,unthrottle} with pmu::{add,del,start,stop}, all of which take a flags argument. The new interface extends the capability to stop a counter while keeping it scheduled on the PMU. We replace the throttled state with the generic stopped state. This also allows us to efficiently stop/start counters over certain code paths (like IRQ handlers). It also allows scheduling a counter without it starting, allowing for a generic frozen state (useful for rotating stopped counters). The stopped state is implemented in two different ways, depending on how the architecture implemented the throttled state: 1) We disable the counter: a) the pmu has per-counter enable bits, we flip that b) we program a NOP event, preserving the counter state 2) We store the counter state and ignore all read/overflow events Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Use hw_perf_event::period_left instead of hw_perf_event::remaining and win back 8 bytes. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Provide default implementations for the pmu txn methods, this allows us to remove some conditional code. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Changes perf_disable() into perf_pmu_disable(). Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Since the current perf_disable() usage is only an optimization, remove it for now. This eases the removal of the __weak hw_perf_enable() interface. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Fixup random annoying style bits. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Simple registration interface for struct pmu, this provides the infrastructure for removing all the weak functions. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
sed -ie 's/const struct pmu\>/struct pmu/g' `git grep -l "const struct pmu\>"` Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Since we have UP_PREPARE, we should also have UP_CANCELED. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Li Zefan 提交于
Commit 1c024eca (perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events) caused a module refcount leak. Reported-And-Tested-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4C7E1F12.8030304@cn.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 08 9月, 2010 3 次提交
-
-
由 Masami Hiramatsu 提交于
Check the argument name whether it is invalid (not C-like symbol name). This makes event format simple. Reported-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113912.22882.62313.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Masami Hiramatsu 提交于
Set "argN" name for each argument automatically if it has no specified name. Since dynamic trace event(kprobe_events) accepts special characters for its argument, its format can show those special characters (e.g. '$', '%', '+'). However, perf can't parse those format because of the character (especially '%') mess up the format. This sets "argX" name for those arguments if user omitted the argument names. E.g. # echo 'p do_fork %ax IP=%ip $stack' > tracing/kprobe_events # cat tracing/kprobe_events p:kprobes/p_do_fork_0 do_fork arg1=%ax IP=%ip arg3=$stack Reported-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113906.22882.59312.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Masami Hiramatsu 提交于
Fix a memory leak which happens when a field name conflicts with others. In error case, free_trace_probe() will free all arguments until nr_args, so this increments nr_args the begining of the loop instead of the end. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113846.22882.12670.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 09 9月, 2010 1 次提交
-
-
由 Steven Rostedt 提交于
Reading the file set_ftrace_filter does three things. 1) shows whether or not filters are set for the function tracer 2) shows what functions are set for the function tracer 3) shows what triggers are set on any functions 3 is independent from 1 and 2. The way this file currently works is that it is a state machine, and as you read it, it may change state. But this assumption breaks when you use lseek() on the file. The state machine gets out of sync and the t_show() may use the wrong pointer and cause a kernel oops. Luckily, this will only kill the app that does the lseek, but the app dies while holding a mutex. This prevents anyone else from using the set_ftrace_filter file (or any other function tracing file for that matter). A real fix for this is to rewrite the code, but that is too much for a -rc release or stable. This patch simply disables llseek on the set_ftrace_filter() file for now, and we can do the proper fix for the next major release. Reported-by: NRobert Swiecki <swiecki@google.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Tavis Ormandy <taviso@google.com> Cc: Eugene Teo <eugene@redhat.com> Cc: vendor-sec@lst.de Cc: <stable@kernel.org> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 02 9月, 2010 1 次提交
-
-
由 Steven Rostedt 提交于
While discussing the strictness of the 80 character limit on the Kernel Summit Discussion mailing list, I showed examples that I broke that limit slightly with some algorithms. In discussing with John Linville, what looked better, I realized that two of the 80 char breaking culprits were an identical expression. As a clean up, this patch moves the identical expression into its own helper function and that is used instead. As a side effect, the offending code is now under the 80 character limit. :-) This clean up code also changes the expression from (A - B) - C to A - (B + C) This makes the code look a little nicer too. Cc: John W. Linville <linville@tuxdriver.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 01 9月, 2010 4 次提交
-
-
由 Don Zickus 提交于
During my rewrite, the semantics of touch_nmi_watchdog and touch_softlockup_watchdog changed enough to break some drivers (mostly over preemptable regions). These are cases where long delays on one CPU (due to print_delay for example) can cause long delays on other CPUs - so we must 'touch' the nmi_watchdog flag of those other CPUs as well. This change brings those touch_*_watchdog() functions back in line with to how they used to work. Signed-off-by: NDon Zickus <dzickus@redhat.com> Acked-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: peterz@infradead.org Cc: fweisbec@gmail.com LKML-Reference: <1283310009-22168-2-git-send-email-dzickus@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Akinobu Mita 提交于
The panic notifer in lockup_detector just set did_panic to 1. But did_panic is not used anywhere so we can just remove it. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: peterz@infradead.org Cc: gorcunov@gmail.com Cc: fweisbec@gmail.com LKML-Reference: <1283310009-22168-4-git-send-email-dzickus@redhat.com> Signed-off-by: NDon Zickus <dzickus@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Akinobu Mita 提交于
By the commit e6bde73b ("cpu-hotplug: return better errno on cpu hotplug failure"), the cpu notifier can return encapsulate errno value, resulting in more meaningful error codes for CPU hotplug failures. This converts the cpu notifier to return encapsulate errno value for the lockup_detector as well. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: peterz@infradead.org Cc: gorcunov@gmail.com Cc: fweisbec@gmail.com LKML-Reference: <1283310009-22168-3-git-send-email-dzickus@redhat.com> Signed-off-by: NDon Zickus <dzickus@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Li Zefan 提交于
While we are reading trace_stat/functionX and someone just disabled function_profile at that time, we can trigger this: divide error: 0000 [#1] PREEMPT SMP ... EIP is at function_stat_show+0x90/0x230 ... This fix just takes the ftrace_profile_lock and checks if rec->counter is 0. If it's 0, we know the profile buffer has been reset. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: stable@kernel.org LKML-Reference: <4C723644.4040708@cn.fujitsu.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 30 8月, 2010 1 次提交
-
-
由 Stephane Eranian 提交于
Per-thread events with a cpu filter, i.e., cpu != -1, were not reporting correct timings when the thread never ran on the monitored cpu. The time enabled was reported as a negative value. This patch fixes the problem by updating tstamp_stopped, tstamp_running in event_sched_out() for events with filters and which are marked as INACTIVE. The function group_sched_out() is modified to systematically call into event_sched_out() to avoid duplicating the timing adjustment code twice. With the patch, I now get: $ task_cpu -i -e unhalted_core_cycles,unhalted_core_cycles noploop 2 noploop for 2 seconds CPU0 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU0 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU1 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU1 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU2 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU2 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU3 4,747,990,931 unhalted_core_cycles (ena=1,991,136,594, run=1,991,136,594) CPU3 4,747,990,931 unhalted_core_cycles (ena=1,991,136,594, run=1,991,136,594) Signed-off-by: NStephane Eranian <eranian@gmail.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus@samba.org Cc: davem@davemloft.net Cc: fweisbec@gmail.com Cc: perfmon2-devel@lists.sf.net Cc: eranian@google.com LKML-Reference: <4c76802d.aae9d80a.115d.70fe@mx.google.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 25 8月, 2010 1 次提交
-
-
由 Anton Blanchard 提交于
save_stack_trace() stores the instruction pointer, not the function descriptor. On ppc64 the trace stack code currently dereferences the instruction pointer and shows 8 bytes of instructions in our backtraces: # cat /sys/kernel/debug/tracing/stack_trace Depth Size Location (26 entries) ----- ---- -------- 0) 5424 112 0x6000000048000004 1) 5312 160 0x60000000ebad01b0 2) 5152 160 0x2c23000041c20030 3) 4992 240 0x600000007c781b79 4) 4752 160 0xe84100284800000c 5) 4592 192 0x600000002fa30000 6) 4400 256 0x7f1800347b7407e0 7) 4144 208 0xe89f0108f87f0070 8) 3936 272 0xe84100282fa30000 Since we aren't dealing with function descriptors, use %pS instead of %pF to fix it: # cat /sys/kernel/debug/tracing/stack_trace Depth Size Location (26 entries) ----- ---- -------- 0) 5424 112 ftrace_call+0x4/0x8 1) 5312 160 .current_io_context+0x28/0x74 2) 5152 160 .get_io_context+0x48/0xa0 3) 4992 240 .cfq_set_request+0x94/0x4c4 4) 4752 160 .elv_set_request+0x60/0x84 5) 4592 192 .get_request+0x2d4/0x468 6) 4400 256 .get_request_wait+0x7c/0x258 7) 4144 208 .__make_request+0x49c/0x610 8) 3936 272 .generic_make_request+0x390/0x434 Signed-off-by: NAnton Blanchard <anton@samba.org> Cc: rostedt@goodmis.org Cc: fweisbec@gmail.com LKML-Reference: <20100825013238.GE28360@kryten> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 23 8月, 2010 2 次提交
-
-
由 Tim Chen 提交于
There is a scalability issue for current implementation of optimistic mutex spin in the kernel. It is found on a 8 node 64 core Nehalem-EX system (HT mode). The intention of the optimistic mutex spin is to busy wait and spin on a mutex if the owner of the mutex is running, in the hope that the mutex will be released soon and be acquired, without the thread trying to acquire mutex going to sleep. However, when we have a large number of threads, contending for the mutex, we could have the mutex grabbed by other thread, and then another ……, and we will keep spinning, wasting cpu cycles and adding to the contention. One possible fix is to quit spinning and put the current thread on wait-list if mutex lock switch to a new owner while we spin, indicating heavy contention (see the patch included). I did some testing on a 8 socket Nehalem-EX system with a total of 64 cores. Using Ingo's test-mutex program that creates/delete files with 256 threads (http://lkml.org/lkml/2006/1/8/50) , I see the following speed up after putting in the mutex spin fix: ./mutex-test V 256 10 Ops/sec 2.6.34 62864 With fix 197200 Repeating the test with Aim7 fserver workload, again there is a speed up with the fix: Jobs/min 2.6.34 91657 With fix 149325 To look at the impact on the distribution of mutex acquisition time, I collected the mutex acquisition time on Aim7 fserver workload with some instrumentation. The average acquisition time is reduced by 48% and number of contentions reduced by 32%. #contentions Time to acquire mutex (cycles) 2.6.34 72973 44765791 With fix 49210 23067129 The histogram of mutex acquisition time is listed below. The acquisition time is in 2^bin cycles. We see that without the fix, the acquisition time is mostly around 2^26 cycles. With the fix, we the distribution get spread out a lot more towards the lower cycles, starting from 2^13. However, there is an increase of the tail distribution with the fix at 2^28 and 2^29 cycles. It seems a small price to pay for the reduced average acquisition time and also getting the cpu to do useful work. Mutex acquisition time distribution (acq time = 2^bin cycles): 2.6.34 With Fix bin #occurrence % #occurrence % 11 2 0.00% 120 0.24% 12 10 0.01% 790 1.61% 13 14 0.02% 2058 4.18% 14 86 0.12% 3378 6.86% 15 393 0.54% 4831 9.82% 16 710 0.97% 4893 9.94% 17 815 1.12% 4667 9.48% 18 790 1.08% 5147 10.46% 19 580 0.80% 6250 12.70% 20 429 0.59% 6870 13.96% 21 311 0.43% 1809 3.68% 22 255 0.35% 2305 4.68% 23 317 0.44% 916 1.86% 24 610 0.84% 233 0.47% 25 3128 4.29% 95 0.19% 26 63902 87.69% 122 0.25% 27 619 0.85% 286 0.58% 28 0 0.00% 3536 7.19% 29 0 0.00% 903 1.83% 30 0 0.00% 0 0.00% I've done similar experiments with 2.6.35 kernel on smaller boxes as well. One is on a dual-socket Westmere box (12 cores total, with HT). Another experiment is on an old dual-socket Core 2 box (4 cores total, no HT) On the 12-core Westmere box, I see a 250% increase for Ingo's mutex-test program with my mutex patch but no significant difference in aim7's fserver workload. On the 4-core Core 2 box, I see the difference with the patch for both mutex-test and aim7 fserver are negligible. So far, it seems like the patch has not caused regression on smaller systems. Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: <stable@kernel.org> # .35.x LKML-Reference: <1282168827.9542.72.camel@schen9-DESK> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Stephane reported that when the machine locks up, the regular ticks, which are responsible to resetting the throttle count, stop too. Hence the NMI watchdog can end up being throttled before it reports on the locked up state, and we end up being sad.. Cure this by having the watchdog overflow reset its own throttle count. Reported-by: NStephane Eranian <eranian@google.com> Tested-by: NStephane Eranian <eranian@google.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1282215916.1926.4696.camel@laptop> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 22 8月, 2010 1 次提交
-
-
由 Arjan van de Ven 提交于
With the introduction of the new unified work queue thread pools, we lost one feature: It's no longer possible to know which worker is causing the CPU to wake out of idle. The result is that PowerTOP now reports a lot of "kworker/a:b" instead of more readable results. This patch adds a pair of tracepoints to the new workqueue code, similar in style to the timer/hrtimer tracepoints. With this pair of tracepoints, the next PowerTOP can correctly report which work item caused the wakeup (and how long it took): Interrupt (43) i915 time 3.51ms wakeups 141 Work ieee80211_iface_work time 0.81ms wakeups 29 Work do_dbs_timer time 0.55ms wakeups 24 Process Xorg time 21.36ms wakeups 4 Timer sched_rt_period_timer time 0.01ms wakeups 1 Signed-off-by: NArjan van de Ven <arjan@linux.intel.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 8月, 2010 2 次提交
-
-
由 Linus Torvalds 提交于
It's a really simple list, and several of the users want to go backwards in it to find the previous vma. So rather than have to look up the previous entry with 'find_vma_prev()' or something similar, just make it doubly linked instead. Tested-by: NIan Campbell <ijc@hellion.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrea Righi 提交于
kfifo_skip() is currently broken, due to the missing of the internal helper function. Add it. Signed-off-by: NAndrea Righi <arighi@develer.com> Cc: Greg KH <greg@kroah.com> Acked-by: NStefani Seibold <stefani@seibold.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 8月, 2010 5 次提交
-
-
由 Namhyung Kim 提交于
ftrace_event_call->perf_events, perf_trace_buf, fgraph_data->cpu_data and some local variables are percpu pointers missing __percpu markups. Add them. Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Acked-by: NTejun Heo <tj@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <1281498479-28551-1-git-send-email-namhyung@gmail.com> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
由 Frederic Weisbecker 提交于
Instead of hardcoding the number of contexts for the recursions barriers, define a cpp constant to make the code more self-explanatory. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com>
-
由 Frederic Weisbecker 提交于
Now that software events don't have interrupt disabled anymore in the event path, callchains can nest on any context. So seperating nmi and others contexts in two buffers has become racy. Fix this by providing one buffer per nesting level. Given the size of the callchain entries (2040 bytes * 4), we now need to allocate them dynamically. v2: Fixed put_callchain_entry call after recursion. Fix the type of the recursion, it must be an array. v3: Use a manual pr cpu allocation (temporary solution until NMIs can safely access vmalloc'ed memory). Do a better separation between callchain reference tracking and allocation. Make the "put" path lockless for non-release cases. v4: Protect the callchain buffers with rcu. v5: Do the cpu buffers allocations node affine. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Tested-by: NWill Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David Miller <davem@davemloft.net> Cc: Borislav Petkov <bp@amd64.org>
-
由 Frederic Weisbecker 提交于
Store the kernel and user contexts from the generic layer instead of archs, this gathers some repetitive code. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Acked-by: NPaul Mackerras <paulus@samba.org> Tested-by: NWill Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Miller <davem@davemloft.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Borislav Petkov <bp@amd64.org>
-
由 Frederic Weisbecker 提交于
- Most archs use one callchain buffer per cpu, except x86 that needs to deal with NMIs. Provide a default perf_callchain_buffer() implementation that x86 overrides. - Centralize all the kernel/user regs handling and invoke new arch handlers from there: perf_callchain_user() / perf_callchain_kernel() That avoid all the user_mode(), current->mm checks and so... - Invert some parameters in perf_callchain_*() helpers: entry to the left, regs to the right, following the traditional (dst, src). Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Acked-by: NPaul Mackerras <paulus@samba.org> Tested-by: NWill Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Miller <davem@davemloft.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Borislav Petkov <bp@amd64.org>
-