- 15 9月, 2009 16 次提交
-
-
由 Peter Zijlstra 提交于
I suspect a feed-back loop between cpuidle and the aperf/mperf cpu_power bits, where when we have idle C-states lower the ratio, which leads to lower cpu_power and then less load, which generates more idle time, etc.. Put in a knob to disable it. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Provide an ach specific hook for cpufreq based scaling of cpu_power. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> [ego@in.ibm.com: spotting bugs] LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mike Galbraith 提交于
Make the idle balancer more agressive, to improve a x264 encoding workload provided by Jason Garrett-Glaser: NEXT_BUDDY NO_LB_BIAS encoded 600 frames, 252.82 fps, 22096.60 kb/s encoded 600 frames, 250.69 fps, 22096.60 kb/s encoded 600 frames, 245.76 fps, 22096.60 kb/s NO_NEXT_BUDDY LB_BIAS encoded 600 frames, 344.44 fps, 22096.60 kb/s encoded 600 frames, 346.66 fps, 22096.60 kb/s encoded 600 frames, 352.59 fps, 22096.60 kb/s NO_NEXT_BUDDY NO_LB_BIAS encoded 600 frames, 425.75 fps, 22096.60 kb/s encoded 600 frames, 425.45 fps, 22096.60 kb/s encoded 600 frames, 422.49 fps, 22096.60 kb/s Peter pointed out that this is better done via newidle_idx, not via LB_BIAS, newidle balancing should look for where there is load _now_, not where there was load 2 ticks ago. Worst-case latencies are improved as well as no buddies means less vruntime spread. (as per prior lkml discussions) This change improves kbuild-peak parallelism as well. Reported-by: NJason Garrett-Glaser <darkshikari@gmail.com> Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1253011667.9128.16.camel@marge.simson.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
When merging select_task_rq_fair() and sched_balance_self() we lost the use of wake_idx, restore that and set them to 0 to make wake balancing more aggressive. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
While merging select_task_rq_fair() and sched_balance_self() I made a mistake that leads to testing the wrong task affinty. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
for_each_domain() uses RCU to serialize the sched_domains, except it doesn't actually use rcu_read_lock() and instead relies on disabling preemption -> FAIL. XXX: audit other sched_domain code. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
One of the problems of power-saving balancing is that under certain scenarios it is too slow and allows tons of real work to pile up. Avoid this by ignoring the powersave stuff when there's real work to be done. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
The problem with wake_idle() is that is doesn't respect things like cpu_power, which means it doesn't deal well with SMT nor the recent RT interaction. To cure this, it needs to do what sched_balance_self() does, which leads to the possibility of merging select_task_rq_fair() and sched_balance_self(). Modify sched_balance_self() to: - update_shares() when walking up the domain tree, (it only called it for the top domain, but it should have done this anyway), which allows us to remove this ugly bit from try_to_wake_up(). - do wake_affine() on the smallest domain that contains both this (the waking) and the prev (the wakee) cpu for WAKE invocations. Then use the top-down balance steps it had to replace wake_idle(). This leads to the dissapearance of SD_WAKE_BALANCE and SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE. SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective. Touch all topology bits to replace the old with new SD flags -- platforms might need re-tuning, enabling SD_BALANCE_WAKE conditionally on a NUMA distance seems like a good additional feature, magny-core and small nehalem systems would want this enabled, systems with slow interconnects would not. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
We're going to want to drop rq->lock in try_to_wake_up() for a longer period of time, however we also want to deal with concurrent waking of the same task, which is currently handled by holding rq->lock. So introduce a new TASK state, namely TASK_WAKING, which indicates someone is already waking the task (other wakers will fail p->state & state). We also keep preemption disabled over the whole ttwu(). Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Rather ugly patch to fully place the sched_balance_self() code inside the fair class. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Move the sched_balance_self() code into sched_fair.c This facilitates the merger of sched_balance_self() and sched_fair::select_task_rq(). Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
In preparation to other code movement, move weighted_cpuload(), source_load() and target_load() before the class includes. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Add text... Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mike Galbraith 提交于
Add a NEXT_BUDDY feature flag to aid in debugging. Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
It consists of two conditions, split them out in separate toggles so we can test them independently. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 11 9月, 2009 3 次提交
-
-
由 Jens Axboe 提交于
This enables us to track who does what and print info. Its main use is catching dirty inodes on the default_backing_dev_info, so we can fix that up. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Ingo Molnar 提交于
This weird perf trace output: cc1-9943 [001] 2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns] Is caused by setting one component field of the delta to zero a bit too early. Move it to later. ( Note, this does not affect the NEW_FAIR_SLEEPERS interactivity bug, it's just a reporting bug in essence. ) Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nikos Chantziaras <realnc@arcor.de> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <4AA93D34.8040500@arcor.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
Nikos Chantziaras and Jens Axboe reported that turning off NEW_FAIR_SLEEPERS improves desktop interactivity visibly. Nikos described his experiences the following way: " With this setting, I can do "nice -n 19 make -j20" and still have a very smooth desktop and watch a movie at the same time. Various other annoyances (like the "logout/shutdown/restart" dialog of KDE not appearing at all until the background fade-out effect has finished) are also gone. So this seems to be the single most important setting that vastly improves desktop behavior, at least here. " Jens described it the following way, referring to a 10-seconds xmodmap scheduling delay he was trying to debug: " Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then I get: Performance counter stats for 'xmodmap .xmodmap-carl': 9.009137 task-clock-msecs # 0.447 CPUs 18 context-switches # 0.002 M/sec 1 CPU-migrations # 0.000 M/sec 315 page-faults # 0.035 M/sec 0.020167093 seconds time elapsed Woot! " So disable it for now. In perf trace output i can see weird delta timestamps: cc1-9943 [001] 2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns] That nsec field is not supposed to be that large. More digging is needed - but lets turn it off while the real bug is found. Reported-by: NNikos Chantziaras <realnc@arcor.de> Tested-by: NNikos Chantziaras <realnc@arcor.de> Reported-by: NJens Axboe <jens.axboe@oracle.com> Tested-by: NJens Axboe <jens.axboe@oracle.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <4AA93D34.8040500@arcor.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 09 9月, 2009 3 次提交
-
-
由 Mike Galbraith 提交于
Removes kthread/workqueue priority boost, they increase worst-case desktop latencies. Signed-off-by: NMike Galbraith <efault@gmx.de> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1252486344.28645.18.camel@marge.simson.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mike Galbraith 提交于
Reduce the latency target from 20 msecs to 5 msecs. Why? Larger latencies increase spread, which is good for scaling, but bad for worst case latency. We still have the ilog(nr_cpus) rule to scale up on bigger server boxes. Signed-off-by: NMike Galbraith <efault@gmx.de> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1252486344.28645.18.camel@marge.simson.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mike Galbraith 提交于
Set child_runs_first default to off. It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs get preempted by child tasks, reducing parallelism. Note, this patch might make existing races in user applications more prominent than before - so breakages might be bisected to this commit. Child-runs-first is broken on SMP to begin with, and we already had it off briefly in v2.6.23 so most of the offenders ought to be fixed. Would be nice not to revert this commit but fix those apps finally ... Signed-off-by: NMike Galbraith <efault@gmx.de> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1252486344.28645.18.camel@marge.simson.net> [ made the sysctl independent of CONFIG_SCHED_DEBUG, in case people want to work around broken apps. ] Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 08 9月, 2009 3 次提交
-
-
由 Mike Galbraith 提交于
A fork/exec load is usually "pass the baton", so the child should never be placed behind the parent. With START_DEBIT we make room for the new task, but with child_runs_first, that room comes out of the _parent's_ hide. There's nothing to say that the parent wasn't ahead of min_vruntime at fork() time, which means that the "baton carrier", who is essentially the parent in drag, can gain time and increase scheduling latencies for waiters. With NEW_FAIR_SLEEPERS + START_DEBIT + child_runs_first enabled, we essentially pass the sleeper fairness off to the child, which is fine, but if we don't base placement on the parent's updated vruntime, we can end up compounding latency woes if the child itself then does fork/exec. The debit incurred at fork doesn't hurt the parent who is then going to sleep and maybe exit, but the child who acquires the error harms all comers. This improves latencies of make -j<n> kernel build workloads. Reported-by: NJens Axboe <jens.axboe@oracle.com> Signed-off-by: NMike Galbraith <efault@gmx.de> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
wake_affine() would always fail under low-load situations where both prev and this were idle, because adding a single task will always be a significant imbalance, even if there's nothing around that could balance it. Deal with this by allowing imbalance when there's nothing you can do about it. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
select_task_rq_fair() incorrectly skips the wake_affine() logic, remove this. When prev_cpu == this_cpu, the code jumps straight to the wake_idle() logic, this doesn't give the wake_affine() logic the chance to pin the task to this cpu. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 07 9月, 2009 1 次提交
-
-
由 Jaswinder Singh Rajput 提交于
fix the following 'make includecheck' warning: kernel/sysctl.c: linux/security.h is included more than once. Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 05 9月, 2009 10 次提交
-
-
由 Steven Rostedt 提交于
Since the ability to swap the cpu buffers adds a small overhead to the recording of a trace, we only want to add it when needed. Only the irqsoff and preemptoff tracers use this feature, and both are not recommended for production kernels. This patch disables its use when neither irqsoff nor preemptoff is configured. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
Because the irqsoff tracer can swap an internal CPU buffer, it is possible that a swap happens between the start of the write and before the committing bit is set (the committing bit will disable swapping). This patch adds a check for this and will fail the write if it detects it. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
The irqsoff tracer will fail to swap the cpu buffer with the max buffer if it preempts a commit. Instead of ignoring this, this patch makes the tracer report it if the last max latency failed due to preempting a current commit. The output of the latency tracer will look like this: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 2.6.31-rc5 # -------------------------------------------------------------------- # latency: 112 us, #1/1, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # | task: -4281 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: save_args # => ended at: __do_softirq # # # _------=> CPU# # / _-----=> irqs-off # | / _----=> need-resched # || / _---=> hardirq/softirq # ||| / _--=> preempt-depth # |||| / # ||||| delay # cmd pid ||||| time | caller # \ / ||||| \ | / bash-4281 1d.s6 265us : update_max_tr_single: Failed to swap buffers due to commit in progress Note the latency time and the functions that disabled the irqs or preemption will still be listed. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
This patch adds a trace_array_printk to allow a tracer to use the trace_printk on its own trace array. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
The latency tracers (irqsoff and wakeup) can swap trace buffers on the fly. If an event is happening and has reserved data on one of the buffers, and the latency tracer swaps the global buffer with the max buffer, the result is that the event may commit the data to the wrong buffer. This patch changes the API to the trace recording to be recieve the buffer that was used to reserve a commit. Then this buffer can be passed in to the commit. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
Reseting the trace buffer without first disabling the buffer and waiting for any writers to complete, can corrupt the ring buffer. This patch makes the external version of tracing_reset safe from corruption by disabling the ring buffer and calling synchronize_sched. This version can no longer be called from interrupt context. But all those callers have been removed. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
Currently the latency tracers reset the ring buffer. Unfortunately if a commit is in process (due to a trace event), this can corrupt the ring buffer. When this happens, the ring buffer will detect the corruption and then permanently disable the ring buffer. The bug does not crash the system, but it does prevent further tracing after the bug is hit. Instead of reseting the trace buffers, the timestamp of the start of the trace is used instead. The buffers will still contain the previous data, but the output will not count any data that is before the timestamp of the trace. Note, this only affects the static trace output (trace) and not the runtime trace output (trace_pipe). The runtime trace output does not make sense for the latency tracers anyway. Reported-by: NArnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Li Zefan 提交于
The predicates of an event and their filter structure are allocated when we create an event filter for the first time. These objects must be created once but each time we come with a new filter, we overwrite such pre-existing allocation, if any. Thus, this patch checks if the filter has already been allocated before going ahead. Spotted-by: NFrederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> LKML-Reference: <4A9CB1BA.3060402@cn.fujitsu.com> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
由 Steven Rostedt 提交于
The function tracing_reset is deprecated for outside use of trace.c. The new function to reset the the buffers is tracing_reset_online_cpus. The reason for this is that resetting the buffers while the event trace points are active can corrupt the buffers, because they may be writing at the time of reset. The tracing_reset_online_cpus disables writes and waits for current writers to finish. This patch replaces all users of tracing_reset except for the latency tracers. Those changes require more work and will be removed in the following patches. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
Resetting the ring buffers while traces are happening can corrupt the ring buffer and disable it (no kernel crash to worry about). The safest thing to do is disable the ring buffers, call synchronize_sched() to wait for all current writers to finish and then reset the buffer. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 04 9月, 2009 4 次提交
-
-
由 Steven Rostedt 提交于
When reading the tracer from the trace file, updating the max latency may corrupt the output. This patch disables the tracing of the max latency while reading the trace file. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
During development of the tracer, we would copy information from the live tracer to the max tracer with one memcpy. Since then we added a generic ring buffer and we handle the copies differently now. Unfortunately, we never copied the critical section information, and we lost the output: # => started at: kmem_cache_alloc # => ended at: kmem_cache_alloc This patch adds back the critical start and end copying as well as removes the unused "trace_idx" and "overrun" fields of the trace_array_cpu structure. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
Currently the way RB_WARN_ON works, is to disable either the current CPU buffer or all CPU buffers, depending on whether a ring_buffer or ring_buffer_per_cpu struct was passed into the macro. Most users of the RB_WARN_ON pass in the CPU buffer, so only the one CPU buffer gets disabled but the rest are still active. This may confuse users even though a warning is sent to the console. This patch changes the macro to disable the entire buffer even if the CPU buffer is passed in. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
The latency tracers report the number of items in the trace buffer. This uses the ring buffer data to calculate this. Because discarded events are also counted, the numbers do not match the number of items that are printed. The ring buffer also adds a "padding" item to the end of each buffer page which also gets counted as a discarded item. This patch decrements the counter to the page entries on a discard. This allows us to ignore discarded entries while reading the buffer. Decrementing the counter is still safe since it can only happen while the committing flag is still set. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-