- 15 9月, 2009 8 次提交
-
-
由 Anirban Sinha 提交于
console_print() is an old legacy interface mostly unused in the entire kernel tree. It's best to clean up its existing use and let developers use their own implementation of it as they feel fit. Signed-off-by: NAnirban Sinha <asinha@zeugmasystems.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Howells 提交于
put_cred() will oops if given a NULL groups list, but that is now possible with the existence of cred_alloc_blank(), as used in keyctl_session_to_parent(). Added in commit: commit ee18d64c Author: David Howells <dhowells@redhat.com> Date: Wed Sep 2 09:14:21 2009 +0100 KEYS: Add a keyctl to install a process's session keyring on its parent [try #6] Reported-by: NMarc Dionne <marc.c.dionne@gmail.com> Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 Wu Fengguang 提交于
Fix the definition of BM_BITS_PER_BLOCK and kerneldoc description of create_bm_block_list(). [rjw: Added changelog.] Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
由 Gerald Schaefer 提交于
Use for_each_populated_zone() instead of for_each_zone() in hibernation code. This fixes a bug on s390, where we allow both config options HIBERNATION and MEMORY_HOTPLUG, so that we also have a ZONE_MOVABLE here. We only allow hibernation if no memory hotplug operation was performed, so in fact both features can only be used exclusively, but this way we don't need 2 differently configured (distribution) kernels. If we have an unpopulated ZONE_MOVABLE, we allow hibernation but run into a BUG_ON() in memory_bm_test/set/clear_bit() because hibernation code iterates through all zones, not only the populated zones, in several places. For example, swsusp_free() does for_each_zone() and then checks for pfn_valid(), which is true even if the zone is not populated, resulting in a BUG_ON() later because the pfn cannot be found in the memory bitmap. Replacing all occurences of for_each_zone() in hibernation code with for_each_populated_zone() would fix this issue. [rjw: Rebased on top of linux-next hibernation patches.] Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
由 Rafael J. Wysocki 提交于
We want to avoid attempting to free too much memory too hard during hibernation, so estimate the minimum size of the image to use as the lower limit for preallocating memory. The approach here is based on the (experimental) observation that we can't free more page frames than the sum of: * global_page_state(NR_SLAB_RECLAIMABLE) * global_page_state(NR_ACTIVE_ANON) * global_page_state(NR_INACTIVE_ANON) * global_page_state(NR_ACTIVE_FILE) * global_page_state(NR_INACTIVE_FILE) minus * global_page_state(NR_FILE_MAPPED) Namely, if this number is subtracted from the number of saveable pages in the system, we get a good estimate of the minimum reasonable size of a hibernation image. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Acked-by: NWu Fengguang <fengguang.wu@intel.com>
-
由 Rafael J. Wysocki 提交于
Since the hibernation code is now going to use allocations of memory to make enough room for the image, it can also use the page frames allocated at this stage as image page frames. The low-level hibernation code needs to be rearranged for this purpose, but it allows us to avoid freeing a great number of pages and allocating these same pages once again later, so it generally is worth doing. [rev. 2: Take highmem into account correctly.] Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> -
由 Rafael J. Wysocki 提交于
Rework swsusp_shrink_memory() so that it calls shrink_all_memory() just once to make some room for the image and then allocates memory to apply more pressure to the memory management subsystem, if necessary. Unfortunately, we don't seem to be able to drop shrink_all_memory() entirely just yet, because that would lead to huge performance regressions in some test cases. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Acked-by: NPavel Machek <pavel@ucw.cz>
-
Although the same label name is used somewhere else in the file, this particular label was consistently typoed in all of its uses. Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
- 11 9月, 2009 4 次提交
-
-
由 Jens Axboe 提交于
This borrows some code from NAPI and implements a polled completion mode for block devices. The idea is the same as NAPI - instead of doing the command completion when the irq occurs, schedule a dedicated softirq in the hopes that we will complete more IO when the iopoll handler is invoked. Devices have a budget of commands assigned, and will stay in polled mode as long as they continue to consume their budget from the iopoll softirq handler. If they do not, the device is set back to interrupt completion mode. This patch holds the core bits for blk-iopoll, device driver support sold separately. Signed-off-by: NJens Axboe <jens.axboe@oracle.com> -
由 Jens Axboe 提交于
This enables us to track who does what and print info. Its main use is catching dirty inodes on the default_backing_dev_info, so we can fix that up. Signed-off-by: NJens Axboe <jens.axboe@oracle.com> -
由 Ingo Molnar 提交于
This weird perf trace output: cc1-9943 [001] 2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns] Is caused by setting one component field of the delta to zero a bit too early. Move it to later. ( Note, this does not affect the NEW_FAIR_SLEEPERS interactivity bug, it's just a reporting bug in essence. ) Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nikos Chantziaras <realnc@arcor.de> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <4AA93D34.8040500@arcor.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
Nikos Chantziaras and Jens Axboe reported that turning off NEW_FAIR_SLEEPERS improves desktop interactivity visibly. Nikos described his experiences the following way: " With this setting, I can do "nice -n 19 make -j20" and still have a very smooth desktop and watch a movie at the same time. Various other annoyances (like the "logout/shutdown/restart" dialog of KDE not appearing at all until the background fade-out effect has finished) are also gone. So this seems to be the single most important setting that vastly improves desktop behavior, at least here. " Jens described it the following way, referring to a 10-seconds xmodmap scheduling delay he was trying to debug: " Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then I get: Performance counter stats for 'xmodmap .xmodmap-carl': 9.009137 task-clock-msecs # 0.447 CPUs 18 context-switches # 0.002 M/sec 1 CPU-migrations # 0.000 M/sec 315 page-faults # 0.035 M/sec 0.020167093 seconds time elapsed Woot! " So disable it for now. In perf trace output i can see weird delta timestamps: cc1-9943 [001] 2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns] That nsec field is not supposed to be that large. More digging is needed - but lets turn it off while the real bug is found. Reported-by: NNikos Chantziaras <realnc@arcor.de> Tested-by: NNikos Chantziaras <realnc@arcor.de> Reported-by: NJens Axboe <jens.axboe@oracle.com> Tested-by: NJens Axboe <jens.axboe@oracle.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <4AA93D34.8040500@arcor.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 09 9月, 2009 3 次提交
-
-
由 Mike Galbraith 提交于
Removes kthread/workqueue priority boost, they increase worst-case desktop latencies. Signed-off-by: NMike Galbraith <efault@gmx.de> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1252486344.28645.18.camel@marge.simson.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mike Galbraith 提交于
Reduce the latency target from 20 msecs to 5 msecs. Why? Larger latencies increase spread, which is good for scaling, but bad for worst case latency. We still have the ilog(nr_cpus) rule to scale up on bigger server boxes. Signed-off-by: NMike Galbraith <efault@gmx.de> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1252486344.28645.18.camel@marge.simson.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mike Galbraith 提交于
Set child_runs_first default to off. It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs get preempted by child tasks, reducing parallelism. Note, this patch might make existing races in user applications more prominent than before - so breakages might be bisected to this commit. Child-runs-first is broken on SMP to begin with, and we already had it off briefly in v2.6.23 so most of the offenders ought to be fixed. Would be nice not to revert this commit but fix those apps finally ... Signed-off-by: NMike Galbraith <efault@gmx.de> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1252486344.28645.18.camel@marge.simson.net> [ made the sysctl independent of CONFIG_SCHED_DEBUG, in case people want to work around broken apps. ] Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 08 9月, 2009 3 次提交
-
-
由 Mike Galbraith 提交于
A fork/exec load is usually "pass the baton", so the child should never be placed behind the parent. With START_DEBIT we make room for the new task, but with child_runs_first, that room comes out of the _parent's_ hide. There's nothing to say that the parent wasn't ahead of min_vruntime at fork() time, which means that the "baton carrier", who is essentially the parent in drag, can gain time and increase scheduling latencies for waiters. With NEW_FAIR_SLEEPERS + START_DEBIT + child_runs_first enabled, we essentially pass the sleeper fairness off to the child, which is fine, but if we don't base placement on the parent's updated vruntime, we can end up compounding latency woes if the child itself then does fork/exec. The debit incurred at fork doesn't hurt the parent who is then going to sleep and maybe exit, but the child who acquires the error harms all comers. This improves latencies of make -j<n> kernel build workloads. Reported-by: NJens Axboe <jens.axboe@oracle.com> Signed-off-by: NMike Galbraith <efault@gmx.de> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
wake_affine() would always fail under low-load situations where both prev and this were idle, because adding a single task will always be a significant imbalance, even if there's nothing around that could balance it. Deal with this by allowing imbalance when there's nothing you can do about it. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
select_task_rq_fair() incorrectly skips the wake_affine() logic, remove this. When prev_cpu == this_cpu, the code jumps straight to the wake_idle() logic, this doesn't give the wake_affine() logic the chance to pin the task to this cpu. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 07 9月, 2009 1 次提交
-
-
由 Jaswinder Singh Rajput 提交于
fix the following 'make includecheck' warning: kernel/sysctl.c: linux/security.h is included more than once. Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 05 9月, 2009 10 次提交
-
-
由 Steven Rostedt 提交于
Since the ability to swap the cpu buffers adds a small overhead to the recording of a trace, we only want to add it when needed. Only the irqsoff and preemptoff tracers use this feature, and both are not recommended for production kernels. This patch disables its use when neither irqsoff nor preemptoff is configured. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> -
由 Steven Rostedt 提交于
Because the irqsoff tracer can swap an internal CPU buffer, it is possible that a swap happens between the start of the write and before the committing bit is set (the committing bit will disable swapping). This patch adds a check for this and will fail the write if it detects it. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> -
由 Steven Rostedt 提交于
The irqsoff tracer will fail to swap the cpu buffer with the max buffer if it preempts a commit. Instead of ignoring this, this patch makes the tracer report it if the last max latency failed due to preempting a current commit. The output of the latency tracer will look like this: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 2.6.31-rc5 # -------------------------------------------------------------------- # latency: 112 us, #1/1, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # | task: -4281 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: save_args # => ended at: __do_softirq # # # _------=> CPU# # / _-----=> irqs-off # | / _----=> need-resched # || / _---=> hardirq/softirq # ||| / _--=> preempt-depth # |||| / # ||||| delay # cmd pid ||||| time | caller # \ / ||||| \ | / bash-4281 1d.s6 265us : update_max_tr_single: Failed to swap buffers due to commit in progress Note the latency time and the functions that disabled the irqs or preemption will still be listed. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> -
由 Steven Rostedt 提交于
This patch adds a trace_array_printk to allow a tracer to use the trace_printk on its own trace array. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> -
由 Steven Rostedt 提交于
The latency tracers (irqsoff and wakeup) can swap trace buffers on the fly. If an event is happening and has reserved data on one of the buffers, and the latency tracer swaps the global buffer with the max buffer, the result is that the event may commit the data to the wrong buffer. This patch changes the API to the trace recording to be recieve the buffer that was used to reserve a commit. Then this buffer can be passed in to the commit. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> -
由 Steven Rostedt 提交于
Reseting the trace buffer without first disabling the buffer and waiting for any writers to complete, can corrupt the ring buffer. This patch makes the external version of tracing_reset safe from corruption by disabling the ring buffer and calling synchronize_sched. This version can no longer be called from interrupt context. But all those callers have been removed. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> -
由 Steven Rostedt 提交于
Currently the latency tracers reset the ring buffer. Unfortunately if a commit is in process (due to a trace event), this can corrupt the ring buffer. When this happens, the ring buffer will detect the corruption and then permanently disable the ring buffer. The bug does not crash the system, but it does prevent further tracing after the bug is hit. Instead of reseting the trace buffers, the timestamp of the start of the trace is used instead. The buffers will still contain the previous data, but the output will not count any data that is before the timestamp of the trace. Note, this only affects the static trace output (trace) and not the runtime trace output (trace_pipe). The runtime trace output does not make sense for the latency tracers anyway. Reported-by: NArnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Li Zefan 提交于
The predicates of an event and their filter structure are allocated when we create an event filter for the first time. These objects must be created once but each time we come with a new filter, we overwrite such pre-existing allocation, if any. Thus, this patch checks if the filter has already been allocated before going ahead. Spotted-by: NFrederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> LKML-Reference: <4A9CB1BA.3060402@cn.fujitsu.com> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
由 Steven Rostedt 提交于
The function tracing_reset is deprecated for outside use of trace.c. The new function to reset the the buffers is tracing_reset_online_cpus. The reason for this is that resetting the buffers while the event trace points are active can corrupt the buffers, because they may be writing at the time of reset. The tracing_reset_online_cpus disables writes and waits for current writers to finish. This patch replaces all users of tracing_reset except for the latency tracers. Those changes require more work and will be removed in the following patches. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> -
由 Steven Rostedt 提交于
Resetting the ring buffers while traces are happening can corrupt the ring buffer and disable it (no kernel crash to worry about). The safest thing to do is disable the ring buffers, call synchronize_sched() to wait for all current writers to finish and then reset the buffer. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 04 9月, 2009 11 次提交
-
-
由 Steven Rostedt 提交于
When reading the tracer from the trace file, updating the max latency may corrupt the output. This patch disables the tracing of the max latency while reading the trace file. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> -
由 Steven Rostedt 提交于
During development of the tracer, we would copy information from the live tracer to the max tracer with one memcpy. Since then we added a generic ring buffer and we handle the copies differently now. Unfortunately, we never copied the critical section information, and we lost the output: # => started at: kmem_cache_alloc # => ended at: kmem_cache_alloc This patch adds back the critical start and end copying as well as removes the unused "trace_idx" and "overrun" fields of the trace_array_cpu structure. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> -
由 Steven Rostedt 提交于
Currently the way RB_WARN_ON works, is to disable either the current CPU buffer or all CPU buffers, depending on whether a ring_buffer or ring_buffer_per_cpu struct was passed into the macro. Most users of the RB_WARN_ON pass in the CPU buffer, so only the one CPU buffer gets disabled but the rest are still active. This may confuse users even though a warning is sent to the console. This patch changes the macro to disable the entire buffer even if the CPU buffer is passed in. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> -
由 Steven Rostedt 提交于
The latency tracers report the number of items in the trace buffer. This uses the ring buffer data to calculate this. Because discarded events are also counted, the numbers do not match the number of items that are printed. The ring buffer also adds a "padding" item to the end of each buffer page which also gets counted as a discarded item. This patch decrements the counter to the page entries on a discard. This allows us to ignore discarded entries while reading the buffer. Decrementing the counter is still safe since it can only happen while the committing flag is still set. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> -
由 Steven Rostedt 提交于
The function ring_buffer_event_discard can be used on any item in the ring buffer, even after the item was committed. This function provides no safety nets and is very race prone. An item may be safely removed from the ring buffer before it is committed with the ring_buffer_discard_commit. Since there are currently no users of this function, and because this function is racey and error prone, this patch removes it altogether. Note, removing this function also allows the counters to ignore all discarded events (patches will follow). Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> -
由 Steven Rostedt 提交于
When the ring buffer uses an iterator (static read mode, not on the fly reading), when it crosses a page boundery, it will skip the first entry on the next page. The reason is that the last entry of a page is usually padding if the page is not full. The padding will not be returned to the user. The problem arises on ring_buffer_read because it also increments the iterator. Because both the read and peek use the same rb_iter_peek, the rb_iter_peak will return the padding but also increment to the next item. This is because the ring_buffer_peek will not incerment it itself. The ring_buffer_read will increment it again and then call rb_iter_peek again to get the next item. But that will be the second item, not the first one on the page. The reason this never showed up before, is because the ftrace utility always calls ring_buffer_peek first and only uses ring_buffer_read to increment to the next item. The ring_buffer_peek will always keep the pointer to a valid item and not padding. This just hid the bug. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> -
由 Steven Rostedt 提交于
The loops in the ring buffer that use cpu_relax are not dependent on other CPUs. They simply came across some padding in the ring buffer and are skipping over them. It is a normal loop and does not require a cpu_relax. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> -
由 Steven Rostedt 提交于
If a commit is taking place on a CPU ring buffer, do not allow it to be swapped. Return -EBUSY when this is detected instead. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> -
由 Steven Rostedt 提交于
The callers of reset must ensure that no commit can be taking place at the time of the reset. If it does then we may corrupt the ring buffer. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> -
由 Ingo Molnar 提交于
This crash: [ 1774.088275] divide error: 0000 [#1] SMP [ 1774.100355] CPU 13 [ 1774.102498] Modules linked in: [ 1774.105631] Pid: 30881, comm: hackbench Not tainted 2.6.31-rc8-tip-01308-g484d664-dirty #1629 X8DTN [ 1774.114807] RIP: 0010:[<ffffffff81041c38>] [<ffffffff81041c38>] sched_balance_self+0x19b/0x2d4 Triggers because update_group_power() modifies the sd tree and does temporary calculations there - not considering that other CPUs could observe intermediate values, such as the zero initial value. Calculate it in a temporary variable instead. (we need no memory barrier as these are all statistical values anyway) Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20090904092742.GA11014@elte.hu> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Its a source of fail, also, now that cpu_power is dynamical, its a waste of time. before: <idle>-0 [000] 132.877936: find_busiest_group: avg_load: 0 group_load: 8241 power: 1 after: bash-1689 [001] 137.862151: find_busiest_group: avg_load: 10636288 group_load: 10387 power: 1 [ v2: build fix from From: Andreas Herrmann ] Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: NAndreas Herrmann <andreas.herrmann3@amd.com> Acked-by: NAndreas Herrmann <andreas.herrmann3@amd.com> Acked-by: NGautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083826.425896304@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-