- 06 8月, 2006 2 次提交
-
-
由 Christian Borntraeger 提交于
This patch adds a barrier() in futex unqueue_me to avoid aliasing of two pointers. On my s390x system I saw the following oops: Unable to handle kernel pointer dereference at virtual kernel address 0000000000000000 Oops: 0004 [#1] CPU: 0 Not tainted Process mytool (pid: 13613, task: 000000003ecb6ac0, ksp: 00000000366bdbd8) Krnl PSW : 0704d00180000000 00000000003c9ac2 (_spin_lock+0xe/0x30) Krnl GPRS: 00000000ffffffff 000000003ecb6ac0 0000000000000000 0700000000000000 0000000000000000 0000000000000000 000001fe00002028 00000000000c091f 000001fe00002054 000001fe00002054 0000000000000000 00000000366bddc0 00000000005ef8c0 00000000003d00e8 0000000000144f91 00000000366bdcb8 Krnl Code: ba 4e 20 00 12 44 b9 16 00 3e a7 84 00 08 e3 e0 f0 88 00 04 Call Trace: ([<0000000000144f90>] unqueue_me+0x40/0xe4) [<0000000000145a0c>] do_futex+0x33c/0xc40 [<000000000014643e>] sys_futex+0x12e/0x144 [<000000000010bb00>] sysc_noemu+0x10/0x16 [<000002000003741c>] 0x2000003741c The code in question is: static int unqueue_me(struct futex_q *q) { int ret = 0; spinlock_t *lock_ptr; /* In the common case we don't take the spinlock, which is nice. */ retry: lock_ptr = q->lock_ptr; if (lock_ptr != 0) { spin_lock(lock_ptr); /* * q->lock_ptr can change between reading it and * spin_lock(), causing us to take the wrong lock. This * corrects the race condition. [...] and my compiler (gcc 4.1.0) makes the following out of it: 00000000000003c8 <unqueue_me>: 3c8: eb bf f0 70 00 24 stmg %r11,%r15,112(%r15) 3ce: c0 d0 00 00 00 00 larl %r13,3ce <unqueue_me+0x6> 3d0: R_390_PC32DBL .rodata+0x2a 3d4: a7 f1 1e 00 tml %r15,7680 3d8: a7 84 00 01 je 3da <unqueue_me+0x12> 3dc: b9 04 00 ef lgr %r14,%r15 3e0: a7 fb ff d0 aghi %r15,-48 3e4: b9 04 00 b2 lgr %r11,%r2 3e8: e3 e0 f0 98 00 24 stg %r14,152(%r15) 3ee: e3 c0 b0 28 00 04 lg %r12,40(%r11) /* write q->lock_ptr in r12 */ 3f4: b9 02 00 cc ltgr %r12,%r12 3f8: a7 84 00 4b je 48e <unqueue_me+0xc6> /* if r12 is zero then jump over the code.... */ 3fc: e3 20 b0 28 00 04 lg %r2,40(%r11) /* write q->lock_ptr in r2 */ 402: c0 e5 00 00 00 00 brasl %r14,402 <unqueue_me+0x3a> 404: R_390_PC32DBL _spin_lock+0x2 /* use r2 as parameter for spin_lock */ So the code becomes more or less: if (q->lock_ptr != 0) spin_lock(q->lock_ptr) instead of if (lock_ptr != 0) spin_lock(lock_ptr) Which caused the oops from above. After adding a barrier gcc creates code without this problem: [...] (the same) 3ee: e3 c0 b0 28 00 04 lg %r12,40(%r11) 3f4: b9 02 00 cc ltgr %r12,%r12 3f8: b9 04 00 2c lgr %r2,%r12 3fc: a7 84 00 48 je 48c <unqueue_me+0xc4> 400: c0 e5 00 00 00 00 brasl %r14,400 <unqueue_me+0x38> 402: R_390_PC32DBL _spin_lock+0x2 As a general note, this code of unqueue_me seems a bit fishy. The retry logic of unqueue_me only works if we can guarantee, that the original value of q->lock_ptr is always a spinlock (Otherwise we overwrite kernel memory). We know that q->lock_ptr can change. I dont know what happens with the original spinlock, as I am not an expert with the futex code. Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Acked-by: NIngo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@timesys.com> Signed-off-by: NChristian Borntraeger <borntrae@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Rafael J. Wysocki 提交于
It should be possible to suspend, either to RAM or to disk, if there's a traced process that has just reached a breakpoint. However, this is a special case, because its parent process might have been frozen already and then we are unable to deliver the "freeze" signal to the traced process. If this happens, it's better to cancel the freezing of the traced process. Ref. http://bugzilla.kernel.org/show_bug.cgi?id=6787Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Acked-by: NPavel Machek <pavel@ucw.cz> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 03 8月, 2006 9 次提交
-
-
由 Al Viro 提交于
move that stuff downstream and into the only branch where it'll be used. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Amy Griffis 提交于
Michael C Thompson wrote: [Tue Aug 01 2006, 02:36:36PM EDT] > The trigger for this oops is: > # auditctl -a exit,always -S pread64 -F 'inode<1' Setting the err value will fix it. Signed-off-by: NAmy Griffis <amy.griffis@hp.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Amy Griffis 提交于
Always initialize the audit_inode_hash[] so we don't oops on list rules. Signed-off-by: NAmy Griffis <amy.griffis@hp.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Amy Griffis 提交于
When an object is created via a symlink into an audited directory, audit misses the event due to not having collected the inode data for the directory. Modify __audit_inode_child() to copy the parent inode data if a parent wasn't found in audit_names[]. Signed-off-by: NAmy Griffis <amy.griffis@hp.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Amy Griffis 提交于
When the specified path is an existing file or when it is a symlink, audit collects the wrong inode number, which causes it to miss the open() event. Adding a second hook to the open() path fixes this. Also add audit_copy_inode() to consolidate some code. Signed-off-by: NAmy Griffis <amy.griffis@hp.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Linus Torvalds 提交于
Suresh points out that commit b0423a0d broke the semantics of a synchronous signal like SIGSEGV occurring recursively inside its own handler handler (or, indeed, any other context when the signal was blocked). That was unintentional, and this fixes things up by reinstating the old semantics, but without reverting the cleanups. Cc: Paul E. McKenney <paulmck@us.ibm.com> Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 01 8月, 2006 14 次提交
-
-
由 Josh Triplett 提交于
kernel/timer.c defines a (per-cpu) pointer to tvec_base_t, but initializes it using { &a_tvec_base_t }, which sparse warns about; change this to just &a_tvec_base_t. Signed-off-by: NJosh Triplett <josh@freedesktop.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Steven Rostedt 提交于
In order to prevent Doc Rot, this patch adds a reference to the design document for rtmutex.c in rtmutex.c. So when someone needs to update or change the design of that file they will know that a document actually exists that explains the design (helping them change it), and hopefully that they will update the document if they too change the design. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Tim Chen 提交于
The recent changes from irqtrace feature has added overheads to local_bh_disable and local_bh_enable that reduces UDP performance across x86_64 and IA64, even though IA64 does not support the irqtrace feature. Patch in question is [PATCH]lockdep: irqtrace subsystem, core http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=c ommit;h=de30a2b3 Prior to this patch, local_bh_disable was a short macro. Now it is a function which calls __local_bh_disable with added irq flags save and restore. The irq flags save and restore were also added to local_bh_enable, probably for injecting the trace irqs code. This overhead is on the generic code path across all architectures. On a IA_64 test machine (Itanium-2 1.6 GHz) running a benchmark like netperf's UDP streaming test, the added overhead results in a drop of 3% in throughput, as udp_sendmsg calls the local_bh_enable/disable several times. Other workloads that have heavy usages of local_bh_enable/disable could also be affected. The patch ideally should not have affected IA-64 performance as it does not have IRQ tracing support. A significant portion of the overhead is in the added irq flags save and restore, which I think is not needed if IRQ tracing is unused. A suggested patch is attached below that recovers the lost performance. However, the "ifdef"s in the patch are a bit ugly. Signed-off-by: NTim Chen <tim.c.chen@intel.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Heiko Carstens 提交于
Initialize init task's pi_waiters plist. Otherwise cpu hotplug of cpu 0 might crash, since rt_mutex_getprio() accesses an uninitialized list head. call chain which led to crash: take_cpu_down sched_idle_next __setscheduler rt_mutex_getprio Using PLIST_HEAD_INIT in the INIT_TASK macro doesn't work unfortunately, since the pi_waiters member is only conditionally present. Cc: Arjan van de Ven <arjan@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Rolf Eike Beer 提交于
kernel/workqueue.c was omitted from generating kernel documentation. This adds a new section "Workqueues and Kevents" and adds documentation for some of the functions. Some functions in this file already had DocBook-style comments, now they finally become visible. Signed-off-by: NRolf Eike Beer <eike-kernel@sf-tec.de> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Jim Houston 提交于
In cond_resched_lock() it calls __resched_legal() before dropping the spin lock. __resched_legal() will always finds the preempt_count non-zero and will prevent the call to __cond_resched(). The attached patch adds a parameter to __resched_legal() with the expected preempt_count value. Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Steven Rostedt 提交于
We have #define INDEX(N) (base->timer_jiffies >> (TVR_BITS + N * TVN_BITS)) & TVN_MASK and it's used via list = varray[i + 1]->vec + (INDEX(i + 1)); So, due to underparenthesisation, this INDEX(i+1) is now a ... (TVR_BITS + i + 1 * TVN_BITS)) ... So this bugfix changes behaviour. It worked before by sheer luck: "If i was anything but 0, it was broken. But this was only used by s390 and arm. Since it was for the next interrupt, could that next interrupt be a problem (going into the second cascade)? But it was probably seldom wrong. That is, this would fail if the next interrupt was in the second cascade, and was wrapped. Which may never of happened. Also if it did happen, it would have just missed the interrupt. If an interrupt was missed, and no one was there to miss it, was it really missed :-)" Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Chandra Seetharaman 提交于
Few of the callback functions and notifier blocks that are associated with cpu notifications incorrectly have __devinit and __devinitdata. They should be __cpuinit and __cpuinitdata instead. It makes no functional difference but wastes text area when CONFIG_HOTPLUG is enabled and CONFIG_HOTPLUG_CPU is not. This patch fixes all those instances. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 bibo, mao 提交于
Kprobe inserts breakpoint instruction in probepoint and then jumps to instruction slot when breakpoint is hit, the instruction slot icache must be consistent with dcache. Here is the patch which invalidates instruction slot icache area. Without this patch, in some machines there will be fault when executing instruction slot where icache content is inconsistent with dcache. Signed-off-by: Nbibo,mao <bibo.mao@intel.com> Acked-by: N"Luck, Tony" <tony.luck@intel.com> Acked-by: NKeshavamurthy Anil S <anil.s.keshavamurthy@intel.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Shailabh Nagar 提交于
Enable delay accounting by default so that feature gets coverage testing without requiring special measures. Earlier, it was off by default and had to be enabled via a boot time param. This patch reverses the default behaviour to improve coverage testing. It can be removed late in the kernel development cycle if its believed users shouldn't have to incur any cost if they don't want delay accounting. Or it can be retained forever if the utility of the stats is deemed common enough to warrant keeping the feature on. Signed-off-by: NShailabh Nagar <nagar@watson.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Shailabh Nagar 提交于
Add a missing freeing of skb in the case there are no listeners at all. Also remove the returning of error values by the function as it is unused by the sole caller. Signed-off-by: NShailabh Nagar <nagar@watson.ibm.com> Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Shailabh Nagar 提交于
Complete the separation of delay accounting and taskstats by ignoring the return value of delay accounting functions that fill in parts of taskstats before it is sent out (either in response to a command or as part of a task exit). Also make delayacct_add_tsk return silently when delay accounting is turned off rather than treat it as an error. Signed-off-by: NShailabh Nagar <nagar@watson.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 David Brownell 提交于
IRQs need refcounting and a state flag to track whether the the IRQ should be enabled or disabled as a "normal IRQ" source after a series of calls to {en,dis}able_irq(). For shared IRQs, the IRQ must be enabled so long as at least one driver needs it active. Likewise, IRQs need the same support to track whether the IRQ should be enabled or disabled as a "wakeup event" source after a series of calls to {en,dis}able_irq_wake(). For shared IRQs, the IRQ must be enabled as a wakeup source during sleep so long as at least one driver needs it. But right now they _don't have_ that refcounting ... which means sharing a wakeup-capable IRQ can't work correctly in some configurations. This patch adds the refcount and flag mechanisms to set_irq_wake() -- which is what {en,dis}able_irq_wake() call -- and minimal documentation of what the irq wake mechanism does. Drivers relying on the older (broken) "toggle" semantics will trigger a warning; that'll be a handful of drivers on ARM systems. Signed-off-by: NDavid Brownell <dbrownell@users.sourceforge.net> Acked-by: NIngo Molnar <mingo@elte.hu> Acked-by: NThomas Gleixner <tglx@linutronix.de> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Siddha, Suresh B 提交于
Use the correct groups while initializing sched groups power for allnodes_domain. This fixes the crash observed while creating exclusive cpusets. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Reported-and-tested-by: NPaul Jackson <pj@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 29 7月, 2006 2 次提交
-
-
由 Ingo Molnar 提交于
Fix robust PI-futexes to be properly unlocked on unexpected exit. For this to work the kernel has to know whether a futex is a PI or a non-PI one, because the semantics are different. Since the space in relevant glibc data structures is extremely scarce, the best solution is to encode the 'PI' information in bit 0 of the robust list pointer. Existing (non-PI) glibc robust futexes have this bit always zero, so the ABI is kept. New glibc with PI-robust-futexes will set this bit. Further fixes from Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NUlrich Drepper <drepper@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Ingo Molnar 提交于
Fix pi_state->list handling bugs: list handling mishap, locking error. Plus add more debug checks and fix a few style issues i noticed while debugging this. (reported by Ulrich Drepper and Jakub Jelinek.) Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 24 7月, 2006 2 次提交
-
-
由 Paul Jackson 提交于
Fix ABBA deadlock between lock_cpu_hotplug() and the cpuset callback_mutex lock. It only happens on cpu_exclusive cpusets, due to the dynamic sched domain code trying to take the cpu hotplug lock inside the cpuset callback_mutex lock. This bug has apparently been here for several months, but didn't get hit until the right customer load on a large system. This fix appears right from inspection, but it will take a few more days running it on that customers workload to be confident we nailed it. We don't have any other reproducible test case. The cpu_hotplug_lock() tends to cover large runs of code. The other places that hold both that lock and the cpuset callback mutex lock always nest the cpuset lock inside the hotplug lock. This place tries to do the reverse, risking an ABBA deadlock. This is in the cpuset_rmdir() code, where we: * take the callback_mutex lock * mark the cpuset CS_REMOVED * call update_cpu_domains for cpu_exclusive cpusets * in that call, take the cpu_hotplug lock if the cpuset is marked for removal. Thanks to Jack Steiner for identifying this deadlock. The fix is to tear down the dynamic sched domain before we grab the cpuset callback_mutex lock. This way, the two locks are serialized, with the hotplug lock taken and released before trying for the cpuset lock. I suspect that this bug was introduced when I changed the cpuset locking from one lock to two. The dynamic sched domain dependency on cpu_exclusive cpusets and its hotplug hooks were added to this code earlier, when cpusets had only a single lock. It may well have been fine then. Signed-off-by: NPaul Jackson <pj@sgi.com> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Linus Torvalds 提交于
The CPU hotplug locking was quite messy, with a recursive lock to handle the fact that both the actual up/down sequence wanted to protect itself from being re-entered, but the callbacks that it called also tended to want to protect themselves from CPU events. This splits the lock into two (one to serialize the whole hotplug sequence, the other to protect against the CPU present bitmaps changing). The latter still allows recursive usage because some subsystems (ondemand policy for cpufreq at least) had already gotten too used to the lax locking, but the locking mistakes are hopefully now less fundamental, and we now warn about recursive lock usage when we see it, in the hope that it can be fixed. Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 15 7月, 2006 11 次提交
-
-
由 Shailabh Nagar 提交于
In send_cpu_listeners(), which is called on the exit path, a down_write() was protecting operations like skb_clone() and genlmsg_unicast() that do GFP_KERNEL allocations. If the oom-killer decides to kill tasks to satisfy the allocations,the exit of those tasks could block on the same semphore. The down_write() was only needed to allow removal of invalid listeners from the listener list. The patch converts the down_write to a down_read and defers the removal to a separate critical region. This ensures that even if the oom-killer is called, no other task's exit is blocked as it can still acquire another down_read. Thanks to Andrew Morton & Herbert Xu for pointing out the oom related pitfalls, and to Chandra Seetharaman for suggesting this fix instead of using something more complex like RCU. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: NShailabh Nagar <nagar@watson.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Shailabh Nagar 提交于
On systems with a large number of cpus, with even a modest rate of tasks exiting per cpu, the volume of taskstats data sent on thread exit can overflow a userspace listener's buffers. One approach to avoiding overflow is to allow listeners to get data for a limited and specific set of cpus. By scaling the number of listeners and/or the cpus they monitor, userspace can handle the statistical data overload more gracefully. In this patch, each listener registers to listen to a specific set of cpus by specifying a cpumask. The interest is recorded per-cpu. When a task exits on a cpu, its taskstats data is unicast to each listener interested in that cpu. Thanks to Andrew Morton for pointing out the various scalability and general concerns of previous attempts and for suggesting this design. [akpm@osdl.org: build fix] Signed-off-by: NShailabh Nagar <nagar@watson.ibm.com> Signed-off-by: NBalbir Singh <balbir@in.ibm.com> Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Shailabh Nagar 提交于
Send per-tgid data only once during exit of a thread group instead of once with each member thread exit. Currently, when a thread exits, besides its per-tid data, the per-tgid data of its thread group is also sent out, if its thread group is non-empty. The per-tgid data sent consists of the sum of per-tid stats for all *remaining* threads of the thread group. This patch modifies this sending in two ways: - the per-tgid data is sent only when the last thread of a thread group exits. This cuts down heavily on the overhead of sending/receiving per-tgid data, especially when other exploiters of the taskstats interface aren't interested in per-tgid stats - the semantics of the per-tgid data sent are changed. Instead of being the sum of per-tid data for remaining threads, the value now sent is the true total accumalated statistics for all threads that are/were part of the thread group. The patch also addresses a minor issue where failure of one accounting subsystem to fill in the taskstats structure was causing the send of taskstats to not be sent at all. The patch has been tested for stability and run cerberus for over 4 hours on an SMP. [akpm@osdl.org: bugfixes] Signed-off-by: NShailabh Nagar <nagar@watson.ibm.com> Signed-off-by: NBalbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Shailabh Nagar 提交于
Export I/O delays seen by a task through /proc/<tgid>/stats for use in top etc. Note that delays for I/O done for swapping in pages (swapin I/O) is clubbed together with all other I/O here (this is not the case in the netlink interface where the swapin I/O is kept distinct) [akpm@osdl.org: printk warning fix] Signed-off-by: NShailabh Nagar <nagar@watson.ibm.com> Signed-off-by: NBalbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Shailabh Nagar 提交于
Usage of taskstats interface by delay accounting. Signed-off-by: NShailabh Nagar <nagar@us.ibm.com> Signed-off-by: NBalbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Shailabh Nagar 提交于
Create a "taskstats" interface based on generic netlink (NETLINK_GENERIC family), for getting statistics of tasks and thread groups during their lifetime and when they exit. The interface is intended for use by multiple accounting packages though it is being created in the context of delay accounting. This patch creates the interface without populating the fields of the data that is sent to the user in response to a command or upon the exit of a task. Each accounting package interested in using taskstats has to provide an additional patch to add its stats to the common structure. [akpm@osdl.org: cleanups, Kconfig fix] Signed-off-by: NShailabh Nagar <nagar@us.ibm.com> Signed-off-by: NBalbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Chandra Seetharaman 提交于
Make the task-related schedstats functions callable by delay accounting even if schedstats collection isn't turned on. This removes the dependency of delay accounting on schedstats. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: NShailabh Nagar <nagar@watson.ibm.com> Signed-off-by: NBalbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Shailabh Nagar 提交于
Unlike earlier iterations of the delay accounting patches, now delays are only collected for the actual I/O waits rather than try and cover the delays seen in I/O submission paths. Account separately for block I/O delays incurred as a result of swapin page faults whose frequency can be affected by the task/process' rss limit. Hence swapin delays can act as feedback for rss limit changes independent of I/O priority changes. Signed-off-by: NShailabh Nagar <nagar@watson.ibm.com> Signed-off-by: NBalbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Shailabh Nagar 提交于
Initialization code related to collection of per-task "delay" statistics which measure how long it had to wait for cpu, sync block io, swapping etc. The collection of statistics and the interface are in other patches. This patch sets up the data structures and allows the statistics collection to be disabled through a kernel boot parameter. Signed-off-by: NShailabh Nagar <nagar@watson.ibm.com> Signed-off-by: NBalbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Ingo Molnar 提交于
On platforms that have __ARCH_WANT_UNLOCKED_CTXSW set and want to implement lock validator support there's a bug in rq->lock handling: in this case we dont 'carry over' the runqueue lock into another task - but still we did a spinlock_release() of it. Fix this by making the spinlock_release() in context_switch() dependent on !__ARCH_WANT_UNLOCKED_CTXSW. (Reported by Ralf Baechle on MIPS, which has __ARCH_WANT_UNLOCKED_CTXSW. This fixes a lockdep-internal BUG message on such platforms.) Signed-off-by: NIngo Molnar <mingo@elte.hu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 OGAWA Hirofumi 提交于
IRQs must be disabled before taking ->siglock. Noticed by lockdep. Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-