- 01 Apr, 2006 40 commits
-
-
Committed by Andrew Morton
This just got nuked in mainline. Bring it back because Eric's patches use it. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Eric W. Biederman
The core problem: setsid fails if it is called by init. The effect in 2.6.16 and the earlier kernels that have this problem is that if you do a "ps -j 1" or "ps -ej 1" you will see that init and several of its children have process group and session == 0, instead of process group == session == 1, despite init calling setsid.

The reason it fails is that daemonize calls set_special_pids(1,1) on kernel threads that are launched before /sbin/init is called. The only remaining effect is that current->signal->leader == 0 for init instead of 1, and the setsid call fails. No one has noticed because /sbin/init does not check the return value of setsid. In 2.4, where we don't have the pidhash table and daemonize doesn't exist, setsid actually works for init.

I care a lot about pid == 1 not being a special case that we leave broken, because of the container/jail work that I am doing.

- Carefully allow init (pid == 1) to call setsid despite the kernel using its session.
- Use find_task_by_pid instead of find_pid because find_pid taking a pidtype is going away.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
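The failure described above went unnoticed because the caller ignored the result of setsid. A minimal user-space sketch of checking it is shown here; the helper names `checked_setsid` and `demo_in_child` are hypothetical and are not taken from /sbin/init or the kernel. The demo runs in a forked child, where the first call is expected to succeed and a second call is expected to fail with EPERM because the child is by then a process group leader.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: call setsid() and report a failure instead of
 * silently ignoring it. Returns 0 on success or the errno value. */
int checked_setsid(void)
{
    if (setsid() == (pid_t)-1) {
        int err = errno;
        fprintf(stderr, "setsid failed: %s\n", strerror(err));
        return err;
    }
    return 0;
}

/* Hypothetical demo: fork a child and call checked_setsid() twice in it.
 * Returns 0 if the first call succeeded and the second failed with EPERM. */
int demo_in_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        int first = checked_setsid();   /* child is not a group leader yet */
        int second = checked_setsid();  /* now it is, so this must fail */
        _exit((first == 0 && second == EPERM) ? 0 : 1);
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    assert(waited == pid);  /* we only have this one child */
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

The second call in the child prints a "setsid failed" diagnostic to stderr, which is the kind of message that would have exposed the problem for pid 1.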
-
Committed by Thomas Gleixner
The futex timeval is not checked for correctness. Adding the check does not break existing applications, as the timeval is supplied by glibc (and glibc always passes a correct value), but without it the glibc-internal tests for this functionality fail. Signed-off-by: Thomas Gleixner <tglx@tglx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
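As an illustration of the kind of check meant here, the following user-space sketch validates a timeout before it is used. It assumes the timeout is a `struct timespec` and that "correct" means a non-negative seconds field and a nanoseconds field in the range [0, 1e9); the function name `timeout_is_valid` is hypothetical and this is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical validator: reject timeouts whose fields are out of range,
 * so that the caller can return an error such as EINVAL. */
bool timeout_is_valid(const struct timespec *ts)
{
    assert(ts != NULL);

    if (ts->tv_sec < 0)
        return false;
    if (ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC)
        return false;
    return true;
}
```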
-
Committed by Con Kolivas
To increase the strength of SCHED_BATCH as a scheduling hint we can activate batch tasks on the expired array, since by definition they are latency-insensitive tasks. Signed-off-by: Con Kolivas <kernel@kolivas.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Con Kolivas
On-runqueue time is used to elevate priority in schedule(). The code currently requeues tasks even if their priority is not elevated, which places them at the end of their runqueue array, effectively delaying them instead of improving their priority. Bug spotted by Mike Galbraith <efault@gmx.de>. This patch removes this requeueing. Signed-off-by: Con Kolivas <kernel@kolivas.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Con Kolivas
Tasks waiting in SLEEP_NONINTERACTIVE state can now get to best priority, so they need to be included in the idle detection code. Signed-off-by: Con Kolivas <kernel@kolivas.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Con Kolivas
We watch for tasks that sleep for extended periods and don't allow a single prolonged sleep period to elevate priority to the maximum bonus, to prevent cpu bound tasks from getting high priority with single long sleeps. There is a bug in the current code that also penalises tasks that already have high priority. Correct that bug. Signed-off-by: Con Kolivas <kernel@kolivas.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Con Kolivas
Alterations to the pipe code in the kernel made it possible for relative starvation to occur, with tasks that slept waiting on a pipe getting unfair priority bonuses even if they were otherwise fully cpu bound. The TASK_NONINTERACTIVE flag was therefore introduced, which prevented any change to sleep_avg while sleeping waiting on a pipe. This change also leads to the converse, though: it prevents any priority boost from occurring in truly interactive tasks that wait on pipes. Convert the TASK_NONINTERACTIVE flag to set sleep_type to SLEEP_NONINTERACTIVE, which will allow a linear bonus to priority based on sleep time, thus allowing interactive tasks to get high priority if they sleep enough. Signed-off-by: Con Kolivas <kernel@kolivas.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Con Kolivas
The activated flag in task_struct is used to track different sleep types and its usage is somewhat obfuscated. Convert the variable to an enum with more descriptive names without altering the function. Signed-off-by: Con Kolivas <kernel@kolivas.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Jack Steiner
Currently, count_active_tasks() calls both nr_running() & nr_uninterruptible(). Each of these functions does a "for_each_cpu" & reads values from the runqueue of each cpu. Although this is not a lot of instructions, each runqueue may be located on a different node. Depending on the architecture, a unique TLB entry may be required to access each runqueue. Since there may be more runqueues than cpu TLB entries, a scan of all runqueues can thrash the TLB. Each memory reference incurs a TLB miss & refill. In addition, the runqueue cacheline that contains nr_running & nr_uninterruptible may be evicted from the cache between the two passes. This causes unnecessary cache misses. Combining nr_running() & nr_uninterruptible() into a single function substantially reduces the TLB & cache misses on large systems. This should have no measurable effect on smaller systems. On a 128p IA64 system running a memory stress workload, the new function reduced the overhead of calc_load() from 605 usec/call to 324 usec/call. Signed-off-by: Jack Steiner <steiner@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
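The idea of merging the two scans can be sketched in user space as follows. The type `struct rq_sketch` and the function `count_active` are hypothetical stand-ins, not the kernel's definitions; the sketch also assumes that per-runqueue nr_uninterruptible values may be individually negative and that only a negative total is clamped to zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two per-CPU runqueue counters. */
struct rq_sketch {
    unsigned long nr_running;
    long nr_uninterruptible;   /* may be negative on an individual CPU */
};

/* Visit each runqueue once and read both counters while its cache line
 * (and TLB entry) is hot, instead of scanning all runqueues twice. */
unsigned long count_active(const struct rq_sketch *rqs, size_t ncpus)
{
    unsigned long running = 0;
    long uninterruptible = 0;

    assert(rqs != NULL || ncpus == 0);

    for (size_t i = 0; i < ncpus; i++) {
        running += rqs[i].nr_running;
        uninterruptible += rqs[i].nr_uninterruptible;
    }

    if (uninterruptible < 0)   /* assumption: clamp a negative total */
        uninterruptible = 0;

    return running + (unsigned long)uninterruptible;
}
```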
-
Committed by Dimitri Sivanich
It seems that run_hrtimer_queue() is calling get_softirq_time() more often than it needs to. With this patch, it only calls get_softirq_time() if there's a pending timer. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Thomas Gleixner
Replace the nanosleep private sleeper functionality with the generic hrtimer sleeper. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Thomas Gleixner
The removal of the data field in the hrtimer structure enforces the embedding of the timer into another data structure. nanosleep now uses a private implementation of the most commonly used timer callback function (simple task wakeup). In order to avoid the reimplementation of such functionality all over the place, a generic hrtimer_sleeper functionality is created. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
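The embedding pattern described above can be sketched in plain C. All names below (`timer_sketch`, `task_sketch`, `sleeper_sketch`, `sleeper_wakeup`, `sleeper_init`) are hypothetical user-space stand-ins rather than the kernel's hrtimer API: the timer has no data pointer, so the callback recovers its enclosing structure from the timer's address and then performs the simple wakeup.

```c
#include <assert.h>
#include <stddef.h>

/* Recover a pointer to the enclosing structure from a member pointer. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical timer: a callback only, with no private data field. */
struct timer_sketch {
    int (*function)(struct timer_sketch *timer);
};

/* Hypothetical task: "woken" stands in for a real task wakeup. */
struct task_sketch {
    int woken;
};

/* Generic sleeper: the timer is embedded next to the task to wake. */
struct sleeper_sketch {
    struct timer_sketch timer;
    struct task_sketch *task;
};

/* Shared callback: find the sleeper from the timer, wake its task once. */
int sleeper_wakeup(struct timer_sketch *timer)
{
    struct sleeper_sketch *s = container_of(timer, struct sleeper_sketch, timer);
    struct task_sketch *task = s->task;

    s->task = NULL;          /* mark the sleeper as fired */
    if (task)
        task->woken = 1;
    return 0;                /* assumption: 0 means "do not restart" */
}

void sleeper_init(struct sleeper_sketch *s, struct task_sketch *task)
{
    assert(s != NULL && task != NULL);
    s->timer.function = sleeper_wakeup;
    s->task = task;
}
```

Any code that needs "sleep until the timer fires" can reuse the one callback instead of writing its own wakeup function.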
-
Committed by Richard Purdie
Add an LED trigger for IDE disk activity to the ide-disk driver. Signed-off-by: Richard Purdie <rpurdie@rpsys.net> Acked-by: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Richard Purdie
Ensure ide-taskfile.c calls any driver-specific end_request function if present. Signed-off-by: Richard Purdie <rpurdie@rpsys.net> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Richard Purdie
Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Dirk Opfer
Adds LED drivers for LEDs found on the Sharp Zaurus c6000 model (tosa). Signed-off-by: Dirk Opfer <dirk@opfer-online.de> Signed-off-by: Richard Purdie <rpurdie@rpsys.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by John Bowler
NEW_LEDS support for ixp4xx boards where LEDs are connected to the GPIO lines. This includes a new generic ixp4xx driver (leds-ixp4xx-gpio.c, name "IXP4XX-GPIO-LED"). Signed-off-by: John Bowler <jbowler@acm.org> Signed-off-by: Richard Purdie <rpurdie@rpsys.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Richard Purdie
Adds an LED driver for LEDs exported by the Sharp LOCOMO chip as found on some models of Sharp Zaurus. Signed-off-by: Richard Purdie <rpurdie@rpsys.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Richard Purdie
Adds LED drivers for LEDs found on the Sharp Zaurus c7x0 (corgi, shepherd, husky) and cxx00 (akita, spitz, borzoi) models. Signed-off-by: Richard Purdie <rpurdie@rpsys.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Richard Purdie
Add an LED trigger for the charger status as found on the Sharp Zaurus series of devices. Signed-off-by: Richard Purdie <rpurdie@rpsys.net> Acked-by: Pavel Machek <pavel@suse.cz> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Richard Purdie
Add an example of a complex LED trigger in the form of a generic timer which triggers the LED it's attached to at a user-specified frequency and duty cycle. Signed-off-by: Richard Purdie <rpurdie@rpsys.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Richard Purdie
Add support for LED triggers to the LED subsystem. "Triggers" are events which change the state of an LED. Two kinds of trigger are available: simple ones, which can be added to existing code with minimum disruption, and complex ones for implementing new or more complex functionality. Signed-off-by: Richard Purdie <rpurdie@rpsys.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Richard Purdie
Add the foundations of a new LEDs subsystem. This patch adds a class which presents LED devices within sysfs and allows their brightness to be controlled. Signed-off-by: Richard Purdie <rpurdie@rpsys.net> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Richard Purdie
The LED class/subsystem takes John Lenz's work and extends and alters it to give what I think should be a fairly universal LED implementation.

The series consists of several logical units:

* LED Core + Class implementation
* LED Trigger Core implementation
* LED timer trigger (example of a complex trigger)
* LED device drivers for corgi, spitz and tosa Zaurus models
* LED device driver for locomo LEDs
* LED device driver for ARM ixp4xx LEDs
* Zaurus charging LED trigger
* IDE disk activity LED trigger
* NAND MTD activity LED trigger

Why?
====

LEDs are really simple devices usually amounting to a GPIO that can be turned on and off, so why do we need all this code? On handheld or embedded devices they're an important part of an often limited user interface. Both users and developers want to be able to control and configure what the LED does, and the number of different things they'd potentially want the LED to show is large.

A subsystem is needed to try and provide all this different functionality in an architecture independent, simple but complete, generic and scalable manner. The alternative is for everyone to implement just what they need hidden away in different corners of the kernel source tree and to provide an inconsistent interface to userspace.

Other Implementations
=====================

I'm aware of the existing arm led implementation. Currently the new subsystem and the arm code can coexist quite happily. It's up to the arm community to decide whether this new interface is acceptable to them. As far as I can see, the new interface can do everything the existing arm implementation can, with the advantage that the new code is architecture independent and much more generic, configurable and scalable. I'm prepared to make the conversion to the LED subsystem (or assist with it) if appropriate.

Implementation Details
======================

I've stripped a lot of code out of John's original LED class. Colours were removed as LED colour is now part of the device name. Multiple colours are to be handled as multiple led devices. This means you get full control over each colour. I also removed the LED hardware timer code as the generic timer isn't going to add much overhead and is just as useful. I also decided to have the LED core track the current LED status (to ease suspend/resume handling), removing the need for brightness_get implementations in the LED drivers.

An underlying design philosophy is simplicity. The aim is to keep a small amount of code giving as much functionality as possible.

The major new idea is the led "trigger". A trigger is a source of led events. Triggers can either be simple or complex. A simple trigger isn't configurable and is designed to slot into existing subsystems with minimal additional code. Examples are the ide-disk, nand-disk and zaurus-charging triggers. With leds disabled, the code optimises away. Examples are nand-disk and ide-disk.

Complex triggers, whilst available to all LEDs, have LED specific parameters and work on a per LED basis. The timer trigger is an example.

You can change triggers in a similar manner to the way an IO scheduler is chosen (via /sys/class/leds/somedevice/trigger).

So far there are only a handful of examples, but it should be easy to add further LED triggers without too much interference into other subsystems.

Known Issues
============

The LED Trigger core cannot be a module as the simple trigger functions would cause nightmare dependency issues. I see this as a minor issue compared to the benefits the simple trigger functionality brings. The rest of the LED subsystem can be modular.

Some leds can be programmed to flash in hardware. As this isn't a generic LED device property, I think this should be exported as a device specific sysfs attribute rather than part of the class if this functionality is required (e.g. to keep the led flashing whilst the device is suspended).

Future Development
==================

At the moment, a trigger can't be created specifically for a single LED. There are a number of cases where a trigger might only be mappable to a particular LED. The addition of triggers provided by the LED driver should cover this option and be possible to add without breaking the current interface.

A CPU activity trigger similar to that found in the arm led implementation should be trivial to add.

This patch:

Add some brief documentation of the design decisions behind the LED class and how it appears to users.

Signed-off-by: Richard Purdie <rpurdie@rpsys.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
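Selecting a trigger through the sysfs file mentioned in the message could look roughly like the following user-space sketch. The function name `led_set_trigger` is hypothetical, the path is supplied by the caller (for example the /sys/class/leds/somedevice/trigger placeholder from the text), and the sketch assumes that writing a trigger name such as "timer" to that file is how a trigger is chosen.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: write a trigger name to an LED's "trigger" file.
 * Returns 0 on success or a negative errno-style value on failure. */
int led_set_trigger(const char *trigger_path, const char *name)
{
    assert(trigger_path != NULL && name != NULL);

    FILE *f = fopen(trigger_path, "w");
    if (!f)
        return -errno;

    int rc = 0;
    if (fputs(name, f) == EOF)
        rc = -EIO;
    if (fclose(f) == EOF && rc == 0)
        rc = -errno;   /* sysfs write errors may only surface on close */
    return rc;
}
```

A caller would use it as `led_set_trigger("/sys/class/leds/somedevice/trigger", "timer")`, with "somedevice" replaced by a real LED name on the target system.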
-
Committed by Andrew Morton
One of the LEDs driver files wants to use this. Probably drivers/mtd/maps/ipaq-flash.c wants to convert as well - right now it'll be tainting the kernel. Cc: David Woodhouse <dwmw2@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Bowler <jbowler@acm.org> Cc: "'Richard Purdie'" <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Rafael J. Wysocki
Add TIOCL_GETKMSGREDIRECT, needed by the userland suspend tool to get the current value of kmsg_redirect from the kernel so that it can save it and restore it after resume. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@suse.cz> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Jesper Juhl
Fix a few memory leaks in drivers/isdn/sc/ioctl.c::sc_ioctl(). Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Acked-by: Karsten Keil <kkeil@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Tobias Klauser
Delete two useless kmalloc wrappers and use kmalloc/kzalloc. Some weird NULL checks are also simplified. Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Trond Myklebust
sys_flock() currently has a race which can result in a double free in the multi-thread case.

    Thread 1                    Thread 2
    sys_flock(file, LOCK_EX)    sys_flock(file, LOCK_UN)

If Thread 2 removes the lock from inode->i_lock before Thread 1 tests for list_empty(&lock->fl_link) at the end of sys_flock, then both threads will end up calling locks_free_lock for the same lock.

The fix is to make flock_lock_file() do the same as posix_lock_file(), namely to make a copy of the request, so that the caller can always free the lock. This also has the side-effect of fixing up a reference problem in the lockd handling of flock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Amy Griffis
IN_DELETE events are no longer generated for the removal of a file from a watched directory. This seems to be a result of clearing DCACHE_INOTIFY_PARENT_WATCHED in d_delete() directly before calling fsnotify_nameremove(). Assuming the flag doesn't need to be cleared before dentry_iput(), this should do the trick. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Acked-by: Robert Love <rml@novell.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by OGAWA Hirofumi
Because these names were used as device names on old MSDOS, the current fat driver doesn't allow a user to create them. But many OSes, and now even Windows, can actually create those names. This patch removes the reserved name check. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Paul Jackson
Fix memory migration so that it works regardless of what cpuset the invoking task is in.

If a task invoked a memory migration, by doing one of:

1) writing a different nodemask to a cpuset 'mems' file, or
2) writing a task's pid to a different cpuset's 'tasks' file,

where the cpuset had its 'memory_migrate' option turned on, then the allocation of the new pages for the migrated task(s) was constrained by the invoking task's cpuset.

If this task wasn't in a cpuset that allowed the requested memory nodes, the memory migration would happen to some other nodes that were in that invoking task's cpuset. This was usually surprising and puzzling behaviour: Why didn't the pages move? Why did the pages move -there-?

To fix this, temporarily change the invoking task's 'mems_allowed' task_struct field to the nodes the migrating task is moving to, so that new pages can be allocated there.

Signed-off-by: Paul Jackson <pj@sgi.com> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Paul Jackson
Fix an unsafe reference to a task's mm struct by moving the reference inside a convenient, nearby, properly guarded code block. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Paul Jackson
Fix the cpuset comment involving the case of a task's cpuset pointer being NULL. Thanks to "the_top_cpuset_hack", this code no longer sees NULL task->cpuset pointers. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Andrew Morton
local_t's were defined to be unsigned. This increases confusion because atomic_t's are signed. The patch goes through and changes all implementations to use signed longs throughout. Also, x86-64 was using 32-bit quantities for the value passed into local_add() and local_sub(). Fixed. All (actually, both) existing users have been audited. (Also s/__inline__/inline/ in x86_64/local.h) Cc: Andi Kleen <ak@muc.de> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Kyle McMartin <kyle@parisc-linux.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Steffen Klassert
The "3c59x: use mii_check_media" patch introduced a netif_carrier_off in vortex_up. 10base2 stopped working because of this, so it is removed. Tx/Rx reset is back in vortex_up because the 3c900B-Combo stops working after changing from half duplex to full duplex when Tx/Rx reset is done with vortex_timer. Also brought back some mii stuff to be sure that it does not break something else. Thanks to Pete Clements <clem@clem.clem-digital.net> for reporting and testing. Signed-off-by: Steffen Klassert <klassert@mathematik.tu-chemnitz.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Andrew Morton
The pre-2.6.16 patch "3c59x collision statistics fix" accidentally caused vortex_error() to not run iowrite16(TxEnable, ioaddr + EL3_CMD) if we got a maxCollisions interrupt but MAX_COLLISION_RESET is not set. Thanks to Pete Clements <clem@clem.clem-digital.net> for reporting and testing. Acked-by: Steffen Klassert <klassert@mathematik.tu-chemnitz.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Trond Myklebust
The help text says that if you select CONFIG_LBD, then it will automatically select CONFIG_LFS. That isn't currently the case, so update the text.

- Get rid of the cruft in the help text mentioning CONFIG_LBD.
- Tell unsure users to select CONFIG_LFS.
- Remove the `default n'.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by KaiGai Kohei
I noticed a bug in the process accounting facility. In a multi-threaded process, some data would be recorded incorrectly when the group_leader dies earlier than one or more threads. The attached patch fixes this problem.

See below. 'bugacct' is a test program that creates a worker thread after sleeping for 4 seconds; the group_leader then dies soon afterwards. The worker thread consumes CPU/memory for 6 seconds, then exits. We can estimate 10 seconds as etime and 6 seconds as stime + utime. This is a sample program in which the group_leader dies earlier than other threads.

The results of executing the same binary on different kernels are below.

    -- accounted records --------------------
             | btime    | utime | stime | etime | minflt | majflt | comm    |
    original | 13:16:40 |  0.00 |  0.00 |  6.10 |    171 |      0 | bugacct |
    patched  | 13:20:21 |  5.83 |  0.18 | 10.03 |  32776 |      0 | bugacct |

    (*) bugacct allocates 128MB memory, thus 128MB / 4KB = 32768 of minflt is appropriate.

    -- Test results in original kernel ------
    $ date; time -p ./bugacct
    Tue Mar 28 13:16:36 JST 2006    <- But pacct said btime is 13:16:40
    real 10.11                      <- But pacct said etime is 6.10
    user 5.96                       <- But pacct said utime is 0.00
    sys 0.14                        <- But pacct said stime is 0.00
    $

    -- Test results in patched kernel -------
    $ date; time -p ./bugacct
    Tue Mar 28 13:20:21 JST 2006
    real 10.04
    user 5.83
    sys 0.19
    $

In the original 2.6.16 kernel, pacct records btime, utime, stime, etime and minflt incorrectly. In my opinion, this problem is caused by an assumption that the group_leader dies last.

The following section calculates the process running time for etime and btime. But it gives the running time of the thread that dies last, not of the process. The start_time of the first thread in the process (group_leader) should be subtracted from uptime to calculate etime and btime correctly.

    ---- do_acct_process() in kernel/acct.c:
    /* calculate run_time in nsec*/
    do_posix_clock_monotonic_gettime(&uptime);
    run_time = (u64)uptime.tv_sec*NSEC_PER_SEC + uptime.tv_nsec;
    run_time -= (u64)current->start_time.tv_sec*NSEC_PER_SEC
                + current->start_time.tv_nsec;
    ----

The following section calculates stime and utime of the process. But it might count the utime and stime of the group_leader twice and ignore the utime and stime of the thread that dies last, when one or more threads remain after the group_leader is dead. The ac_utime should be calculated as the sum of the signal->utime and the utime of the thread that dies last. The ac_stime should be calculated likewise.

    ---- do_acct_process() in kernel/acct.c:
    jiffies = cputime_to_jiffies(cputime_add(current->group_leader->utime,
                                             current->signal->utime));
    ac.ac_utime = encode_comp_t(jiffies_to_AHZ(jiffies));
    jiffies = cputime_to_jiffies(cputime_add(current->group_leader->stime,
                                             current->signal->stime));
    ac.ac_stime = encode_comp_t(jiffies_to_AHZ(jiffies));
    ----

The minflt/majflt calculation has the same problem. This patch solves those problems, I think.

Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
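The arithmetic behind the corrected values can be sketched in user space. The type `ts_sketch` and the functions `ts_to_ns`, `run_time_ns` and `total_cpu_time` are hypothetical and only mirror the reasoning in the message, not the patched kernel code: elapsed time is measured from the group leader's start, and CPU time is the total already accumulated for exited threads plus that of the thread that dies last.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for a monotonic timestamp. */
struct ts_sketch {
    int64_t tv_sec;
    long tv_nsec;
};

uint64_t ts_to_ns(struct ts_sketch t)
{
    assert(t.tv_sec >= 0 && t.tv_nsec >= 0);
    return (uint64_t)t.tv_sec * NSEC_PER_SEC + (uint64_t)t.tv_nsec;
}

/* Elapsed time of the process: uptime minus the start time passed in.
 * Passing the group leader's start time gives the process run time;
 * passing the last thread's start time reproduces the too-short value. */
uint64_t run_time_ns(struct ts_sketch uptime, struct ts_sketch start)
{
    assert(ts_to_ns(uptime) >= ts_to_ns(start));
    return ts_to_ns(uptime) - ts_to_ns(start);
}

/* CPU time of the process: time accumulated by already-exited threads
 * plus the time of the thread that dies last. */
uint64_t total_cpu_time(uint64_t exited_threads_time, uint64_t last_thread_time)
{
    return exited_threads_time + last_thread_time;
}
```

With the bugacct timings from the message (leader starts at some time T, worker starts 4 seconds later, everything ends 10 seconds after T), measuring from the worker's start gives about 6 seconds, while measuring from the leader's start gives the expected 10 seconds.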
-