- 17 10月, 2007 5 次提交
-
-
由 Nick Piggin 提交于
Quite a bit of code is used in maintaining these "cached pages" that are probably pretty unlikely to get used. It would require a narrow race where the page is inserted concurrently while this process is allocating a page in order to create the spare page. Then a multi-page write into an uncached part of the file, to make use of it. Next, the buffered write path (and others) uses its own LRU pagevec when it should be just using the per-CPU LRU pagevec (which will cut down on both data and code size cacheline footprint). Also, these private LRU pagevecs are emptied after just a very short time, in contrast with the per-CPU pagevecs that are persistent. Net result: 7.3 times fewer lru_lock acquisitions required to add the pages to pagecache for a bulk write (in 4K chunks). [this gets rid of some cond_resched() calls in readahead.c and mpage.c due to clashes in -mm. What put them there, and why? ] Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Nick Piggin 提交于
nobh mode error handling is not just pretty slack, it's wrong. One cannot zero out the whole page to ensure new blocks are zeroed, because it just brings the whole page "uptodate" with zeroes even if that may not be the correct uptodate data. Also, other parts of the page may already contain dirty data which would get lost by zeroing it out. Thirdly, the writeback of zeroes to the new blocks will also erase existing blocks. All these conditions are pagecache and/or filesystem corruption. The problem comes about because we didn't keep track of which buffers actually are new or old. However it is not enough just to keep only this state, because at the point we start dirtying parts of the page (new blocks, with zeroes), the handling of IO errors becomes impossible without buffers because the page may only be partially uptodate, in which case the page flags allone cannot capture the state of the parts of the page. So allocate all buffers for the page upfront, but leave them unattached so that they don't pick up any other references and can be freed when we're done. If the error path is hit, then zero the new buffers as the regular buffer path does, then attach the buffers to the page so that it can actually be written out correctly and be subject to the normal IO error handling paths. As an upshot, we save 1K of kernel stack on ia64 or powerpc 64K page systems. Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dmitry Monakhov 提交于
Move duplicated code from end_buffer_read_XXX methods to separate helper function. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Nick Piggin 提交于
The commit b5810039 contains the note A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and thus mapcounted and count towards shared rss). These writes to the struct page could cause excessive cacheline bouncing on big systems. There are a number of ways this could be addressed if it is an issue. And indeed this cacheline bouncing has shown up on large SGI systems. There was a situation where an Altix system was essentially livelocked tearing down ZERO_PAGE pagetables when an HPC app aborted during startup. This situation can be avoided in userspace, but it does highlight the potential scalability problem with refcounting ZERO_PAGE, and corner cases where it can really hurt (we don't want the system to livelock!). There are several broad ways to fix this problem: 1. add back some special casing to avoid refcounting ZERO_PAGE 2. per-node or per-cpu ZERO_PAGES 3. remove the ZERO_PAGE completely I will argue for 3. The others should also fix the problem, but they result in more complex code than does 3, with little or no real benefit that I can see. Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a false optimisation: if an application is performance critical, it would not be doing many read faults of new memory, or at least it could be expected to write to that memory soon afterwards. If cache or memory use is critical, it should not be working with a significant number of ZERO_PAGEs anyway (a more compact representation of zeroes should be used). As a sanity check -- mesuring on my desktop system, there are never many mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not increase much without it. When running a make -j4 kernel compile on my dual core system, there are about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000 ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second is torn down without being COWed). So removing ZERO_PAGE will save 1,000 page faults per second when running kbuild, while keeping it only saves less than 1 page clearing operation per second. 1 page clear is cheaper than a thousand faults, presumably, so there isn't an obvious loss. Neither the logical argument nor these basic tests give a guarantee of no regressions. However, this is a reasonable opportunity to try to remove the ZERO_PAGE from the pagefault path. If it is found to cause regressions, we can reintroduce it and just avoid refcounting it. The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked. I don't see much use to them except on benchmarks. All other users of ZERO_PAGE are converted just to use ZERO_PAGE(0) for simplicity. We can look at replacing them all and maybe ripping out ZERO_PAGE completely when we are more satisfied with this solution. Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus "snif" Torvalds <torvalds@linux-foundation.org>
-
由 Fengguang Wu 提交于
Combine the file_ra_state members unsigned long prev_index unsigned int prev_offset into loff_t prev_pos It is more consistent and better supports huge files. Thanks to Peter for the nice proposal! [akpm@linux-foundation.org: fix shift overflow] Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NFengguang Wu <wfg@mail.ustc.edu.cn> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 10月, 2007 1 次提交
-
-
由 Randy Dunlap 提交于
Fix filesystems docbook warnings. Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'name' Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'mode' Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'parent' Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'value' Warning(linux-2.6.23-git8//include/linux/jbd.h:404): No description found for parameter 'h_lockdep_map' Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 10月, 2007 7 次提交
-
-
由 Ingo Molnar 提交于
make sync wakeups affine for cache-cold tasks: if a cache-cold task is woken up by a sync wakeup then use the opportunity to migrate it straight away. (the two tasks are 'related' because they communicate) Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Laurent Vivier 提交于
like for cpustat, introduce the "gtime" (guest time of the task) and "cgtime" (guest time of the task children) fields for the tasks. Modify signal_struct and task_struct. Modify /proc/<pid>/stat to display these new fields. Signed-off-by: NLaurent Vivier <Laurent.Vivier@bull.net> Acked-by: NAvi Kivity <avi@qumranet.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Laurent Vivier 提交于
as recent CPUs introduce a third running state, after "user" and "system", we need a new field, "guest", in cpustat to store the time used by the CPU to run virtual CPU. Modify /proc/stat to display this new field. Signed-off-by: NLaurent Vivier <Laurent.Vivier@bull.net> Acked-by: NAvi Kivity <avi@qumranet.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mike Galbraith 提交于
Here's another piece of low hanging obsolete fruit. Remove obsolete TASK_NONINTERACTIVE. Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
rename all 'cnt' fields and variables to the less yucky 'count' name. yuckage noticed by Andrew Morton. no change in code, other than the /proc/sched_debug bkl_count string got a bit larger: text data bss dec hex filename 38236 3506 24 41766 a326 sched.o.before 38240 3506 24 41770 a32a sched.o.after Signed-off-by: NIngo Molnar <mingo@elte.hu> Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 10月, 2007 1 次提交
-
-
由 Peter Zijlstra 提交于
On Mon, 2007-09-24 at 22:13 -0400, Steven Rostedt wrote: > The circular lock seems to be this: > > #1: > > sys_mmap2: down_write(&mm->mmap_sem); > nfs_revalidate_mapping: mutex_lock(&inode->i_mutex); > > > #0: > > vfs_readdir: mutex_lock(&inode->i_mutex); > - during the readdir (filldir64), we take a user fault (missing page?) > and call do_page_fault - > do_page_fault: down_read(&mm->mmap_sem); > > > So it does indeed look like a circular locking. Now the question is, "is > this a bug?". Looking like the inode of #1 must be a file or something > else that you can mmap and the inode of #0 seems it must be a directory. > I would say "no". > > Now if you can readdir on a file or mmap a directory, then this could be > an issue. > > Otherwise, I'd love to see someone teach lockdep about this issue! ;-) Make a distinction between file and dir usage of i_mutex. The inode should be complete and unused at unlock_new_inode(), re-init i_mutex depending on its type. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
-
- 15 10月, 2007 1 次提交
-
-
由 Peter Zijlstra 提交于
Give each filesystem its own inode lock class. The various filesystems have different locking order wrt the inode locks; esp. the pseudo filesystems differ from the rest. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
-
- 14 10月, 2007 1 次提交
-
-
由 Dave Kleikamp 提交于
commit e30408b2 ("JFS: fix bio-related build breakage") removed some "return 0;" statements, rather than changing them to null returns. Signed-off-by: NDave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 10月, 2007 24 次提交
-
-
由 David Woodhouse 提交于
Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 David Woodhouse 提交于
In three places: summary scan, normal scan, REF_PRISTINE GC. Just truncate at the NUL, since that was the correct thing to do in the only case where this (inexplicable) breakage has been seen. Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 David Woodhouse 提交于
I have no idea how this happened, but OLPC trac #4184 suggests that it did. Catch it early. Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 David Woodhouse 提交于
... where we'll actually print the count in a debug message. Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 David Woodhouse 提交于
In OLPC trac #4184 we found a case where a corrupted node didn't actually get obsoleted when we tried to garbage-collect it. So we wrote out many million copies of it, in repeated attempts to obsolete it, until the flash became full. Don't Do That. Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 David Woodhouse 提交于
Instead of matching resv_blocks_gcmerge, which is only about 3, instead match resv_blocks_gctrigger, which includes a proportion of the total device size. These ought to become tunable from userspace, at some point. Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 Tejun Heo 提交于
Sysfs has gone through considerable amount of reimplementation. Add copyrights. Any objections? :-) Signed-off-by: NTejun Heo <htejun@gmail.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Tejun Heo 提交于
Sysfs file poll implementation is scattered over sysfs and kobject. Event numbering is done in sysfs_dirent but wait itself is done on kobject. This not only unecessarily bloats both kobject and sysfs_dirent but is also buggy - if a sysfs_dirent is removed while there still are pollers, the associaton betwen the kobject and sysfs_dirent breaks and kobject may be freed with the pollers still sleeping on it. This patch moves whole poll implementation into sysfs_open_dirent. Each time a sysfs_open_dirent is created, event number restarts from 1 and pollers sleep on sysfs_open_dirent. As event sequence number is meaningless without any open file and pollers should have open file and thus sysfs_open_dirent, this ephemeral event counting works and is a saner implementation. This patch fixes the dnagling sleepers bug and reduces the sizes of kobject and sysfs_dirent by one pointer. Signed-off-by: NTejun Heo <htejun@gmail.com> Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Tejun Heo 提交于
Implement sysfs_open_dirent which represents an open file (attribute) sysfs_dirent. A file sysfs_dirent with one or more open files have one sysfs_dirent and all sysfs_buffers (one for each open instance) are linked to it. sysfs_open_dirent doesn't actually do anything yet but will be used to off-load things which are specific for open file sysfs_dirent from it. Signed-off-by: NTejun Heo <htejun@gmail.com> Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Tejun Heo 提交于
Children list head is only meaninful for directory nodes. Move it into s_dir. This doesn't save any space currently but it will with further changes. Signed-off-by: NTejun Heo <htejun@gmail.com> Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Tejun Heo 提交于
sysfs_root is different from a regular directory dirent in that it's of type SYSFS_ROOT and doesn't have a name. These differences aren't used by anybody and only adds to complexity. Make sysfs_root a regular directory dirent. Signed-off-by: NTejun Heo <htejun@gmail.com> Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Tejun Heo 提交于
sysfs_attach_dentry() now has only one caller and isn't doing much other than obfuscating the code. Open code and kill it. Signed-off-by: NTejun Heo <htejun@gmail.com> Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Tejun Heo 提交于
Make s_elem an anonymous union. Prefixing with s_elem makes things needlessly longer without any advantage. Signed-off-by: NTejun Heo <htejun@gmail.com> Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Tejun Heo 提交于
All bin attr operations require active references of itself and its parent. There's no reason to allow open when its parent has been deactivated and allowing it is inconsistent with regular sysfs file. Use sysfs_get_active_two() in bin attribute open function. Signed-off-by: NTejun Heo <htejun@gmail.com> Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Tejun Heo 提交于
In sysfs_release(), sysfs_buffer pointed to by filp->private_data is guaranteed to exist. Kill the unnecessary NULL check. This also makes the code more consistent with the counterpart in fs/sysfs/bin.c. Signed-off-by: NTejun Heo <htejun@gmail.com> Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Tejun Heo 提交于
There's no reason to get an extra reference to sysfs_dirent for an open file. Open file has a reference to the dentry which in turn has a reference to sysfs_dirent. This is fairly obvious as otherwise open itself won't be able to access the sysfs_dirent. Kill the extra sysfs_get() and matching sysfs_put(). Signed-off-by: NTejun Heo <htejun@gmail.com> Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Tejun Heo 提交于
Move s_mode downward such that it's side-by-side with s_iattr which is used for the same thing. Signed-off-by: NTejun Heo <htejun@gmail.com> Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Tejun Heo 提交于
sysfs_update_file() depends on inode->i_mtime but sysfs iondes are now reclaimable making the reported modification time unreliable. There's only one user (pci hotplug) of this notification mechanism and it reportedly isn't utilized from userland. Kill sysfs_update_file(). Signed-off-by: NTejun Heo <htejun@gmail.com> Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Tejun Heo 提交于
sysfs is about to go through major overhaul making this a pretty good opportunity to clean up (out-of-tree changes and pending patches will need regeneration anyway). Clean up headers. * Kill space between * and symbolname. * Move SYSFS_* type constants and flags into fs/sysfs/sysfs.h. They're internal to sysfs. * Reformat function prototypes and add argument symbol names. * Make dummy function definition order match that of function prototypes. * Add some comments. * Reorganize fs/sysfs/sysfs.h according to which file the declared variable or feature lives in. This patch does not introduce any behavior change. Signed-off-by: NTejun Heo <htejun@gmail.com> Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Tejun Heo 提交于
sysfs_chmod_file() looked and updated only inode of the target file. Dentry and inode are reclaimable and the update mode data will go away when the inode is reclaimed. This patch makes sysfs_chmod_file() update sd->s_mode too such that the change is permanent. Signed-off-by: NTejun Heo <htejun@gmail.com> Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Tejun Heo 提交于
sysfs_add/remove_one() now link and unlink the target dirent into and from the children list. Update comments accordingly. Signed-off-by: NTejun Heo <htejun@gmail.com> Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Greg Kroah-Hartman 提交于
We want to let people know when we create a duplicate sysfs file, as they need to fix up their code. Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Eric W. Biederman 提交于
This patch rewrites sysfs_move_dir to perform it's checks as much as possible on the underlying sysfs_dirents instead of the contents of the dcache, making sysfs_move_dir more like the rest of the sysfs directory modification code. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NTejun Heo <htejun@gmail.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Eric W. Biederman 提交于
This patch rewrites sysfs_rename_dir to perform it's checks as much as possible on the underlying sysfs_dirents instead of the contents of the dcache. It turns out that this version is a little simpler, and a little more like the rest of the sysfs directory modification code. tj: fixed double locking of sysfs_mutex Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NTejun Heo <htejun@gmail.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-